In this tutorial, we will be walking you through building your first Dataflow using a Read Connector, a Transform, and a Data Feed. We'll be using publicly available NYC weather data as our sample data. It contains monthly data from 2012 to 2017, for a weather station in central park. It contains for each day the weather (e.g. cloudy, rainy, sunny, pouring) and the level of precipitation as our sample data. So, let’s get started…
In order to follow along with this tutorial, you should either have your own Ascend environment (paid account) or a free trial version. Go here to request a trial of the Ascend platform.
Dataflows are continuously running data pipelines. You design a Dataflow with a combination of declarative configurations, SQL, and Python. As you design Dataflows, you are creating a blueprint that is sent to the Ascend Dataflow Control Plane to be automated. For this tutorial, you will create a new Dataflow within your Data Service.
- Create a new Dataflow called Weather Dataflow by clicking the + NEW DATAFLOW button (shown below) next to your Data service and provide an optional brief description. Click CREATE and you've instantly created your Dataflow.
Once inside the Dataflow canvas, we can explore some of the essential things (e.g. menus and options) you'll need to be familiar with before we move on.
- The dashboard icon will take you back to the system dashboard.
- The Data Service menu will show you a list of all data services that you can quickly jump to along with being able to manage various settings and permissions.
- The Dataflow menu allows settings and permissions configuration for this Dataflow or others within this data service.
- The Add menu is what you’ll be using most of the time and it’s from there where you can create new read & write connectors, transforms, data feeds and the grouping of components for better Dataflow visibility.
- The Import and Export allow saving and exporting versions of your Dataflow so it can be copied and redeployed in a different data service or for versioning.
- The other tabs show Dataflow profiling information and the ability to perform a quick on-demand query, much as you would an ad-hoc query into your data warehouse.
- On the far right-hand of the screen, you’ll see a link to our on-line docs complete with a public discussion board and a drop-down menu underneath your login name for other settings.
The first step in our Dataflow is to create a Read Connectors that accesses the weather data. Read Connectors allow Ascend to access any data from any source with any format.
- On the upper left side of the screen click + ADD and then click Connector to bring up the Connectors panel. From the Connectors Panel you'll get a list of all possible pre-built connectors to choose from and you can also easily build my own custom connector if you so desire.
The Connectors panel displays a variety of Read Connectors supported by Ascend.
- For this tutorial, we will be connecting to an AWS S3 source so select the S3 Read Connector by clicking on the READ icon for S3. This will bring up the CREATE NEW READ CONNECTOR screen (shown below).
- Create the name of the connector as Weather and optionally provide a description.
- For Location enter the following:
- Enter ascend-io-sample-data-read as the bucket name.
- Select Match and enter NYC-TAXI/weather/weather.csv as the pattern.
- Uncheck the Requires Credentials checkbox.
- Use TEST CONNECTION to check whether all S3 bucket settings and permissions are properly and correctly configured.
- For Parser Settings enter the following:
- Select XSV as the FORMAT TO USE
- Select Comma as the FIELD DELIMITER
- For HEADER ROW check the Files have a header row checkbox.
You will see a preview of your XSV to confirm everything looks accurate (as shown below)
- Finish and Preview
Leave the remaining fields at their default values and Click the CREATE button to create the connector and have Ascend ingest and parse it into your Dataflow.
When it's complete you should see the Read Connector on your Dataflow canvas (as shown below).
As soon as you have one or more read connectors (or data feed) you can start creating Ascend Transforms to do things like cleaning, filtering, joining and aggregating your data using SQL. Additionally, you can specify PySpark functions for specific compute and machine learning model training purposes.
We need to do some cleanup and prep on our Weather dataset that we just ingested. Specifically, some column heading names, TIMESTAMP formatting and filter out all records that have no precipitation. Let's do that using some SQL code in a Transform.
- Click + ADD menu and select Transform from there. This will bring up the CREATE NEW TRANSFORM screen (shown below). Alternatively, you can create the new Transform by right-clicking the Weather Read Connector and selecting Create new transform from the drop-down menu.
- Create the name of the connector as Weather transform and optionally provide a description.
- Select the Query Type based on the type of Transform you're building. In this part of the tutorial, we're going to use SQL code to do the cleanup so we'll be selecting SQL*. Alternatively, if you wanted to create a PySpark function you could do that here as well by selecting PySpark**.
- Selecting SQL will provide you with a simple SELECT statement as boilerplate, but we'll be adding editing this. To do that we'll be using the RAW BUILDER option which provides a rich editing experience and suggestions with autocomplete. Alternatively, you may use the QUERY BUILDER which offers a GUI-based, wizard-driven approach for those with novice SQL skills.
- Enter the following code into the RAW BUILDER editor.
SELECT ww.Date, ww.Weather, ww.Precipitation AS precipitation, TIMESTAMP(ww.Date) AS weather_date_ts FROM Weather AS ww WHERE LENGTH(TRIM(ww.Precipitation)) !=0
- Leave the remaining fields at their default values and click the CREATE button and Ascend will begin processing your SQL query immediately. When it's finished your Dataflow canvas should look like this.
- Preview and inspect the results of the Transform by clicking on it and then clicking on the Records tab to view the contents of them and looking at the Partitions tab (shown below) for information on column stats and job details like elapsed time, etc. If any errors are detected you can click on the Errors and Warnings tab to view them.
The final step in our Dataflow is to create a Data Feed. Data feeds are powerful constructs designed to allow Dataflows and Data Services to communicate live data with each other. What this means is you can create very complex Dataflow that outputs a single result table and share it with others in your organization (say in different data service) without having to give them access to your actual Dataflow. You connect to existing Data Feed Connectors to act as inputs into other Dataflows.
- To create the new Data Feed Connector click + ADD menu and select Data Feed and then Create to open the CREATE NEW DATA FEED panel (shown below). Alternatively, you can create the new Data Feed Connector by right-clicking the Weather Transform and selecting Create new data feed from the drop-down menu.
- Create the name of the Data Feed Connector as Weather Data Feed and optionally provide a description.
- Select **Weather Transform" from the UPSTREAM drop-down menu.
- For PERMISSION SETTINGS Leave the default "All teams in all Data Services can access this Data feed" selected. This will make it available to all users in our environment.
- Click CREATE and that’s it. You should see the Data Feed Connector on your Dataflow canvas (as shown below). You now have a dedicated data spigot that contains a very specific data set that I can allow others to subscribe to and build their Dataflows off.
Congratulations on creating your first orchestrated Dataflow with Ascend. Hopefully, you got a real good sense of the power in what Ascend can do for you - and that you can now build your own Dataflows. You've seen how easy it is to create Read Connectors, use SQL inside of Transforms and making those results available and reuseable as Data Feeds. Here's what your Dataflow should look like.
In the next tutorial, we'll use some of the same components here and combine it with some additional datasets to create a larger and more comprehensive Dataflow.
Integration with Tableau is possible through Ascend's Tableau Web Data Connector. You can now build Tableau workbooks that are directly linked to data feeds from Ascend! You will need Tableau installed in order to complete this portion of the tutorial.
Before you can connect to Tableau you will need to create a Service Account and generate the necessary keys with which to connect. Service accounts allow Data Service Admins to create accounts with API keys that have an extended lifetime and can be used to integrate with 3rd party tools. Go to the Service Accounts documentation for more information on creating and managing them.
- Launch Tableau and select Web Data Connector from the left-side menu under Connect -> To a Server. The web data connector web-page will pop-up.
- To load data feeds into Tableau Web Data Connector, paste link that is shown on the Service Accounts main page. Your Tableau URL (example below) will differ based on your individual environment.
- Hit enter and using the Access Keys you generated previously for this service account enter them into the Ascend Tableau Connector pop-up window as shown below.
- Click the PICK DATA FEEDS button and you'll be presented with all the data feeds you have access to within this data service.
- Select a Data Feed by clicking on it and then click the VISUALIZE IN TABLEAU button to begin using it in Tableau and performing any analysis on it.
Updated 3 months ago