Create a Taxi Domain
Learn how to ingest, transform, and share your data by creating your first data product, a Taxi Domain.
Step 1: Creating Read Connectors
The first step for creating a data product is creating Connections and Read Connectors that access the weather and taxi cab data from AWS S3.
Create Sample Datasets Connection
On the upper left side of the screen, select CREATE. Then select Read Connector to bring up the Connectors Panel. From the Connectors Panel, select New Connection and you'll see a list of all possible pre-built connectors to choose from.
For today's lab, we're connecting to an AWS S3 source. Select the Amazon S3 connector from the Connectors Panel and enter the following parameters:
Field | Value |
---|---|
Access Type | Read-Only |
Connection Name | Sample Datasets |
Requires Credentials | Leave unchecked. The data we are using does not require credentials. |
ascend-io-sample-data-read
Select Test Connection to confirm all S3 bucket settings and permissions are configured correctly.
Select CREATE to save this connection for future use.
Once your connection is saved, you will see the S3 connection listed below when you choose Read Connector on the left panel.
Create a GREEN_CABS Read Connector
Select USE on the Sample Datasets S3 Read Connector and select Skip and configure manually. Enter the following parameters:
GREEN_CABS
Field | Value |
---|---|
OBJECT PATTERN MATCHING | Regex |
OBJECT PATTERN | NYC-TAXI/green_cab/2016-0[1-3]/.*.csv |
PARSER | CSV |
FILES HAVE A HEADER ROW | Check this box. |
Leave the remaining fields unconfigured. Select GENERATE SCHEMA to see a preview of the CSV to confirm everything looks accurate. Ascend will automatically parse the header along with the first few lines of data to generate the schema and allow you to inspect it.
Once schema is generated, select CREATE. Ascend will begin ingesting the green cab data from the S3 bucket. Once the data is ingested, the Read Connector will show UP TO DATE
.
Create a YELLOW_CABS Read Connector
Repeat the same process for the GREEN_CABS
read connector with the following parameters:
YELLOW_CABS
Field | Value |
---|---|
OBJECT PATTERN MATCHING | Prefix |
OBJECT PATTERN | NYC-TAXI/yellow_cab/2016-01/ |
PARSER | CSV |
FILES HAVE A HEADER ROW | Check this box. |
MODE | PERMISSIVE |
As before, leave the remaining fields unconfigured. Don't forget to Generate Schema before selecting CREATE!
Step 2: Creating Transforms
After creating read connectors, we'll create a Transform using Snowflake SQL. The All_CAB_DATA
Transform combines the data in each of the Read Connectors created in Step 1. The CONTROL_CAB_DATA
takes the combined cab data, and converts the pick-up date and time to a DATETIME value.
Create a ALL_CAB_DATA Transform
From the YELLOW_CABS
transform, left-select the vertical ellipsis and select Create new Transform. Alternatively, select Snowflake SQL from the Transform menu located in the toolbox.
Name the transform "ALL_CAB_DATA" and paste the following code into the query editor.
ALL_CAB_DATA
SELECT
VendorID as vendor_id,
lpep_pickup_datetime as pickup_datetime,
TO_TIMESTAMP(Lpep_dropoff_datetime) as dropoff_datetime,
RateCodeID as rate_code_id,
Pickup_longitude as pickup_lon,
Pickup_latitude as pickup_lat,
Dropoff_longitude as dropoff_lon,
Dropoff_latitude as dropoff_lat,
Passenger_count as passenger_count,
Trip_distance as distance,
Fare_amount as amount_fare,
Extra as amount_extra,
MTA_tax as amount_tax,
Tip_amount as amount_tip,
Tolls_amount as amount_tolls,
IFNULL(TRY_TO_DECIMAL(Ehail_fee,10,2),TO_DECIMAL(0.0,10,2)) as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount as amount_total,
Payment_type as payment_type,
Trip_type_ as trip_type,
'green' as cab_color
FROM {{GREEN_CABS}}
UNION ALL
SELECT
VendorID as vendor_id,
tpep_pickup_datetime as pickup_datetime,
TO_TIMESTAMP(tpep_dropoff_datetime) as dropoff_datetime,
RatecodeID as rate_code_id,
pickup_longitude as pickup_lon,
pickup_latitude as pickup_lat,
dropoff_longitude as dropoff_lon,
dropoff_latitude as dropoff_lat,
passenger_count as passenger_count,
trip_distance as distance,
fare_amount as amount_fare,
extra as amount_extra,
mta_tax as amount_tax,
tip_amount as amount_tip,
tolls_amount as amount_tolls,
TO_DECIMAL(0.0, 10, 2) as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount_ as amount_total,
payment_type as payment_type,
-1 as trip_type,
'yellow' as cab_color
FROM {{YELLOW_CABS}}
SELECT
VendorID as vendor_id,
lpep_pickup_datetime as pickup_datetime,
TO_TIMESTAMP(Lpep_dropoff_datetime) as dropoff_datetime,
RateCodeID as rate_code_id,
Pickup_longitude as pickup_lon,
Pickup_latitude as pickup_lat,
Dropoff_longitude as dropoff_lon,
Dropoff_latitude as dropoff_lat,
Passenger_count as passenger_count,
Trip_distance as distance,
Fare_amount as amount_fare,
Extra as amount_extra,
MTA_tax as amount_tax,
Tip_amount as amount_tip,
Tolls_amount as amount_tolls,
IFNULL(TRY_TO_NUMBER(Ehail_fee,'S$999,099.99'), TO_NUMBER(0.0, '99.99')) as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount as amount_total,
Payment_type as payment_type,
Trip_type_ as trip_type,
'green' as cab_color
FROM {{GREEN_CABS}}
UNION ALL
SELECT
VendorID as vendor_id,
tpep_pickup_datetime as pickup_datetime,
TO_TIMESTAMP(tpep_dropoff_datetime) as dropoff_datetime,
RatecodeID as rate_code_id,
pickup_longitude as pickup_lon,
pickup_latitude as pickup_lat,
dropoff_longitude as dropoff_lon,
dropoff_latitude as dropoff_lat,
passenger_count as passenger_count,
trip_distance as distance,
fare_amount as amount_fare,
extra as amount_extra,
mta_tax as amount_tax,
tip_amount as amount_tip,
tolls_amount as amount_tolls,
TO_NUMBER(0.0, '99.99') as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount_ as amount_total,
payment_type as payment_type,
-1 as trip_type,
'yellow' as cab_color
FROM {{YELLOW_CABS}}
SELECT
VendorID as vendor_id,
lpep_pickup_datetime as pickup_datetime,
TIMESTAMP(Lpep_dropoff_datetime) as dropoff_datetime,
RateCodeID as rate_code_id,
Pickup_longitude as pickup_lon,
Pickup_latitude as pickup_lat,
Dropoff_longitude as dropoff_lon,
Dropoff_latitude as dropoff_lat,
Passenger_count as passenger_count,
Trip_distance as distance,
Fare_amount as amount_fare,
Extra as amount_extra,
MTA_tax as amount_tax,
Tip_amount as amount_tip,
Tolls_amount as amount_tolls,
IFNULL(SAFE_CAST(Ehail_fee AS FLOAT64),0.0) as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount as amount_total,
Payment_type as payment_type,
Trip_type as trip_type,
'green' as cab_color
FROM {{GREEN_CABS}}
UNION ALL
SELECT
VendorID as vendor_id,
tpep_pickup_datetime as pickup_datetime,
TIMESTAMP(tpep_dropoff_datetime) as dropoff_datetime,
RatecodeID as rate_code_id,
pickup_longitude as pickup_lon,
pickup_latitude as pickup_lat,
dropoff_longitude as dropoff_lon,
dropoff_latitude as dropoff_lat,
passenger_count as passenger_count,
trip_distance as distance,
fare_amount as amount_fare,
extra as amount_extra,
mta_tax as amount_tax,
tip_amount as amount_tip,
tolls_amount as amount_tolls,
0.0 as amount_hail_fee,
improvement_surcharge as amount_surcharge,
Total_amount as amount_total,
payment_type as payment_type,
-1 as trip_type,
'yellow' as cab_color
FROM {{YELLOW_CABS}}
Set Component Pausing to Running and leave Advanced Settings unconfigured.
Select CREATE and Ascend will begin processing your SQL query immediately.
Create CONTROL_CAB_DATA Transform
From the ALL_CAB_DATA
transform, left-select the vertical ellipsis and select Create new Transform.
Name the transform CONTROL_CAB_DATA
paste the following code in the query editor:
CONTROL_CAB_DATA
SELECT *,
TO_DATE(p.PICKUP_DATETIME) AS RIDE_DATE, 'new' as jto_col
FROM {{ALL_CAB_DATA}} AS p
SELECT *,
TO_DATE(p.PICKUP_DATETIME) AS RIDE_DATE, 'new' as jto_col
FROM {{ALL_CAB_DATA}} AS p
SELECT *,
DATE(p.PICKUP_DATETIME) AS RIDE_DATE, 'new' as jto_col
FROM {{ALL_CAB_DATA}} AS p
Step 3: Creating Data Feed Connectors
The final step in our Taxi Domain is to create a Data Feed.
Create a MESH_DOMAIN_TAXI_DATA Data Feed
Right-select the vertical ellipsis from the CONTROL_CAB_DATA
Transform and select Create New Data Feed.
Name the Data Feed MESH_DOMAIN_TAXI_DATA
. Ensure the Upstream field is set to CONTROL_CAB_DATA
. For Permission Settings, select "All teams in all Data Services...."
MESH_DOMAIN_TAXI_DATA
Your Data Domain should look like this:
Next Steps
Congratulations on creating a taxi data domain! You've ingested taxi data, transformed the data so the schema from two different sources aligns, and created a Data Feed. You're well on your way to creating your first Data Mesh. Next, continue with the Create a Weather Domain guide.
Updated 10 months ago