Azure Data Lake
Prerequisites
- Microsoft Account
- A Storage Account to use with Azure Data Lake Storage Gen2
- A container inside of your Azure Storage Account
Connection Properties
The following table describes the fields available when creating a new Azure Data Lake Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | Whether this connection is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Storage Account Name | Required | The name of your ADLS storage account resource in Azure. |
Container Name | Required | The name of your container in Azure. |
Requires Credentials | Optional | Check this box to create a new credential or select an existing credential. |
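
The Storage Account Name and Container Name together identify a location in Azure Data Lake Storage Gen2. As a minimal sketch of how the two values relate, the snippet below builds the standard `abfss://` URI for a container. The helper name `adls_gen2_uri` and the sample account name are hypothetical, and this page does not state how Ascend addresses the container internally.

```python
def adls_gen2_uri(storage_account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a location in an ADLS Gen2 container.

    Hypothetical helper for illustration only; it is not part of any
    Ascend or Azure SDK.
    """
    if not storage_account or not container:
        raise ValueError("storage_account and container are both required")
    # ADLS Gen2 uses the 'dfs' endpoint of the storage account.
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


# 'mystorageacct' is a placeholder account name.
print(adls_gen2_uri("mystorageacct", "ascend-data", "AirPassengers.csv"))
```
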
Credential Properties
The following table describes the fields available when creating a new Azure Data Lake credential.
Field Name | Required | Description |
---|---|---|
Credential Name | Required | The name to identify this credential with. This credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with Azure Data Lake. |
Credential Type | Required | The type of credential you want to use. - Azure Shared Key: Provide your Azure shared key in the field that appears. - Azure AD Service Principal - Azure AD Service JSON: Provide credentials in JSON format. |
Read Connector Properties
The following table describes the fields available when creating a new Azure Data Lake Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Object Pattern Matching | Required | This is a description of the field. |
Container Name | Required | The name of your container in Azure. Example: ascend-data |
Object Pattern | Required | The name of the data file in your container. Example: AirPassengers.csv |
Parser | Required | We support several data formats. See Blob Storage Read Connector Parsers for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - Excel - JSON - ORC - Parquet - Python - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
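
To illustrate how an Object Pattern selects files in a container, here is a minimal sketch. This page does not list the available Object Pattern Matching modes, so the two modes below (`match` for an exact name and `glob` for shell-style wildcards) are assumptions made for illustration, as is the helper name `select_objects`.

```python
from fnmatch import fnmatchcase


def select_objects(object_names, pattern, mode="match"):
    """Return the object names selected by a pattern.

    The modes 'match' and 'glob' are hypothetical; they are not taken
    from the connector's documented options.
    """
    if mode == "match":
        # Exact, case-sensitive comparison against the full object name.
        return [name for name in object_names if name == pattern]
    if mode == "glob":
        # Shell-style wildcards such as '*' and '?', case-sensitive.
        return [name for name in object_names if fnmatchcase(name, pattern)]
    raise ValueError(f"unknown mode: {mode}")


objects = ["AirPassengers.csv", "AirPassengers_2020.csv", "notes.txt"]
print(select_objects(objects, "AirPassengers.csv"))
print(select_objects(objects, "*.csv", mode="glob"))
```
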
Write Connector Properties
The following table describes the fields available when creating a new Azure Data Lake Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Container Browse and Select
Browse and Select Data lets you select a location within an Azure container to write data. This will automatically fill in the Container and Output Directory fields.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Upstream | Required | The name of the previous connector the Write Connector will pull data from. |
Container Name | Optional | The container Ascend will write data to. |
Output Directory | Required | The output prefix to write files to. Use a format such as data/gold/aggregated_cab_rides. Note: This prefix must be unique across all Write Connectors or Ascend will delete any existing objects in the directory. |
Partition Interpolation Template | Optional | Include a value from the partition profile as part of the output directory naming. For example, the pattern dt={{event_ts(yyyy-MM-dd)}}/ creates Hive-style partitioning on a dataset partitioned daily on the timestamp event_ts. |
Output File Syntax | Optional | A suffix to attach to each file name. By default, Ascend will include the extension of the file format, but you may optionally choose a different suffix. |
Format | Required | We support several data formats. See Amazon S3 Write Connector Connectors for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - JSON - ORC - Parquet - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
Manifest File | Optional | Specify a manifest file which will be updated with the list of files every time they are updated. |
Write Strategy | Optional | The strategy for writing files to storage: - Default (Mirror to Blob Store): Keeps the storage aligned with Ascend by inserting, updating, and deleting partitions on the blob store. - Ascend Upsert Partitions: Appends new partitions from Ascend and updates existing partitions, without deleting partitions from the blob store that are no longer in Ascend. - Custom Function: Lets you implement the write logic that Ascend will execute. |
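
The difference between the Mirror and Upsert write strategies comes down to whether partitions that no longer exist in Ascend are deleted from the blob store. The sketch below models that difference with plain sets. The function `plan_write`, the strategy labels `mirror` and `upsert`, and the partition names are hypothetical, and this is not Ascend's implementation. As a simplification, every partition present on both sides is listed as an update.

```python
from __future__ import annotations


def plan_write(
    ascend_partitions: set[str],
    store_partitions: set[str],
    strategy: str = "mirror",
) -> dict[str, list[str]]:
    """Plan which partitions to insert, update, and delete.

    Illustrative model only; the names and strategy labels are assumptions.
    """
    to_insert = ascend_partitions - store_partitions
    # Simplification: treat every shared partition as an update.
    to_update = ascend_partitions & store_partitions
    if strategy == "mirror":
        # Mirror removes anything in the store that Ascend no longer has.
        to_delete = store_partitions - ascend_partitions
    elif strategy == "upsert":
        # Upsert never deletes from the store.
        to_delete = set()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return {
        "insert": sorted(to_insert),
        "update": sorted(to_update),
        "delete": sorted(to_delete),
    }


ascend = {"dt=2024-01-02/", "dt=2024-01-03/"}
store = {"dt=2024-01-01/", "dt=2024-01-02/"}
print(plan_write(ascend, store, "mirror"))
print(plan_write(ascend, store, "upsert"))
```

In this model, only the mirror plan schedules `dt=2024-01-01/` for deletion, which is why the Output Directory prefix needs to be unique to a single Write Connector.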