Delta Lake
Learn the required and optional properties for creating a Delta Lake Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Object storage and cloud provider credentials (e.g., S3 and AWS)
- Table name (stored in S3 or Azure)
Connection Properties
The following table describes the fields available when creating a new Delta Lake Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | Whether this connection is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Storage System | Required | The cloud object storage where Delta Lake stores its tables. See the sections below for more information on each storage system: Amazon S3, Azure Data Lake Storage, and Google Cloud Storage. |
Requires Credentials | Required | If the 'Requires Credentials' checkbox is selected, choose an existing credential or create a new credential for connecting to Delta Lake. |
Amazon S3
When you select Amazon S3 as the Storage System, the following additional fields appear:
Field | Required | Description |
---|---|---|
Bucket | Optional | Name of the S3 bucket. |
Connection Type | Optional | The Amazon S3 connection type. Options: Standard; With Region (requires the S3 region); AWS PrivateLink for Amazon S3 (requires a custom endpoint for the private link); Custom Endpoint (requires the custom endpoint and allows you to disable certificate verification). |
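Each connection type above implies a different set of extra inputs. As a minimal sketch of that rule (the key names `region`, `endpoint`, and `verify_certificate` are hypothetical and not taken from the product), a small helper can report which inputs are still missing:

```python
# Hypothetical sketch: the key names ("region", "endpoint",
# "verify_certificate") are illustrative, not the product's real field names.
REQUIRED_EXTRAS = {
    "Standard": set(),
    "With Region": {"region"},
    "AWS PrivateLink for Amazon S3": {"endpoint"},
    "Custom Endpoint": {"endpoint"},
}


def missing_s3_fields(connection_type, settings):
    """Return the sorted extra inputs still missing for a connection type."""
    if connection_type not in REQUIRED_EXTRAS:
        raise ValueError(f"Unknown S3 connection type: {connection_type!r}")
    # Treat None and empty strings as "not provided".
    provided = {key for key, value in settings.items() if value not in (None, "")}
    return sorted(REQUIRED_EXTRAS[connection_type] - provided)
```

For example, `missing_s3_fields("With Region", {})` reports that `region` is missing, while the Standard type needs no extra input.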
Azure Data Lake Storage
When you select Azure Data Lake Storage as the Storage System, the following additional fields appear:
Field | Required | Description |
---|---|---|
Storage Account Name | Required | Name of your Azure Data Lake Storage account. |
Container Name | Required | The container name for your data. |
Google Cloud Storage
When you select Google Cloud Storage as the Storage System, the following additional fields appear:
Field | Required | Description |
---|---|---|
Project | Required | The name of the GCS project that holds the data to read in through Delta Lake. |
Bucket | Optional | The name of the bucket the data is stored in. |
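To see how the storage fields above relate to a table location, the sketch below combines them into the URI schemes commonly used by Delta Lake clients (`s3://`, `abfss://`, `gs://`). It is only an illustration: the function and parameter names are hypothetical, and the product may resolve locations differently.

```python
def table_uri(storage_system, table_path, *, bucket=None,
              storage_account=None, container=None):
    """Build a conventional object-store URI for a Delta table path.

    Assumption: this mirrors common Delta Lake client conventions,
    not necessarily how the product builds locations internally.
    """
    path = table_path.strip("/")
    if storage_system == "Amazon S3":
        if not bucket:
            raise ValueError("Amazon S3 needs a bucket")
        return f"s3://{bucket}/{path}"
    if storage_system == "Azure Data Lake Storage":
        if not (storage_account and container):
            raise ValueError("Azure needs a storage account and a container")
        return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"
    if storage_system == "Google Cloud Storage":
        if not bucket:
            raise ValueError("Google Cloud Storage needs a bucket")
        return f"gs://{bucket}/{path}"
    raise ValueError(f"Unknown storage system: {storage_system!r}")
```

Calling `table_uri("Amazon S3", "sales/orders", bucket="my-bucket")` would yield an `s3://` location for the hypothetical bucket `my-bucket`.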
Credential Properties
The following table describes the fields available when creating a new Delta Lake credential.
Field Name | Required | Description |
---|---|---|
Credential Name | Required | The name to identify this credential with. This credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with Delta Lake. |
Data Lake Credentials | Required | Choose either S3, Azure, or Google. Each credential type has different fields specific to that type. |
Read Connector Properties
The following table describes the fields available when creating a new Delta Lake Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Write Connector Properties
The following table describes the fields available when creating a new Delta Lake Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Upstream | Required | The name of the previous connector the Write Connector will pull data from. |
Delta Table Path | Required | The path to the table you want to write data to. |
Storage System Options | Optional | If the storage system differs from the one set on the connection but shares the same credentials, you can override the storage system parameters created in the connection. |
Partition Columns | Optional | Column names used to partition data in Delta Lake tables, separated by commas. These columns divide the data into separate segments, which improves the performance of queries that filter on them. |
Z-Order Columns | Optional | Column names used for Z-ordering in Delta Lake, separated by commas. Z-ordering is a technique that optimizes data layout within Delta Lake for more efficient queries, especially on large, complex datasets. It orders the data so that related information is colocated. |
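The Partition Columns and Z-Order Columns fields can be illustrated with a small, self-contained sketch. It parses a comma-separated column list, builds a Hive-style partition directory (the `column=value` layout Delta Lake tables use), and computes a two-column Z-order value by interleaving bits. All function names are hypothetical, and real engines apply Z-ordering in a more involved way than this toy integer version.

```python
def parse_columns(value):
    """Split a comma-separated field value into clean column names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def partition_dir(row, partition_columns):
    """Hive-style partition directory for one row, e.g. 'country=US/year=2024'."""
    return "/".join(f"{col}={row[col]}" for col in partition_columns)


def z_value(x, y, bits=16):
    """Interleave the bits of two non-negative integers (a Morton code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Sorting a 4x4 grid by z_value keeps points that are close in both
# columns close together in the resulting order.
rows = [{"x": x, "y": y} for x in range(4) for y in range(4)]
ordered = sorted(rows, key=lambda r: z_value(r["x"], r["y"]))
```

In this sketch the first four entries of `ordered` are the four points of the 2x2 block nearest the origin, which is the colocation effect that lets a query filtering on both columns skip unrelated data.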