Delta Lake

Learn the required and optional properties for creating a Delta Lake Connection, Credential, Read Connector, and Write Connector.

Prerequisites

  • Object storage and cloud provider credentials (e.g., S3 and AWS)
  • Table name (stored in S3 or Azure)

Connection Properties

The following table describes the fields available when creating a new Delta Lake Connection. Create a connection using the information below and these step-by-step instructions.

Field | Required | Description
Access Type | Required | Whether the connection is Read-Only, Write-Only, or Read-Write.
Connection Name | Required | The name you want to give the connection.
Storage System | Required | The cloud object storage where Delta Lake stores its tables: Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. Each option is described in its own section below.
Requires Credentials | Required | If the 'Requires Credentials' checkbox is selected, choose an existing credential or create a new one for connecting to Delta Lake.

Amazon S3

When you select Amazon S3 as the Storage System, the following additional fields appear:

Field | Required | Description
Bucket | Optional | Name of the S3 bucket.
Connection Type | Optional | The Amazon S3 connection type. One of the options listed below.

  • Standard
  • With Region: requires the S3 region.
  • AWS PrivateLink for Amazon S3: requires a custom endpoint for the private link.
  • Custom Endpoint: requires the custom endpoint and lets you disable certificate verification.

Azure Data Lake Storage

When you select Azure Data Lake Storage as the Storage System, the following additional fields appear:

Field | Required | Description
Storage Account Name | Required | Name of your Azure Data Lake storage account.
Container Name | Required | Name of the container that holds your data.

Google Cloud Storage

When you select Google Cloud Storage as the Storage System, the following additional fields appear:

Field | Required | Description
Project | Required | Name of the GCS project that contains the data to read through Delta Lake.
Bucket | Optional | Name of the bucket the data is stored in.
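
Taken together, the fields in the three storage sections above identify where a Delta table lives. The sketch below shows how those values conventionally combine into an object-store URI. It is an illustration under assumptions, not a description of how the connection builds paths internally: the function name `table_uri` and its parameters are hypothetical, and the URI schemes (`s3://`, `abfss://`, `gs://`) are the ones commonly used for these stores. The helper requires a bucket only because a complete URI needs one, even though Bucket is optional in the connection. The GCS Project does not appear in a `gs://` URI.

```python
def table_uri(storage_system, path, *, bucket="", account="", container=""):
    """Build a conventional object-store URI for a Delta table directory.

    Hypothetical helper for illustration only; argument names mirror the
    connection fields (Bucket, Storage Account Name, Container Name).
    """
    path = path.strip("/")  # avoid doubled or trailing slashes
    if storage_system == "s3":
        if not bucket:
            raise ValueError("an S3 URI needs a bucket")
        return f"s3://{bucket}/{path}"
    if storage_system == "azure":
        if not (account and container):
            raise ValueError("an Azure URI needs an account and a container")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    if storage_system == "gcs":
        if not bucket:
            raise ValueError("a GCS URI needs a bucket")
        return f"gs://{bucket}/{path}"
    raise ValueError(f"unknown storage system: {storage_system!r}")


if __name__ == "__main__":
    print(table_uri("s3", "sales/orders", bucket="my-bucket"))
    print(table_uri("azure", "sales/orders", account="myaccount", container="lake"))
    print(table_uri("gcs", "sales/orders", bucket="my-bucket"))
```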

Credential Properties

The following table describes the fields available when creating a new Delta Lake credential.

Field Name | Required | Description
Credential Name | Required | A name that identifies this credential. The credential will be available as a selection for future use.
Credential Type | Required | Automatically populated with Delta Lake.
Data Lake Credentials | Required | Choose S3, Azure, or Google. Each credential type has its own set of fields.

Read Connector Properties

The following table describes the fields available when creating a new Delta Lake Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.

Field Name | Required | Description
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces.
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons.
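
The naming recommendation above (lowercase, underscores in place of spaces) can be applied mechanically. The helper below is a minimal sketch; `connector_name` is a hypothetical name, and treating runs of whitespace as a single underscore is an assumption beyond what the recommendation states.

```python
import re


def connector_name(label):
    """Lowercase a label and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", label.strip().lower())


if __name__ == "__main__":
    print(connector_name("Delta Orders Daily"))
```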

Write Connector Properties

The following table describes the fields available when creating a new Delta Lake Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.

Field Name | Required | Description
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces.
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons.
Upstream | Required | The name of the previous connector that the Write Connector pulls data from.
Delta Table Path | Required | The path to the table you want to write data to.
Storage System Options | Optional | If the storage system differs from the one defined in the connection but uses the same credentials, you can override the connection's storage system parameters here.
Partition Columns | Optional | Comma-separated column names used to partition data in the Delta Lake table. Partitioning divides the data into segments, which can improve query performance.
Z-Order Columns | Optional | Comma-separated column names used for Z-ordering. Z-ordering optimizes the data layout within Delta Lake for more efficient queries, especially on large, complex datasets, by ordering the data so that related information is colocated.
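
To make the Partition Columns and Z-Order Columns fields more concrete, here is a small sketch of the two ideas in plain Python. It is illustrative only and does not reproduce the connector's or Delta Lake's implementation. All function names are hypothetical. The `column=value` directory naming reflects the common Hive-style convention for partitioned tables, and the bit-interleaving (Morton code) function is one standard way to build a Z-order key for two non-negative integer columns.

```python
from collections import defaultdict


def parse_columns(field):
    """Split a comma-separated field value such as "country, year" into names."""
    return [name.strip() for name in field.split(",") if name.strip()]


def partition_rows(rows, partition_columns):
    """Group rows under Hive-style directory keys like country=US/year=2024."""
    groups = defaultdict(list)
    for row in rows:
        key = "/".join(f"{col}={row[col]}" for col in partition_columns)
        groups[key].append(row)
    return dict(groups)


def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single Morton code.

    Sorting by this value keeps rows that are close in both columns close
    together in the sort order, which is the intuition behind Z-ordering.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return z


if __name__ == "__main__":
    rows = [
        {"country": "US", "year": 2024, "amount": 10},
        {"country": "US", "year": 2023, "amount": 7},
        {"country": "DE", "year": 2024, "amount": 3},
    ]
    for key, members in partition_rows(rows, parse_columns("country, year")).items():
        print(key, len(members))

    points = [(1, 1), (0, 1), (1, 0), (0, 0)]
    print(sorted(points, key=lambda p: z_value(*p)))
```

A query that filters on a partition column only needs to look at the matching groups, and a query that filters on either Z-ordered column tends to touch a narrower range of the sorted data than it would under a sort on a single column.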

© Ascension Labs Inc. | All Rights Reserved