Apache Iceberg
Learn the required and optional properties for creating an Apache Iceberg Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Access to a storage system (Amazon S3, Azure Data Lake Storage, or Google Cloud Storage)
- Credentials for that storage system
Connection Properties
The following table describes the fields available when creating a new Iceberg Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | Whether the connection is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Storage System | Required | Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. Your selection determines which additional fields are required. See the tables below for each storage system. |
Path to Root Directory of Catalog | Required | Where to store the Iceberg data. Include a path to a folder for Ascend to access. For example, ascend-iceberg-data-folder. By default, data will be stored at the root directory. |
Requires Credentials | Required | An Iceberg connection requires a credential. Select an existing credential or create a new one. |
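The table above can be read as a small checklist. The following is a minimal sketch, not an Ascend API: the class `IcebergConnectionForm` and its attribute names are hypothetical and exist only to mirror the fields listed above, so that the required values and allowed choices can be checked before filling in the form.

```python
from dataclasses import dataclass
from typing import List

# Allowed values taken from the Access Type and Storage System rows above.
ACCESS_TYPES = {"Read-Only", "Write-Only", "Read-Write"}
STORAGE_SYSTEMS = {"Amazon S3", "Azure Data Lake Storage", "Google Cloud Storage"}


@dataclass
class IcebergConnectionForm:
    """Hypothetical container mirroring the connection fields (not an Ascend class)."""
    access_type: str
    connection_name: str
    storage_system: str
    catalog_root_path: str
    credential_name: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the form looks complete."""
        found = []
        if self.access_type not in ACCESS_TYPES:
            found.append(f"unknown access type: {self.access_type!r}")
        if self.storage_system not in STORAGE_SYSTEMS:
            found.append(f"unknown storage system: {self.storage_system!r}")
        required_text = (
            ("Connection Name", self.connection_name),
            ("Path to Root Directory of Catalog", self.catalog_root_path),
            ("Credential", self.credential_name),
        )
        for label, value in required_text:
            if not value.strip():
                found.append(f"{label} is required")
        return found
```

Calling `problems()` on a form with every field filled in and valid choices returns an empty list; each missing or unrecognized value adds one entry.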
Storage System
Apache Iceberg requires an underlying storage system. The tables below detail the parameters required for Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.
Spark with Iceberg Data Plane Configuration
If you are configuring an Iceberg Connection for Spark with Iceberg Data Plane, your storage system will contain the metadata used within Ascend. The storage system must exist before you configure the connection. Specific storage system parameters depend on your business needs; Ascend does not require a specific size for your storage bucket, project, or service.
Amazon S3 Bucket
Field | Required | Description |
---|---|---|
Amazon S3 Bucket | Required | Enter the name of the bucket where your data will be stored. |
Amazon S3 Connection Type | Optional | Your options are Standard, With Region, AWS PrivateLink, or Custom Endpoint. |
S3 Region | Optional | Provide the S3 regional endpoint you want to use. |
AWS PrivateLink | Optional | Provide a custom endpoint to AWS PrivateLink. |
Custom Endpoint | Optional | Provide the custom endpoint. Select Disable Certificate Verification if needed. |
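As a reading aid for the table above, the sketch below pairs each Amazon S3 Connection Type with the one extra field it appears to call for. This pairing is an assumption inferred from the matching field names, not documented Ascend behavior, and `extra_field` is a hypothetical helper.

```python
from typing import Optional

# Assumed pairing: each connection type uses at most one of the optional fields.
EXTRA_FIELD_FOR_TYPE = {
    "Standard": None,                      # no additional field assumed
    "With Region": "S3 Region",
    "AWS PrivateLink": "AWS PrivateLink",
    "Custom Endpoint": "Custom Endpoint",
}


def extra_field(connection_type: str) -> Optional[str]:
    """Return the name of the extra field to fill in, or None if none is assumed."""
    if connection_type not in EXTRA_FIELD_FOR_TYPE:
        raise ValueError(f"unknown Amazon S3 connection type: {connection_type!r}")
    return EXTRA_FIELD_FOR_TYPE[connection_type]
```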
Azure Data Lake Storage
Field | Required | Description |
---|---|---|
Storage Account Name | Required | The name of the storage account. This can be found in Azure Resource Manager. |
Container Name | Required | The name of the Azure storage container. |
Disable Soft Deletes
Ascend requires that soft deletes be disabled on the Azure storage account used for this connection.
Google Cloud Storage
Field | Required | Description |
---|---|---|
Project | Required | Indicate an existing project. |
Bucket | Optional | Indicate an existing bucket or define a new bucket name. |
Credential Properties
The following table describes the fields available when creating a new Apache Iceberg credential.
Field | Required | Description |
---|---|---|
Credential Name | Required | The name used to identify this credential. The credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with {connection type name}. |
Iceberg Credential | Required | S3 Credential: Include an AWS Access Key ID and AWS Secret Access Key. Azure Credential: Select which of the three Azure credential options you want to use and provide the necessary information. Google Credential: Provide Google Cloud credentials in JSON format. |
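Before pasting Google Cloud credentials into the form, it can help to confirm that the text is valid JSON. The sketch below assumes the credential is a service account key file, which normally contains the keys `type`, `project_id`, `private_key`, and `client_email`; whether Ascend accepts other Google credential formats is not stated here, and `missing_keys` is a hypothetical helper.

```python
import json
from typing import List

# Keys normally present in a Google Cloud service account key file (assumed format).
EXPECTED_KEYS = ("type", "project_id", "private_key", "client_email")


def missing_keys(credential_json: str) -> List[str]:
    """Parse the credential text and list any expected keys that are absent.

    Raises ValueError if the text is not valid JSON or is not a JSON object.
    """
    data = json.loads(credential_json)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("credential JSON must be an object")
    return [key for key in EXPECTED_KEYS if key not in data]
```

An empty list means all expected keys are present; the check says nothing about whether the key itself is still active.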
Read Connector Properties
The following table describes the fields available when creating a new Apache Iceberg Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Table | Required | The table that you are reading from. For example, blue, green, or red. |
Database Namespace | Optional | How tables are categorized together. For example, the above tables would have a namespace of color. |
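To illustrate how the Table and Database Namespace fields relate, the sketch below joins them into the dotted `namespace.table` form commonly used to address Iceberg tables in SQL engines. How Ascend combines the two fields internally is not documented here, so treat `qualified_table_name` as a hypothetical helper.

```python
from typing import Optional


def qualified_table_name(table: str, namespace: Optional[str] = None) -> str:
    """Join an optional namespace and a table name with a dot."""
    table = table.strip()
    if not table:
        raise ValueError("Table is required")
    if namespace and namespace.strip():
        return f"{namespace.strip()}.{table}"
    return table  # Database Namespace is optional, so the bare name is allowed


# Using the example values from the table above.
names = [qualified_table_name(t, "color") for t in ("blue", "green", "red")]
print(names)  # ['color.blue', 'color.green', 'color.red']
```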
Write Connector Properties
The following table describes the fields available when creating a new Apache Iceberg Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are writing data to the same destination multiple times for different reasons. |
Upstream | Required | The upstream component containing the data to write. |
Table | Required | The table that you are writing to. For example, blue, green, or red. |
Database Namespace | Optional | How tables are categorized together. For example, the above tables would have a namespace of color. |
Partition Clause (partitioned by...) | Optional | Table column reference for dynamic overwrite. Dynamic overwrite mode is recommended when writing to Iceberg tables. |
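For context on dynamic overwrite: in Spark SQL, dynamic mode is selected with the `spark.sql.sources.partitionOverwriteMode` setting, and an `INSERT OVERWRITE` then replaces only the partitions that receive new rows. The sketch below only builds the statement text; it does not show what Ascend runs for a Write Connector, and the function name and table names are hypothetical.

```python
from typing import List


def dynamic_overwrite_statements(target_table: str, source_table: str) -> List[str]:
    """Build the Spark SQL statements for a dynamic partition overwrite.

    The partitions that get replaced are determined by the target table's own
    partition columns and by which partitions the selected rows fall into.
    """
    return [
        # Switch Spark from its default static mode to dynamic overwrite.
        "SET spark.sql.sources.partitionOverwriteMode=dynamic",
        # Replace only the partitions that the SELECT produces rows for.
        f"INSERT OVERWRITE {target_table} SELECT * FROM {source_table}",
    ]


# Hypothetical table names, reusing the color namespace from the examples above.
for statement in dynamic_overwrite_statements("color.blue", "staging.blue_updates"):
    print(statement)
```

In a Spark session with an Iceberg catalog configured, each statement would be passed to `spark.sql(...)` in order.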