Snowflake Write Connector (Legacy)
Creating a Snowflake Write Connector
Prerequisites:
- Access credentials for a Snowflake account
- Access credentials for an AWS account
- Snowflake account name
- Snowflake data warehouse name
- Snowflake database name
- Snowflake table name
- AWS S3 bucket with write access
Specify the Snowflake Account, Warehouse, Database, Table Name, Schema, and Credentials information.
Suggestion
For optimal performance ensure the upstream transform is partitioned.
Snowflake connection details
Specify the Snowflake connection details. These include how to connect to Snowflake as well as where data resides within Snowflake.
- ACCOUNT: The Snowflake account to be accessed.
- WAREHOUSE: The name of the Snowflake Warehouse where the Database resides.
- DATABASE: The name of the Snowflake Database to be accessed.
- TABLE NAME: The name of the Snowflake table in which data will get pushed. This table does not need to be created manually before creating the Write Connector. The Write Connector will automatically create this table at runtime.
- SCHEMA: Indicate schema information (e.g. PUBLIC)
- CREDENTIALS: The username/password for your Snowflake account.
Testing Snowflake permissions
Use Test Connection to check whether all Snowflake account permissions are correctly configured.
AWS S3 staging data location
AWS S3 is used as a staging area prior to pushing data into Snowflake. This requires configuring an AWS S3 bucket with proper S3 credentials. Specify under the Intermediate S3 Store Information section the S3 path where the data files will be created.
- Bucket: The S3 bucket name, e.g. my_data_bucket. Do not put in any folder information in this box.
- Object Prefix: The S3 folders hierarchical prefix, e.g. good_data/my_awesome_table. Do not include any leading forward slashes.
- Partition Folder Pattern: The folder pattern that will be generated based on the values from the upstream partition column, e.g. you can use {{at_hour_ts(yyyy-MM-dd/HH)}} where column at_hour_ts is a partition column from the upstream transform.
- Enter the Access Key and Secret Key for the IAM User which has write access to the path specified. Here's an example policy for writing into the s3 location s3://ascend-io-playground-bucket/write.
Testing AWS permissions
Use Test Connection to check whether all AWS permissions are correctly configured.
Warning
Any data that is not being propagated by the upstream transform will automatically be deleted under the object prefix.
For example; if the write connector produces three files A, B and C in the object-prefix and there was an existing file called output.txt at the same location Ascend will delete output.txt since Ascend did not generate it.
Schema Mismatch
You can select from 3 update strategies for scenarios where the schema of the upstream transform changes.
- Stop and display error: Will stop propagating new data to the Snowflake table and raise an error in the UI.
- Recreate table: Delete the old Snowflake table and create a new table that reflects the new schema. Please note: Any manual changes to the old table outside of the Ascend environment, such as table-specific permissions, will be lost as the newly created table won't mirror the changes from the old table.
- Alter table: Send an alter table command to Snowflake in an attempt to update the schema without deleting the original Snowflake table such that any changes to the table outside of the Ascend (e.g. user permission, views, etc.) can be preserved. If Alter columns on type mismatch is checked, the alter table command will attempt to change the data type of the column as well. If Drop unknown columns is checked, then any columns added to the table manually outside of the Ascend environment will be removed as a result of the update. The default is both options checked.
Important
- When Alter table option is selected, and the schema of the upstream transform has changed, all rows in the table will be deleted, and new rows will be generated.
- It may not always be possible to alter a table to the desired schema, and If so, an error will be generated to indicate this.
- Additional metadata columns are added automatically to the table schema (prefixed with ASCEND_)
- Snowflake sinks directly downstream of sources currently do not work.
- The maximum concurrent writes on 1 table is 20, which is the snowflake default.
Updated 11 months ago