Amazon S3
Learn the required and optional properties of creating an Amazon S3 Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Access Credentials
- Data location on AWS S3
- Data Schema (column names and column type)
Connection Properties
The following table describes the fields available when creating a new Amazon S3 Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | Whether the connection is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Bucket | Optional | The bucket name, such as ascend-io-sample-read-data . |
S3 API Endpoint | Optional | The endpoint of an S3-compatible service, for storage that is not Amazon S3 itself. |
S3 Region | Optional | The region name, such as us-west-2 . |
Requires Credentials | Optional | Check this box to create a new credential or select an existing credential. |
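
To illustrate how the Bucket, S3 Region, and S3 API Endpoint fields relate, here is a minimal sketch of how an object store address is typically derived from them. It reflects general S3 addressing conventions (virtual-hosted style for Amazon S3, path style for S3-compatible services), not Ascend's internal behavior; the function name `bucket_url` is hypothetical.

```python
from typing import Optional


def bucket_url(bucket: str,
               region: Optional[str] = None,
               endpoint: Optional[str] = None) -> str:
    """Build a base URL for a bucket (hypothetical helper, for illustration)."""
    if endpoint:
        # S3-compatible service: address the bucket as a path on the custom endpoint.
        return f"{endpoint.rstrip('/')}/{bucket}"
    if region:
        # Amazon S3, regional virtual-hosted-style address.
        return f"https://{bucket}.s3.{region}.amazonaws.com"
    # Amazon S3 without an explicit region.
    return f"https://{bucket}.s3.amazonaws.com"


print(bucket_url("ascend-io-sample-read-data", region="us-west-2"))
```

When S3 API Endpoint is set, the region-based Amazon address is not used in this sketch, which is why the field only matters for S3-compatible storage.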
Credential Properties
The following table describes the fields available when creating a new Amazon S3 credential.
Field Name | Required | Description |
---|---|---|
Credential Name | Required | A name that identifies this credential. The credential will be available as a selection for future use. |
Credential Type | Required | This field is populated automatically. |
AWS Access Key ID | Required | The access key ID for your account. |
AWS Secret Access Key | Required | The secret access key for your account. |
Read Connector Properties
The following table describes the fields available when creating a new Amazon S3 Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Bucket | Required | Name of the S3 bucket. |
Object Pattern Matching | Required | The pattern strategy used to identify eligible files: - Glob: Applies simple wildcard pattern matching. - Match: Matches the pattern exactly, character for character. - Prefix: Matches objects whose names begin with the specified prefix. - Regex: Applies regular-expression pattern matching. |
Parser | Required | We support several data formats. See Blob Storage Read Connector Parsers for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - Excel - JSON - ORC - Parquet - Python - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
Object Aggregation Strategy | Required | Currently available strategies are: - Adaptive - Leaf Directory - Prefix Regex Match |
Data Replication Strategy | Optional | Defines the data replication strategy. Default unselected replicates all source changes. See Replication Strategies for Blob Store. |
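
The four Object Pattern Matching strategies in the table above can be sketched with Python's standard library. This is an illustration of the general techniques, not Ascend's implementation: the function `matches` and the sample object keys are hypothetical, and whether Regex must match the whole object name (as assumed here) or only part of it is an assumption.

```python
import fnmatch
import re


def matches(strategy: str, pattern: str, key: str) -> bool:
    """Return True if the object key is eligible under the given strategy."""
    if strategy == "glob":
        # Shell-style wildcards; in fnmatch, "*" also matches "/".
        return fnmatch.fnmatchcase(key, pattern)
    if strategy == "match":
        # Exact, character-for-character comparison.
        return key == pattern
    if strategy == "prefix":
        return key.startswith(pattern)
    if strategy == "regex":
        # Assumption: the expression must match the entire key.
        return re.fullmatch(pattern, key) is not None
    raise ValueError(f"unknown strategy: {strategy}")


keys = [
    "logs/2024/01/events.csv",
    "logs/2024/02/events.csv",
    "logs/readme.txt",
]

examples = [
    ("glob", "logs/*/*/events.csv"),
    ("match", "logs/readme.txt"),
    ("prefix", "logs/2024/"),
    ("regex", r"logs/\d{4}/\d{2}/events\.csv"),
]

for strategy, pattern in examples:
    print(strategy, [k for k in keys if matches(strategy, pattern, k)])
```

Prefix is the cheapest strategy to reason about; Glob and Regex are more flexible, at the cost of patterns that are easier to get subtly wrong.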
Write Connector Properties
The following table describes the fields available when creating a new Amazon S3 Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Browse and Select Data lets you select a location within an S3 bucket to write data. This will automatically fill in the Bucket and Output Directory fields.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Upstream | Required | The name of the previous connector the Write Connector will pull data from. |
Bucket | Required | Bucket that the output data will be written to. |
Output Directory | Required | Directory within bucket to write the data. If the folder does not exist, it will be created. |
Partition Interpolation Template | Optional | Include a value from the partition profile as part of the output directory name. For example, to create Hive-style partitioning on a dataset partitioned daily on the timestamp event_ts , specify the pattern as dt={{event_ts(yyyy-MM-dd)}}/ . |
Output File Syntax | Optional | A suffix to attach to each file name. By default, Ascend will include the extension of the file format, but you may optionally choose a different suffix. |
Format | Required | We support several data formats. See Amazon S3 Write Connector Connectors for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - JSON - ORC - Parquet - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
Manifest File | Optional | Specify a manifest file that is updated with the list of files each time the files change. |
Write Strategy | Optional | The strategy for writing files to storage: - Default (Mirror to Blob Store): Keeps the storage aligned with Ascend by inserting, updating, and deleting partitions on the blob store. - Ascend Upsert Partitions: Appends new partitions from Ascend and updates existing partitions, without deleting partitions from the blob store that are no longer in Ascend. - Custom Function: Lets you implement the write logic that Ascend will execute. |
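
The difference between the Default (Mirror to Blob Store) and Ascend Upsert Partitions strategies can be sketched as follows. This is a simplified model under stated assumptions, not Ascend's implementation: partitions are represented as a dictionary from partition name to content, and the functions `mirror` and `upsert` are hypothetical names.

```python
from typing import Dict


def mirror(blob_store: Dict[str, str], ascend: Dict[str, str]) -> Dict[str, str]:
    """Mirror: the blob store ends up holding exactly the partitions in Ascend.

    Partitions missing from Ascend are deleted from the blob store.
    """
    return dict(ascend)


def upsert(blob_store: Dict[str, str], ascend: Dict[str, str]) -> Dict[str, str]:
    """Upsert: add new partitions and update existing ones, never delete."""
    merged = dict(blob_store)
    merged.update(ascend)
    return merged


# Partition "dt=2024-01-01" exists only in the blob store,
# "dt=2024-01-02" was updated in Ascend, "dt=2024-01-03" is new.
blob_store = {"dt=2024-01-01": "v1", "dt=2024-01-02": "v1"}
ascend = {"dt=2024-01-02": "v2", "dt=2024-01-03": "v1"}

print(sorted(mirror(blob_store, ascend)))  # dt=2024-01-01 is gone
print(sorted(upsert(blob_store, ascend)))  # dt=2024-01-01 is kept
```

In this model, Mirror suits outputs that must always reflect the current state in Ascend, while Upsert suits destinations where previously written partitions should be retained.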