Google Cloud Storage
Learn the required and optional properties for creating a Google Cloud Storage (GCS) Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Access credentials
- Data location on Google Cloud Storage
- Data schema (column names and column types)
Connection Properties
The following table describes the fields available when creating a new Google Cloud Storage Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | Select whether this connection is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Project | Required | The name of the Google Cloud project that contains the data. |
Bucket | Optional | Google Cloud Storage bucket where the data is located, such as bucket-io-gcs. |
Requires Credentials | Optional | Check this box to create a new credential or select an existing credential. |
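Before saving the Connection, it can help to confirm that the credential you plan to use can actually reach the bucket. Below is a minimal sketch using the google-cloud-storage Python client; the key file name `key.json` and the bucket name are placeholders, not values Ascend requires.

```python
# Minimal sketch (not part of Ascend): confirm a service-account key can
# list objects in the target bucket before configuring the Connection.
# Assumes `pip install google-cloud-storage`.
from google.cloud import storage

# Placeholders: substitute your downloaded key file and bucket name.
client = storage.Client.from_service_account_json("key.json")
for blob in client.list_blobs("bucket-io-gcs", max_results=5):
    print(blob.name)
```

If this lists objects without raising an error, the same key and bucket values should work in the Connection form.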
Credential Properties
The following table describes the fields available when creating a new Google Cloud Storage credential.
Field Name | Required | Description |
---|---|---|
Credential Name | Required | The name used to identify this credential. This credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with Google Cloud Storage. |
Google Cloud Credentials | Required | Private key used to identify the Google Cloud Storage account. When a Service Account is created, a private key is produced to provide authentication between Google Cloud and third-party platforms. The private key is a block of JSON. Provide the entire key, which includes the type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url. Keep in mind that Google Service Accounts are bound to a project, so each Google project will require a new credential object. |
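If you want to verify the key before pasting it into the credential field, the sketch below loads the JSON and checks that all of the fields listed above are present. This is an illustration only; `key.json` is a placeholder for the file downloaded from the Google Cloud console.

```python
import json

# key.json is a placeholder for the service-account key file downloaded
# from the Google Cloud console.
with open("key.json") as f:
    key = json.load(f)

# The credential field expects the entire block, including every field below.
required = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
    "auth_provider_x509_cert_url", "client_x509_cert_url",
}
print("missing fields:", required - key.keys() or "none")
```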
Google Cloud Storage Credentials for Write Connection
Enter the JSON key for a service account that has the Storage Admin and Storage Object Admin roles for the GCS path. Refer to Google documentation for more details on GCS authentication.
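If the service account is missing these roles, they can be granted at the bucket level. Below is a sketch using the google-cloud-storage client; the key file, bucket name, and service-account email are placeholders, and the credential running the script must itself be allowed to set IAM policy on the bucket.

```python
from google.cloud import storage

# Placeholders: admin-key.json, bucket-io-gcs, and the writer's email.
client = storage.Client.from_service_account_json("admin-key.json")
bucket = client.bucket("bucket-io-gcs")

# Fetch the bucket's IAM policy and append a binding for the writer.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectAdmin",  # roles/storage.admin is granted the same way
    "members": {"serviceAccount:writer@my-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```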
Read Connector Properties
The following table describes the fields available when creating a new Google Cloud Storage Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Bucket | Required | Name of the GCS bucket. |
Object Pattern Matching | Required | The pattern strategy used to identify eligible files (see the sketch following this table): - Glob: applies shell-style wildcard matching, e.g. *.csv . - Match: matches the object name exactly, character for character. - Prefix: matches any object name beginning with the specified prefix. - Regex: applies a regular expression to the object name. |
Parser | Required | We support several data formats. See Blob Storage Read Connector Parsers for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - Excel - JSON - ORC - Parquet - Python - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
Object Aggregation Strategy | Optional | Currently available strategies are: - Adaptive - Leaf Directory - Prefix Regex Match |
Regex to Match Files in Zip File | Optional | Defaults to no filtering. Only file names matching the pattern will be extracted from the ZIP file. |
Data Replication Strategy | Optional | Defines the data replication strategy. If left unselected (the default), all source changes are replicated. See Replication Strategies for Blob Store. |
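As a rough illustration of how the four Object Pattern Matching strategies differ, the sketch below approximates each one with Python's standard library. Ascend evaluates these patterns itself, so details (such as whether glob wildcards cross / separators) may differ; the object names here are made up.

```python
import fnmatch
import re

objects = ["logs/2024/01/a.csv", "logs/2024/01/a.csv.bak", "logs/readme.txt"]

glob_hits   = [o for o in objects if fnmatch.fnmatch(o, "logs/*/*/*.csv")]
match_hits  = [o for o in objects if o == "logs/readme.txt"]           # exact match
prefix_hits = [o for o in objects if o.startswith("logs/2024/")]
regex_hits  = [o for o in objects if re.fullmatch(r"logs/.*\.csv", o)]

print(glob_hits)    # ['logs/2024/01/a.csv']
print(match_hits)   # ['logs/readme.txt']
print(prefix_hits)  # ['logs/2024/01/a.csv', 'logs/2024/01/a.csv.bak']
print(regex_hits)   # ['logs/2024/01/a.csv']
```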
Write Connector Properties
The following table describes the fields available when creating a new Google Cloud Storage Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Storage API
We rely on the Google Storage API to write to GCS locations, so the Storage API must be enabled. It is enabled by default in GCP, but if it is not, you can enable it at https://console.developers.google.com/apis/api/storage-api.googleapis.com/overview?project={your gcp id} .
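If you prefer to check programmatically whether the API is already enabled, here is a sketch using the Service Usage API via google-api-python-client; the project ID is a placeholder, and the credential in use needs the serviceusage.services.get permission.

```python
from googleapiclient import discovery

# my-gcp-project is a placeholder; authentication uses Application
# Default Credentials (e.g., GOOGLE_APPLICATION_CREDENTIALS).
service = discovery.build("serviceusage", "v1")
name = "projects/my-gcp-project/services/storage-api.googleapis.com"
resp = service.services().get(name=name).execute()
print(resp["state"])  # "ENABLED" or "DISABLED"
```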
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Upstream | Required | The name of the previous connector the Write Connector will pull data from. |
Bucket | Required | Bucket that the output data will be written to. |
Output Directory | Required | Directory within the bucket to write the data to. If the directory does not exist, it will be created. |
Partition Interpolation Template | Optional | Include a value from the partition profile as part of the output directory naming. For example, to create Hive-style partitioning on a dataset daily-partitioned on timestamp event_ts, specify the pattern as dt={{event_ts(yyyy-MM-dd)}}/ (see the example following this table). |
Output File Syntax | Optional | A suffix to attach to each file name. By default, Ascend will include the extension of the file format, but you may optionally choose a different suffix. |
Format | Required | We support several data formats. See Blob Storage Read Connector Parsers for more information about CSV, Excel, JSON, and Python parsers: - Avro - CSV - JSON - ORC - Parquet - Text |
Path Delimiter | Optional | Example: a newline (\n) delimited file. |
Manifest File | Optional | Specify a manifest file that will be updated with the current list of output files each time data is written. |
Write Strategy | Optional | Pick the strategy for writing files to storage: - Default (Mirror to Blob Store): keeps the storage aligned with Ascend, allowing partitions to be inserted, updated, and deleted on the blob store. - Ascend Upsert Partitions: appends new partitions and updates existing partitions in Ascend, without deleting partitions from the blob store that are no longer in Ascend. - Custom Function: lets you implement the write logic that will be executed by Ascend. |
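To make the Partition Interpolation Template concrete, the sketch below shows how a Hive-style template like dt={{event_ts(yyyy-MM-dd)}}/ would expand for one partition. Ascend performs this interpolation itself; the bucket, directory, and file names here are placeholders.

```python
from datetime import datetime, timezone

# Hypothetical partition timestamp for one daily partition of the dataset.
event_ts = datetime(2024, 3, 17, tzinfo=timezone.utc)

# dt={{event_ts(yyyy-MM-dd)}}/ expands to a Hive-style directory name:
partition_dir = f"dt={event_ts:%Y-%m-%d}/"  # -> "dt=2024-03-17/"
print("gs://bucket-io-gcs/output/" + partition_dir + "part-00000.snappy.parquet")
```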