Google BigQuery
Learn the required and optional properties of creating a Google BigQuery Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Access credentials
- Google Cloud project name
- BigQuery dataset and table names
- Data Schema (column names and column type)
Connection Properties
The following table describes the fields available when creating a new BigQuery Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | This connection type is Read-Only, Write-Only, or Read-Write. |
Connection Name | Required | Input your desired name. |
Project | Required | Project name in Google BigQuery where the desired data is located. |
Dataset | Optional | Dataset within the Google project. Use the name as it appears below the project in Google BigQuery. For ex., use rain_data , not united-states-weather:rain_data . |
Location | Optional | Select the location to restrict the execution of the data to either US or EU. Execution is not restricted by default. |
Requires Credentials | Required | This box is automatically selected. |
Credential Properties
The following table describes the fields available when creating a new BigQuery credential.
Field | Required | Description |
---|---|---|
Credential Name | Required | The name to identify this credential with. This credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with Google BigQuery . |
Google Cloud Credentials | Required | Private key used to identify the Google account. When a Service Account is created, a private key is produced to provide authentication between Google Cloud and third party platforms. The private key will be a block of JSON, provide the entire key which includes the type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url and client_x509_cert_url. Keep in mind that Google Service Accounts are bound to a project. Each Google project will require a new credential object. |
Read Connector Properties
The following table describes the fields available when creating a new BigQuery Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Dataset | Required | Dataset within a Google BigQuery project. |
Partitioning | Required | How you would like Ascend to partition the data. See Partitioning Strategies table below. |
Table Name | Required | Name of the table to be ingested. |
Partitioning Strategies
The following table contains the configurations that are partition specific. Additional information on Google BigQuery Partitioning.
Partition Style | Description | Required Fields |
---|---|---|
None | No partition is used and the data resides in one large table. | TABLE NAME: Name used to identify the BigQuery table. |
Partitioned Table | A table that is separated into sections such as Ingestion time or Date. Google identifies partitioning as a best practice over sharding. | TABLE NAME: Name used to identify the BigQuery table. PARTITION PATTERN MATCHING: Select either Regex, Glob, or Match. PARTITION PATTERN: Pattern used to identify the partition. |
Sharded Table | Another way to sub section data. A subset of data can be pulled from multiple database tables based on a parameter such as Date. | TABLE NAME PATTERN MATCHING: Select either Regex, Glob, or Match. TABLE NAME PATTERN: Pattern used to identify the table. |
Write Connector Properties
The following table describes the fields available when creating a new Google BigQuery Write Connector. Create a new Write Connector using the information below and these step-by-step instructions.
Field Name | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Upstream | Required | The name of the previous connector the Write Connector will pull data from. |
Output Table | Required | Directory within dataset to write the data. If the table does not exist, it will be created. |
Partition Field Name | Optional | Field name with which to divide the data into partitions. |
Partition Type | Optional | Supported types are: HOUR, DAY, MONTH, YEAR. Defaults to DAY if Partition Field is specified. Not supported by the DIRECT write method. |
Clustering Fields | Optional | Comma-separated list of clustering columns for creating clustered table. |
On Schema Mismatch | Optional | Options are as follows: - Skip schema check - Stop and display error - Recreate table - Alter table |
A SQL Statement for Ascend to Execute Before Writing | Optional | Here, the user has the option to execute a pre-processing script before writing to final table. |
A SQL Statement for Ascend to Execute After Writing | Optional | Here, the user has the option to execute a pre-processing script after writing to final table. |
Write Strategy | Optional | Full table write: Default write strategy Partition Write Using Ascend Partition Id: Use Ascend partitions when writing out the target BigQuery table. New partitions are inserted into the target table, and partitions that no longer appear in Ascend are deleted from the target table. This also handles changed partitions, because changed partitions have a new ID, so the old version of a changed partition is deleted and a new version is added automatically (similar to a "partition swap"). Partition Append: requires both partition field and partition type to be set. Ascend creates a target BigQuery table with partitions. After finding the largest value in the partition field for data already written, Ascend filters new partitions so that they only contain rows with partition field values greater than what is already present in the target table. Ascend only appends new partitions that contain rows which have a partition field value greater than what is already present in the target table. Merge By Partition: requires both partition field and partition type to be set. Ascend creates a target BigQuery table with partitions, Ascend compares all partition IDs that exist in the target table to the ones in Ascend. New partitions from Ascend are inserted. If an existing partition in the target matches a partition in Ascend, Ascend compares the record count of the partition in target table vs. Ascend. If the record count differs, the partition in the target table is overwritten. |
Changing Write Strategy
After you set a Write Strategy for a BigQuery Write Connector, it cannot be changed after creating the component. If you want to change the Write Strategy, you'll need to create a new Write Connector.
Updated 10 months ago