Google Cloud Pub/Sub
Learn the required and optional properties for creating a Google Cloud Pub/Sub Connection, Credential, Read Connector, and Write Connector.
Prerequisites
- Access credentials with the Pub/Sub Editor role
- Google Cloud Project
- Google Cloud Pub/Sub Topic to subscribe to
Connection Properties
The following table describes the fields available when creating a new Google Cloud Pub/Sub Connection. Create a connection using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Access Type | Required | This connection type is Read-Only. |
Connection Name | Required | Input your desired connection name. |
Project | Required | Name of the main Google Cloud project. |
Requires Credentials | Optional | Check this box to create a new credential or select an existing credential. |
Credential Properties
The following table describes the fields available when creating a new Google Cloud Pub/Sub credential.
Field | Required | Description |
---|---|---|
Credential Name | Required | The name to identify this credential with. This credential will be available as a selection for future use. |
Credential Type | Required | This field will automatically populate with Google Cloud Pub/Sub. |
Google Cloud Credentials | Required | Private key for the Google Cloud service account used to authenticate with Pub/Sub. When you create a key for a service account, Google Cloud produces a private key that authenticates third-party platforms with Google Cloud. Enter the private key for the selected service account. The private key is a block of JSON; provide the entire key, which includes type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url. Keep in mind that Google service accounts are bound to a project, so each Google Cloud project requires a separate credential. |
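Because the entire JSON key must be pasted, a truncated copy is an easy mistake. The sketch below is a hypothetical local helper (missing_key_fields is not part of Ascend or any Google library) that checks a pasted key for the field names listed in the table above. It does not verify that the key is actually valid with Google Cloud.

```python
import json

# Field names the credential description lists for a service-account key.
REQUIRED_KEY_FIELDS = (
    "type", "project_id", "private_key_id", "private_key", "client_email",
    "client_id", "auth_uri", "token_uri", "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def missing_key_fields(key_text: str) -> list[str]:
    """Return the listed fields absent from a pasted service-account key.

    Hypothetical helper. Raises json.JSONDecodeError if the text is not JSON,
    and ValueError if the JSON is not an object.
    """
    key = json.loads(key_text)
    if not isinstance(key, dict):
        raise ValueError("service-account key must be a JSON object")
    return [field for field in REQUIRED_KEY_FIELDS if field not in key]
```

An empty list means every listed field is present. Since service accounts are bound to a project, the project_id in the key is also a quick way to confirm which project a credential belongs to.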
Read Connector Properties
The following table describes the fields available when creating a new Google Cloud Pub/Sub Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
Field | Required | Description |
---|---|---|
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces. |
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons. |
Project Override | Optional | Use this field to connect to topics in a different project that your credentials have access to. If left blank, the connector will only connect to topics within the GCP project entered when creating the Pub/Sub connection. |
Topic | Required | The topic you want to subscribe to. This can either be supplied manually or selected in Step 2 of Create a Read Connector. |
Number of Sub-Partitions | Optional | This value represents the number of Spark partitions used to read in messages in parallel (this is not the same as Ascend Partitions). |
Approx. Message Count Per Sub-Partition | Required | Approximate number of messages each Spark sub-partition reads per refresh. |
Subscription Expire Days | Optional | Expiration period, in days, for the subscription the connector creates. If not set, a default expiration policy with a TTL of 31 days is used. The minimum allowed value is 1 day and the maximum is 365 days. |
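To make the defaults and limits in the table concrete, the sketch below models a few of the Read Connector fields. ReadConnectorSettings and its methods are hypothetical names for illustration, not Ascend's API; the fallback to the connection's project, the 31-day default, and the 1 to 365 day range come from the table, and the topic path uses Pub/Sub's projects/<project>/topics/<topic> resource-name format.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_EXPIRE_DAYS = 31  # used when Subscription Expire Days is not set
MIN_EXPIRE_DAYS, MAX_EXPIRE_DAYS = 1, 365


@dataclass
class ReadConnectorSettings:
    """Hypothetical model of some Read Connector fields (not Ascend's API)."""
    name: str
    topic: str
    approx_messages_per_sub_partition: int
    project_override: Optional[str] = None
    expire_days: Optional[int] = None

    def effective_project(self, connection_project: str) -> str:
        # A blank Project Override falls back to the connection's project.
        return self.project_override or connection_project

    def effective_expire_days(self) -> int:
        if self.expire_days is None:
            return DEFAULT_EXPIRE_DAYS
        if not MIN_EXPIRE_DAYS <= self.expire_days <= MAX_EXPIRE_DAYS:
            raise ValueError(
                f"Subscription Expire Days must be between {MIN_EXPIRE_DAYS} "
                f"and {MAX_EXPIRE_DAYS}, got {self.expire_days}"
            )
        return self.expire_days

    def topic_path(self, connection_project: str) -> str:
        # Pub/Sub topic resource name: projects/<project>/topics/<topic>.
        project = self.effective_project(connection_project)
        return f"projects/{project}/topics/{self.topic}"
```

For example, a connector with no override on a connection whose project is main-proj resolves its topic to projects/main-proj/topics/<topic>, while setting project_override switches only the project segment.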
Managing throughput with Pub/Sub Read Connectors
Together, the Number of Sub-Partitions and Approx. Message Count Per Sub-Partition fields control the total number of messages read in on each refresh.
For example, imagine you have 1,000,000 messages to read per refresh. Configure your connector with 10 sub-partitions, each designed to read 100,000 messages. Ascend will generate 10 Spark partitions, with each partition reading messages until reaching 100,000 or exhausting available messages in the topic. By using multiple sub-partitions, Spark can allocate tasks across multiple executors, thus enhancing reading speed.
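The arithmetic in this example can be sketched as a small model: each sub-partition reads until it hits its per-partition count or the backlog runs out, so one refresh reads at most Number of Sub-Partitions × Approx. Message Count Per Sub-Partition messages. plan_refresh is a hypothetical name, and filling partitions one after another is a simplification, since real sub-partitions read in parallel. The total per refresh is capped the same way either way.

```python
def plan_refresh(sub_partitions: int, per_partition: int, available: int) -> list[int]:
    """Messages each sub-partition reads on one refresh (simplified model).

    Each sub-partition stops at per_partition messages or when the backlog is
    empty. Partitions are filled in order here for simplicity; the total is
    capped at sub_partitions * per_partition regardless.
    """
    if sub_partitions < 1 or per_partition < 1:
        raise ValueError("sub_partitions and per_partition must be at least 1")
    counts = []
    remaining = max(available, 0)
    for _ in range(sub_partitions):
        taken = min(per_partition, remaining)
        counts.append(taken)
        remaining -= taken
    return counts


# The example from the text: 10 sub-partitions x 100,000 messages each.
print(sum(plan_refresh(10, 100_000, 1_000_000)))  # 1000000
print(plan_refresh(10, 100_000, 250_000)[:4])  # [100000, 100000, 50000, 0]
```

In this model, if the backlog exceeds the cap (say 1,500,000 messages), one refresh still reads 1,000,000 and the rest are left for a later refresh.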
How the Read Connector Works
The Read Connector relies on two Pub/Sub resources in your Google Cloud project: a subscription, and a snapshot of that subscription taken at each refresh.
Subscriptions
For each Read Connector you create, a subscription named ascend_subscription_<UUID> is created in your GCP project. This is the subscription your connector uses to read in messages.
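When auditing a GCP project, it can help to tell connector-owned subscriptions apart from others. The sketch below is illustrative only: it builds a full Pub/Sub subscription path (projects/<project>/subscriptions/<id>) following the ascend_subscription_<UUID> pattern and filters a list of paths by that prefix. The exact UUID formatting Ascend uses is an assumption, and both function names are hypothetical.

```python
import uuid

PREFIX = "ascend_subscription_"


def new_subscription_path(project: str) -> str:
    # Hypothetical: mirrors the ascend_subscription_<UUID> naming pattern.
    # Using a hyphenated uuid4 string is an assumption, not Ascend's behavior.
    name = f"{PREFIX}{uuid.uuid4()}"
    return f"projects/{project}/subscriptions/{name}"


def ascend_subscriptions(paths: list[str]) -> list[str]:
    """Keep only subscription paths whose ID follows the Ascend pattern."""
    return [p for p in paths if p.rsplit("/", 1)[-1].startswith(PREFIX)]
```

Filtering on the subscription ID (the last path segment) rather than the whole path avoids false matches on project names that happen to contain the prefix.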
Snapshots
At each refresh, the connector creates a snapshot of the subscription before reading in any messages. Once the messages have been read in and successfully persisted in Ascend, the snapshot is deleted. This ensures that, if there are any read failures, the subsequent run or retry can seek to that snapshot and read in the messages that were missed. To learn more, see the Google Cloud Pub/Sub documentation on snapshots and seek.
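The snapshot-then-read-then-delete sequence can be illustrated with an in-memory model. FakeSubscription is a stand-in, not the Pub/Sub client, and it assumes messages are delivered and acknowledged strictly in order, which real Pub/Sub does not guarantee. The refresh function is a hypothetical sketch of the protocol described above, including the assumption that a retry detects a leftover snapshot and seeks to it.

```python
class FakeSubscription:
    """In-memory stand-in for a Pub/Sub subscription (not the real client).

    `acked` counts how many messages, in order, have been acknowledged.
    """

    def __init__(self, messages):
        self.messages = list(messages)
        self.acked = 0
        self.snapshots = {}

    def create_snapshot(self, name):
        self.snapshots[name] = self.acked

    def seek(self, name):
        # Seeking restores the acknowledgment state captured by the snapshot.
        self.acked = self.snapshots[name]

    def delete_snapshot(self, name):
        del self.snapshots[name]

    def pull_and_ack(self, max_messages):
        batch = self.messages[self.acked:self.acked + max_messages]
        self.acked += len(batch)
        return batch


def refresh(sub, persist, snapshot="refresh_snapshot", max_messages=100):
    """One refresh: snapshot, read, persist, then delete the snapshot.

    If a snapshot from a failed earlier run still exists, seek to it first so
    messages that run acknowledged but never persisted are read again.
    """
    if snapshot in sub.snapshots:
        sub.seek(snapshot)
    else:
        sub.create_snapshot(snapshot)
    batch = sub.pull_and_ack(max_messages)
    persist(batch)  # if this raises, the snapshot stays for the retry
    sub.delete_snapshot(snapshot)
    return batch
```

In this model, a refresh whose persist step fails leaves the messages acknowledged but the snapshot in place; the retry seeks back to the snapshot, reads the same messages again, and only deletes the snapshot once they are stored.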