- Access credentials with Pub/Sub Editor Role
- Google Cloud Project
- Google Cloud Pub/Sub Topic to subscribe to
The following table describes the fields available when creating a new Google Cloud Pub/Sub Connection. Create a connection using the information below and these step-by-step instructions.
|Access Type||Required||This connection type is Read-Only.|
|Connection Name||Required||Input your desired connection name.|
|Project||Required||Name of the main Google Cloud project.|
|Requires Credentials||Optional||Check this box to create a new credential or select an existing credential.|
The following table describes the fields available when creating a new Google Cloud Pub/Sub credential.
|Credential Name||Required||The name to identify this credential with. This credential will be available as a selection for future use.|
|Credential Type||Required||This field will automatically populate with |
|Google Cloud Credentials||Required||Private key used to identify the Google Cloud service account. When a service account is created, a private key is produced to authenticate between Google Cloud and third-party platforms.|
Enter the private key for the selected service account. The private key is a block of JSON; provide the entire key, which includes the
Keep in mind that Google Service Accounts are bound to a project. Each Google project will require a new credential object.
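Because each credential is tied to one project, it can help to check which project a key belongs to before pasting it in. The following is a minimal sketch, not part of Ascend: the function name and the placeholder key are hypothetical, and the field names checked are those found in a standard Google service account key file.

```python
import json

# Fields present in a standard Google service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def project_of_service_account_key(key_json: str) -> str:
    """Validate a pasted key and return the project it is bound to.

    Hypothetical helper for illustration only.
    """
    # json.loads raises json.JSONDecodeError (a ValueError) if the paste is truncated.
    key = json.loads(key_json)
    if not isinstance(key, dict):
        raise ValueError("key must be a JSON object")
    missing = [name for name in REQUIRED_KEY_FIELDS if name not in key]
    if missing:
        raise ValueError("key is missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("key type is not 'service_account'")
    return key["project_id"]


# Placeholder values only; a real key contains an actual private key.
example_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "reader@example-project.iam.gserviceaccount.com",
})

print(project_of_service_account_key(example_key))
```

If the returned project differs from the Project entered on the connection, a separate credential is likely needed for that project.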
The following table describes the fields available when creating a new Google Cloud Pub/Sub Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.
|Name||Required||Provide a name for your connector. We recommend using lowercase with underscores in place of spaces.|
|Description||Optional||Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons.|
|Project Override||Optional||Use this field to connect to topics in a different project that your credentials have access to. If left blank, the connector will only connect to topics within the GCP project entered when creating the Pub/Sub connection.|
|Topic||Required||The topic you want to subscribe to. This can either be supplied manually or selected in Step 2 of Create a Read Connector.|
|Number of Sub-Partitions||Optional||This value represents the number of Spark partitions used to read in messages in parallel (this is not the same as Ascend Partitions).|
|Approx. Message Count Per Sub-Partition||Required||Specify the number of messages to read in with each Spark sub-partition.|
|Subscription Expire Days||Optional||If not set, a default policy with a TTL of 31 days is used. The minimum allowed value is 1 day; the maximum is 365 days.|
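The Project Override and Subscription Expire Days rules in the table can be sketched as follows. This is an illustration under stated assumptions, not Ascend's implementation: the function names are hypothetical, and the only external fact relied on is the standard Pub/Sub topic resource name format, `projects/<project>/topics/<topic>`.

```python
from typing import Optional

DEFAULT_TTL_DAYS = 31              # default when Subscription Expire Days is not set
MIN_TTL_DAYS, MAX_TTL_DAYS = 1, 365


def topic_path(connection_project: str, topic: str,
               project_override: Optional[str] = None) -> str:
    """Build the full topic name, preferring the override project when given."""
    project = project_override or connection_project
    return f"projects/{project}/topics/{topic}"


def subscription_ttl_days(expire_days: Optional[int] = None) -> int:
    """Apply the default and the allowed range for Subscription Expire Days."""
    if expire_days is None:
        return DEFAULT_TTL_DAYS
    if not MIN_TTL_DAYS <= expire_days <= MAX_TTL_DAYS:
        raise ValueError(
            f"expire days must be between {MIN_TTL_DAYS} and {MAX_TTL_DAYS}"
        )
    return expire_days


# Project and topic names below are placeholders.
print(topic_path("main-project", "orders"))
print(topic_path("main-project", "orders", project_override="shared-project"))
print(subscription_ttl_days())
```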
The Number of Sub-Partitions and Approx. Message Count Per Sub-Partition fields together determine the total number of messages read on each refresh.
For example, imagine you have 1,000,000 messages to read per refresh. Configure your connector with 10 sub-partitions, each designed to read 100,000 messages. Ascend will generate 10 Spark partitions, with each partition reading messages until reaching 100,000 or exhausting available messages in the topic. By using multiple sub-partitions, Spark can allocate tasks across multiple executors, thus enhancing reading speed.
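The arithmetic in this example can be sketched in a few lines. The function names are hypothetical, and the model is deliberately simplified: it fills sub-partitions one after another, whereas Spark reads them in parallel and the per-partition count is approximate.

```python
from typing import List


def refresh_ceiling(num_sub_partitions: int, approx_count_per_sub_partition: int) -> int:
    """Upper bound on messages read in one refresh."""
    return num_sub_partitions * approx_count_per_sub_partition


def simulate_reads(available_messages: int, num_sub_partitions: int,
                   approx_count_per_sub_partition: int) -> List[int]:
    """Simplified model: each sub-partition reads until it reaches its
    target count or the topic has no more messages available."""
    reads = []
    remaining = available_messages
    for _ in range(num_sub_partitions):
        taken = min(approx_count_per_sub_partition, remaining)
        reads.append(taken)
        remaining -= taken
    return reads


print(refresh_ceiling(10, 100_000))          # 1000000
print(simulate_reads(250_000, 4, 100_000))   # [100000, 100000, 50000, 0]
```

When fewer messages are available than the ceiling, some sub-partitions read little or nothing, so the ceiling is best sized to the expected volume per refresh.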
Read connectors rely on Pub/Sub subscriptions and snapshots in your Google Cloud project to track which messages have been read and to recover from failed reads.
For each read connector you create, you will notice a subscription created in your GCP project with the name ascend_subscription_<UUID>. This is the subscription your connector will use to read in messages.
At each refresh, the connector creates a snapshot of the subscription before reading in any messages. Once the messages have been read in and successfully persisted in Ascend, the snapshot is deleted. This ensures that, if there are any read failures, the subsequent run or retry can seek to that snapshot and read in the messages that were missed. To learn more about snapshots, see the Google Cloud Pub/Sub documentation.
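The snapshot lifecycle described above can be sketched as follows. This is an illustration, not Ascend's implementation: `refresh`, `RecordingClient`, and the resource names are hypothetical. The method names and request shapes mirror `create_snapshot`, `seek`, and `delete_snapshot` on the `google-cloud-pubsub` `SubscriberClient`, so a real client could be passed in place of the stand-in.

```python
def refresh(client, subscription_path, snapshot_path, read_and_persist, retrying=False):
    """One refresh: protect the read with a snapshot, delete it on success."""
    if retrying:
        # A previous run failed: rewind the subscription to the saved snapshot.
        client.seek(request={"subscription": subscription_path,
                             "snapshot": snapshot_path})
    else:
        # Normal run: capture the subscription state before reading.
        client.create_snapshot(request={"name": snapshot_path,
                                        "subscription": subscription_path})
    read_and_persist()  # if this raises, the snapshot is left in place
    client.delete_snapshot(request={"snapshot": snapshot_path})


class RecordingClient:
    """Stand-in client that records calls instead of contacting Pub/Sub."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, request):
        self.calls.append(("create_snapshot", request["name"]))

    def seek(self, request):
        self.calls.append(("seek", request["snapshot"]))

    def delete_snapshot(self, request):
        self.calls.append(("delete_snapshot", request["snapshot"]))


# Placeholder resource names.
sub = "projects/example-project/subscriptions/ascend_subscription_example"
snap = "projects/example-project/snapshots/example_snapshot"


def failing_read():
    raise RuntimeError("simulated read failure")


client = RecordingClient()
try:
    refresh(client, sub, snap, failing_read)
except RuntimeError:
    pass  # the snapshot survives the failed run

# The retry seeks back to the snapshot, reads, then cleans up.
refresh(client, sub, snap, lambda: None, retrying=True)
print([name for name, _ in client.calls])
# ['create_snapshot', 'seek', 'delete_snapshot']
```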