Google Cloud Pub/Sub

Learn the required and optional properties for creating a Google Cloud Pub/Sub Connection, Credential, Read Connector, and Write Connector.

Prerequisites

  • Access credentials with Pub/Sub Editor Role
  • Google Cloud Project
  • Google Cloud Pub/Sub Topic to subscribe to

Connection Properties

The following table describes the fields available when creating a new Google Cloud Pub/Sub Connection. Create a connection using the information below and these step-by-step instructions.

Field | Required | Description
Access Type | Required | This connection type is Read-Only.
Connection Name | Required | Input your desired connection name.
Project | Required | Name of the main Google Cloud project.
Requires Credentials | Optional | Check this box to create a new credential or select an existing credential.

Credential Properties

The following table describes the fields available when creating a new Google Cloud Pub/Sub credential.

Field | Required | Description
Credential Name | Required | The name used to identify this credential. This credential will be available as a selection for future use.
Credential Type | Required | This field will automatically populate with Google Cloud Pub/Sub.
Google Cloud Credentials | Required | Private key used to identify the Google Cloud service account. When a Service Account is created, a private key is produced to provide authentication between Google Cloud and third-party platforms.

Enter the private key for the selected service account. The private key is a block of JSON. Provide the entire key, including the type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url fields.

Keep in mind that Google Service Accounts are bound to a project. Each Google project will require a new credential object.
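
Before pasting a key into the Google Cloud Credentials field, it can help to confirm that none of the fields listed above is missing. The following is a minimal sketch, not part of Ascend or any Google library: the function name `missing_key_fields` and the placeholder key are hypothetical and used only for illustration.

```python
import json

# Field names taken from the credential description above.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def missing_key_fields(key_json: str) -> list[str]:
    """Return the listed fields that are absent or empty in the key JSON."""
    key = json.loads(key_json)
    if not isinstance(key, dict):
        raise ValueError("service account key must be a JSON object")
    return [name for name in REQUIRED_KEY_FIELDS if not key.get(name)]


# Placeholder values only; a real key is downloaded from Google Cloud.
example_key = json.dumps(
    {name: "placeholder" for name in REQUIRED_KEY_FIELDS if name != "client_id"}
)
print(missing_key_fields(example_key))  # ['client_id']
```

An empty list means every listed field is present. The check says nothing about whether the key is valid or has the Pub/Sub Editor role.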

Read Connector Properties

The following table describes the fields available when creating a new Google Cloud Pub/Sub Read Connector. Create a new Read Connector using the information below and these step-by-step instructions.

Field | Required | Description
Name | Required | Provide a name for your connector. We recommend using lowercase with underscores in place of spaces.
Description | Optional | Describes the connector. We recommend providing a description if you are ingesting information from the same source multiple times for different reasons.
Project Override | Optional | Use this field to connect to topics in a different project that your credentials have access to. If left blank, the connector will only connect to topics within the GCP project entered when creating the Pub/Sub connection.
Topic | Required | The topic you want to subscribe to. This can either be supplied manually or selected in Step 2 of Create a Read Connector.
Number of Sub-Partitions | Optional | The number of Spark partitions used to read in messages in parallel (this is not the same as Ascend Partitions).
Approx. Message Count Per Sub-Partition | Required | Specify the number of messages to read in with each Spark sub-partition.
Subscription Expire Days | Optional | If not set, a default policy with a TTL of 31 days is used. The minimum allowed value is 1 day and the maximum is 365 days.
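
The Subscription Expire Days rule (default of 31 days, allowed range of 1 to 365 days) can be expressed as a small check. This is an illustrative sketch only: `resolve_expire_days` is a hypothetical helper, and it is an assumption that out-of-range values are rejected with an error rather than silently adjusted.

```python
DEFAULT_EXPIRE_DAYS = 31  # used when the field is left blank
MIN_EXPIRE_DAYS = 1
MAX_EXPIRE_DAYS = 365


def resolve_expire_days(value=None):
    """Return the effective expiration in days, applying the default and bounds."""
    if value is None:
        return DEFAULT_EXPIRE_DAYS
    if not MIN_EXPIRE_DAYS <= value <= MAX_EXPIRE_DAYS:
        # Assumption: out-of-range values are rejected, not clamped.
        raise ValueError(
            f"Subscription Expire Days must be between "
            f"{MIN_EXPIRE_DAYS} and {MAX_EXPIRE_DAYS}, got {value}"
        )
    return value


print(resolve_expire_days())    # 31
print(resolve_expire_days(90))  # 90
```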

Managing throughput with Pub/Sub Read Connectors

Together, the Number of Sub-Partitions and Approx. Message Count Per Sub-Partition fields manage the total number of messages read in on each refresh.

For example, imagine you have 1,000,000 messages to read per refresh. Configure your connector with 10 sub-partitions, each designed to read 100,000 messages. Ascend will generate 10 Spark partitions, with each partition reading messages until reaching 100,000 or exhausting available messages in the topic. By using multiple sub-partitions, Spark can allocate tasks across multiple executors, thus enhancing reading speed.
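
The arithmetic in this example can be sketched as follows. The function names are hypothetical, and the simulation fills sub-partitions one after another for simplicity, whereas Spark reads them in parallel, so the actual split of messages across partitions may differ. Only the per-partition cap and the overall total are meant to match the description above.

```python
def max_messages_per_refresh(sub_partitions, messages_per_sub_partition):
    """Upper bound on messages read in one refresh."""
    return sub_partitions * messages_per_sub_partition


def simulate_refresh(available, sub_partitions, messages_per_sub_partition):
    """Messages read by each sub-partition: up to its cap, or until the topic is exhausted."""
    reads = []
    remaining = available
    for _ in range(sub_partitions):
        taken = min(messages_per_sub_partition, remaining)
        reads.append(taken)
        remaining -= taken
    return reads


print(max_messages_per_refresh(10, 100_000))          # 1000000
print(sum(simulate_refresh(1_000_000, 10, 100_000)))  # 1000000
print(sum(simulate_refresh(250_000, 10, 100_000)))    # 250000
```

In the last call, only 250,000 messages are available, so the refresh reads all of them and stops well short of the 1,000,000 message cap.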

How the Read Connector Works

The Read Connector relies on two Pub/Sub resources in your Google Cloud project: subscriptions and snapshots.

Subscriptions

For each read connector you create, you will notice a subscription created in your GCP project with the name ascend_subscription_<UUID>. This is the subscription your connector will use to read in messages.

Snapshots

At each refresh, the connector creates a snapshot of the subscription before reading in any messages. Once the messages have been read in and successfully persisted in Ascend, the snapshot is deleted. This ensures that, if there are any read failures, the subsequent run or retry can seek to that snapshot and read in the messages that were missed. To learn more about snapshots, see the Google Cloud Pub/Sub documentation.
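
The snapshot, read, and delete cycle described above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Ascend's implementation and not the Google Cloud client library: `ToySubscription`, `refresh`, and the snapshot name are hypothetical, and the model ignores messages published after the snapshot was taken.

```python
class ToySubscription:
    """In-memory stand-in for a Pub/Sub subscription (not the real client)."""

    def __init__(self, messages):
        self.unacked = list(messages)
        self.snapshots = {}

    def create_snapshot(self, name):
        # Capture the current unacknowledged backlog.
        self.snapshots[name] = list(self.unacked)

    def pull_and_ack(self, limit):
        batch, self.unacked = self.unacked[:limit], self.unacked[limit:]
        return batch

    def seek(self, name):
        # Restore the backlog to the state captured by the snapshot.
        self.unacked = list(self.snapshots[name])

    def delete_snapshot(self, name):
        del self.snapshots[name]


def refresh(sub, persist, limit, snapshot_name="refresh_snapshot"):
    """Snapshot, read, persist, then delete the snapshot only on success."""
    if snapshot_name in sub.snapshots:
        # A leftover snapshot means the previous run failed: rewind to it.
        sub.seek(snapshot_name)
    else:
        sub.create_snapshot(snapshot_name)
    batch = sub.pull_and_ack(limit)
    persist(batch)  # if this raises, the snapshot is kept for the retry
    sub.delete_snapshot(snapshot_name)
    return batch


def failing_persist(batch):
    raise OSError("simulated persistence failure")


sub = ToySubscription(["m1", "m2", "m3"])
stored = []

try:
    refresh(sub, failing_persist, limit=2)
except OSError:
    pass  # messages were pulled but not persisted; the snapshot remains

retried = refresh(sub, stored.extend, limit=2)
print(retried)      # ['m1', 'm2']
print(sub.unacked)  # ['m3']
```

The retry rereads the two messages that the failed run had already pulled, which is the behavior the snapshot is meant to guarantee.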