Databricks Data Plane

Overview

🚧

Submit an environment request before continuing

Before creating a Databricks data plane, submit an environment request indicating which data plane you would like enabled. Chat with us through Intercom within Ascend, or file a support ticket by emailing Ascend support. Once your data plane is enabled, you'll receive confirmation that your data plane is ready for setup.

Enterprise Security Users:
For AWS S3, include your VPC endpoint in the environment request.
For Azure, include the relevant subnet IDs.

Setting up a Databricks data plane is a four-step process. Within Databricks, first create an all-purpose compute cluster, second create a SQL warehouse, and third generate an access token. Once setup within Databricks is complete, the fourth step is to create a new data service within Ascend to run on Databricks.

You'll need advanced permissions in Databricks Data Science & Engineering and in Databricks SQL warehouses. For more information on these permissions, see the Databricks requirements for SQL warehouses.

You'll also want a text editor open to temporarily record credentials, endpoints, and server host names when setting up Databricks. This information will be used to configure Ascend to run within Databricks.

Step 1: Create a Databricks all-purpose compute cluster

When creating an all-purpose compute cluster, follow the steps for using the Create button.

Use the following configuration settings:

Parameter | Value | Required/Recommended
Multi-node |  | Required
Access mode | No isolation shared | Required
Databricks runtime version | 10.4 LTS (Scala 2.12, Spark 3.2.1) | Recommended
Worker type | i3.xlarge | Recommended
Driver type | Same as worker | Recommended
Enable autoscaling | True | Required
Enable autoscaling local storage | False | Recommended
Terminate after __ minutes of inactivity | 30 | Recommended

Under Advanced options, click Spark and copy/paste the following into the Spark config:

spark.databricks.delta.schema.autoMerge.enabled true
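
If you prefer to script the cluster setup, the table above and the Spark config map onto a single Clusters API payload. The sketch below is a minimal, hedged example in Python: the workspace URL and token environment variables, the cluster name, and the autoscaling bounds are illustrative assumptions, and the data_security_mode value used for "No isolation shared" should be verified against your Databricks API version.

# Minimal sketch: create the all-purpose cluster through the Databricks Clusters API.
# The host/token environment variables, cluster name, and autoscaling bounds are assumptions.
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # an existing personal access token

cluster_spec = {
    "cluster_name": "ascend-data-plane",                # illustrative name
    "spark_version": "10.4.x-scala2.12",                # 10.4 LTS (Scala 2.12, Spark 3.2.1)
    "node_type_id": "i3.xlarge",                        # worker type; driver defaults to the same type
    "autoscale": {"min_workers": 2, "max_workers": 8},  # autoscaling enabled; bounds are examples
    "enable_elastic_disk": False,                       # autoscaling local storage disabled
    "autotermination_minutes": 30,                      # terminate after 30 minutes of inactivity
    "data_security_mode": "NONE",                       # assumed mapping for "No isolation shared"
    "spark_conf": {"spark.databricks.delta.schema.autoMerge.enabled": "true"},
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("cluster_id:", resp.json()["cluster_id"])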

📘

Record the following for later use:

Once the cluster is created, select the compute cluster name. Under the Configuration tab, select Advanced Options. Select the JDBC/ODBC tab. Record the HTTP Path endpoint.

Step 2: Create a SQL warehouse

When creating a SQL warehouse, follow the steps provided by Databricks. Apply your own best-practice standards for Name, Cluster size, and Scaling. We recommend setting Auto Stop to "30 minutes." Under Advanced options, we recommend setting Spot instance policy to "Cost optimized."

📘

Record the following for later use:

After the SQL warehouse is created, record the following information from Connection details for use within Ascend (see Figure 3):

  • Server host name
  • HTTP path endpoint
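
If you create the warehouse programmatically instead of through the UI, the recommended settings map roughly onto the SQL Warehouses REST API. This is a minimal sketch, assuming the /api/2.0/sql/warehouses endpoint, the same host/token environment variables as above, and an illustrative cluster size and scaling range; check field names against your Databricks API version.

# Minimal sketch: create a SQL warehouse through the Databricks SQL Warehouses API.
# The warehouse name, cluster size, and scaling bounds below are illustrative assumptions.
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

warehouse_spec = {
    "name": "ascend-warehouse",                # illustrative name
    "cluster_size": "Small",                   # size per your own best practices
    "min_num_clusters": 1,                     # scaling bounds are examples
    "max_num_clusters": 2,
    "auto_stop_mins": 30,                      # recommended Auto Stop
    "spot_instance_policy": "COST_OPTIMIZED",  # recommended Spot instance policy
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=warehouse_spec,
)
resp.raise_for_status()
print(resp.json())  # the response includes the warehouse id; connection details remain visible in the UI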

Step 3: Generate an access token for Ascend

Next, generate a personal access token. When setting the Lifetime, remember that the token will need to be renewed when it expires.
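
Tokens can also be created programmatically. The sketch below assumes the Token API at /api/2.0/token/create and the same environment variables as the earlier examples; the comment text and 90-day lifetime are placeholders, not requirements.

# Minimal sketch: create a personal access token through the Databricks Token API.
# The comment text and 90-day lifetime are illustrative choices.
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # an existing token (or other valid auth) used to call the API

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={"comment": "ascend-data-plane", "lifetime_seconds": 90 * 24 * 3600},
)
resp.raise_for_status()
new_token = resp.json()["token_value"]  # returned only once -- record it immediately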

📘

Record the following for later use:

After the personal access token is generated, record it immediately; Databricks will not display it again.
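
Before moving on to Ascend, you can optionally sanity-check the three values you have recorded (server host name, HTTP path endpoint, and access token) from any machine with Python. The snippet below is a sketch that assumes the databricks-sql-connector package is installed (pip install databricks-sql-connector); the placeholder values must be replaced with the ones you recorded.

# Optional sanity check: confirm the recorded server host name, HTTP path, and token work together.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # recorded server host name (placeholder)
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # recorded HTTP path endpoint (placeholder)
    access_token="dapiXXXXXXXXXXXXXXXX",                       # recorded access token (placeholder)
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())  # a single-row result confirms the connection works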

Step 4: Create a Databricks data service in Ascend

From your Ascend dashboard, select +New Data Service.
Enter a name and description, and select CREATE.

Data service settings

After creating the data service, select it from the dashboard.
Next, select Connections. Use the following settings:

Parameter | Value
Connection Name | DBX_NAME
Connection Type | Databricks
Access Type | Read-Write
Server Host Name | Paste recorded Databricks SQL Warehouse server host name
Execution Context for SQL Work | Use a Databricks SQL Endpoint
Endpoint ID | Paste recorded Databricks SQL Warehouse HTTP path endpoint
Execution Context for Non-SQL Work (e.g. PySpark transforms) | Use an all-purpose Databricks Cluster
Cluster ID | Paste recorded Databricks cluster JDBC/ODBC endpoint

Check Required Credentials.

  • Select Create a new credential from the drop-down menu.
  • Assign a credential name.
  • Paste the recorded Access Token from Step 3 and click CREATE.

Data plane configuration

  • In the left menu, select Data Plane Configuration.
  • Select the newly created Databricks Connection from the drop-down menu.
  • Select Update.

Creating Dataflows and Connectors

Once the data plane is configured, you can begin creating Dataflows. Create Read Connectors for Databricks using the previously configured connection.