Databricks Data Plane

Overview

🚧

Submit an environment request before continuing

Before creating a Databricks data plane, submit an environment request indicating which data plane you would like enabled. Chat with us through Intercom within Ascend, or file a support ticket by emailing [email protected]. Once your data plane is enabled, you'll receive confirmation that your data plane is ready for setup.

Enterprise Security Users:
For AWS S3, please include your VPC endpoint in the environment request.
For Azure, please include the relevant subnet IDs.

Setting up a Databricks data plane is a four-step process. Within Databricks, first create an all-purpose compute cluster, second create a SQL warehouse, and third generate an access token. Once setup within Databricks is complete, create a new data service within Ascend to run on Databricks.

You'll need advanced permissions in Databricks Data Science & Engineering and for SQL warehouses. For more information on these permissions, see the Databricks requirements for SQL warehouses.

You'll also want a text editor open to temporarily record credentials, endpoints, and server host names when setting up Databricks. This information will be used to configure Ascend to run within Databricks.

Step 1: Create a Databricks all-purpose compute cluster

When creating an all-purpose compute cluster, follow the Databricks steps for using the Create button.

Use the following configuration settings:

Parameter | Value | Required/Recommended
Multi-node | — | Required
Access mode | No isolation shared | Required
Databricks runtime version | 10.4 LTS (Scala 2.12, Spark 3.2.1) | Recommended
Worker type | i3.xlarge | Recommended
Driver type | Same as worker | Recommended
Enable Auto Scaling | True | Required
Enable Auto Scaling local storage | False | Recommended
Terminate after __ minutes of inactivity | 30 | Recommended

Under Advanced options, click Spark and copy/paste the following into the Spark config:

spark.databricks.delta.schema.autoMerge.enabled true
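If you prefer to script this step rather than use the UI, the following is a minimal sketch using the Databricks Clusters REST API (POST /api/2.0/clusters/create). The workspace URL, token, cluster name, and worker counts are placeholders, and the data_security_mode value used here for "No isolation shared" is an assumption to verify against your Databricks API version.

# Minimal sketch: create an all-purpose cluster with the settings above.
# WORKSPACE_URL, TOKEN, the cluster name, and the worker counts are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<databricks-personal-access-token>"                     # placeholder

cluster_spec = {
    "cluster_name": "ascend-data-plane",                # placeholder name
    "spark_version": "10.4.x-scala2.12",                # 10.4 LTS runtime
    "node_type_id": "i3.xlarge",                        # worker type (AWS)
    "driver_node_type_id": "i3.xlarge",                 # same as worker
    "autoscale": {"min_workers": 2, "max_workers": 8},  # enable autoscaling (placeholder bounds)
    "enable_elastic_disk": False,                       # autoscaling local storage off
    "autotermination_minutes": 30,                      # terminate after 30 minutes of inactivity
    "data_security_mode": "NONE",                       # assumed mapping for "No isolation shared"
    "spark_conf": {
        "spark.databricks.delta.schema.autoMerge.enabled": "true",
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("cluster_id:", resp.json()["cluster_id"])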

📘

Record the following for later use:

Once the cluster is created, select the compute cluster name. Under the Configuration tab, select Advanced Options. Select the JDBC/ODBC tab. Record the HTTP Path endpoint.

Step 2: Create a SQL warehouse

When creating a SQL warehouse, follow the steps provided by Databricks. Apply your own best-practice standards for Name, Cluster size, and Scaling. We recommend setting Auto Stop to “30 minutes.” Under Advanced options, we recommend setting Spot instance policy to “Cost optimized.”
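As with the cluster, this step can be scripted. Below is a minimal sketch using the Databricks SQL Warehouses REST API (POST /api/2.0/sql/warehouses); the name, size, and scaling values are illustrative placeholders rather than Ascend requirements, and the odbc_params fields used to read back the connection details should be confirmed against your Databricks API version.

# Minimal sketch: create a SQL warehouse with Auto Stop = 30 minutes and a
# cost-optimized spot instance policy. Name, size, and scaling are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<databricks-personal-access-token>"                     # placeholder

warehouse_spec = {
    "name": "ascend-warehouse",               # placeholder name
    "cluster_size": "Small",                  # placeholder size
    "min_num_clusters": 1,                    # placeholder scaling
    "max_num_clusters": 2,                    # placeholder scaling
    "auto_stop_mins": 30,                     # recommended Auto Stop
    "spot_instance_policy": "COST_OPTIMIZED", # recommended spot instance policy
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=warehouse_spec,
)
resp.raise_for_status()
warehouse_id = resp.json()["id"]

# The connection details recorded below (server host name, HTTP path) are
# returned under odbc_params when the warehouse is fetched back.
details = requests.get(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{warehouse_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
).json()
print(details["odbc_params"]["hostname"], details["odbc_params"]["path"])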

📘

Record the following for later use:

After the SQL warehouse is created, record the following information from Connection details for use within Ascend (see Figure 3):

  • Server host name
  • HTTP path endpoint

Step 3: Generate an access token for Ascend

Next, generate a personal access token. When setting the Lifetime, remember that this token will need to be renewed before it expires.
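Tokens are normally generated in the Databricks UI, but for automated rotation a minimal sketch using the Token API (POST /api/2.0/token/create) is shown below. Note that this request must itself be authenticated, for example with an existing token, and the lifetime and comment are placeholders.

# Minimal sketch: create a personal access token via the Token API.
# Requires an existing means of authentication; lifetime and comment are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
EXISTING_TOKEN = "<existing-databricks-token>"                   # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
    json={"lifetime_seconds": 90 * 24 * 3600, "comment": "Ascend data plane"},
)
resp.raise_for_status()
print(resp.json()["token_value"])  # record this value immediately; it is shown only once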

📘

Record the following for later use:

After the personal access token is generated, record it immediately; Databricks displays the token value only once.
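Before moving on to Ascend, you can sanity-check the values you recorded. The sketch below assumes the databricks-sql-connector Python package and uses the server host name and HTTP path from Step 2 together with the token from Step 3; the placeholder strings are yours to replace.

# Minimal sketch: verify the recorded host name, HTTP path, and access token
# with the databricks-sql-connector package (pip install databricks-sql-connector).
from databricks import sql

connection = sql.connect(
    server_hostname="<recorded-server-host-name>",  # from Step 2
    http_path="<recorded-http-path-endpoint>",      # from Step 2
    access_token="<recorded-access-token>",         # from Step 3
)
with connection.cursor() as cursor:
    cursor.execute("SELECT 1")
    print(cursor.fetchall())  # a single row back confirms the warehouse is reachable
connection.close()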

Step 4: Create a Databricks data service in Ascend

From your Ascend dashboard, select +New Data Service.
Enter a name and description, and select CREATE.

Data service settings

After creating the data service, select it from the dashboard.
Next, select Connections. Use the following settings:

Parameter | Value
Connection Name | DBX_NAME
Connection Type | Databricks
Access Type | Read-Write
Server Host Name | Paste the recorded Databricks SQL warehouse server host name
Execution Context for SQL Work | Use a Databricks SQL Endpoint
Endpoint ID | Paste the recorded Databricks SQL warehouse HTTP path endpoint
Execution Context for Non-SQL Work (e.g. PySpark transforms) | Use an all-purpose Databricks Cluster
Cluster ID | Paste the recorded Databricks cluster JDBC/ODBC endpoint

Check Required Credentials.

  • Select Create a new credential from the drop-down menu.
  • Assign a credential name.
  • Paste the recorded Access Token from Step 3 and select CREATE.

Data plane configuration

  • In the left menu, select Data Plane Configuration.
  • Select the newly created Databricks Connection from the drop-down menu.
  • Select Update.

Creating Dataflows and Connectors

Once the data plane is configured, you can begin creating Dataflows. Create Read Connectors for Databricks using the previously configured connection.