Create a Spark Cluster Pool

Create, update, and delete a Spark Cluster Pool in Ascend.

🚧

Requires Site Admin permissions to create. Assigning cluster pools to Data Services and Dataflows requires Data Service and Dataflow Access.

Configure a Cluster Pool

  • From the Dashboard, select Admin > Cluster Pools.
  • Select NEW CLUSTER POOL.
  • Add a Cluster Pool Name.
  • Update each field to your preferred specifications. Refer to the table below for descriptions of each field.
  • Select CREATE.

Cluster Pool Management Properties

Field | Default | Details
Min Number of Clusters in Pool | 1 | The minimum number of clusters within the pool.
Max Number of Clusters in Pool | 1 | The maximum number of clusters within the pool.
Default Cluster Pool | No | The default cluster pool is labeled as default and cannot be changed; additional cluster pools cannot be labeled as default.
Driver Size in vCPU | 15 | Maximum is 15 vCPU.
Executor Size in vCPU | 15 | Maximum is 15 vCPU.
Min Number of Executors | 2 | The default minimum number of executors is 2 and the default maximum is 4. For a local mode (driver-only) cluster, set both the minimum and maximum number of executors to 0.
Max Number of Executors | 4 | The default minimum number of executors is 2 and the default maximum is 4. For a local mode (driver-only) cluster, set both the minimum and maximum number of executors to 0.
Cluster Terminates After | 5 | Value in minutes. The default termination time is 5 minutes.
PIP Packages to Install for Each Cluster | none | PIP packages are Python dependencies, and Ascend lets you load them as dependencies/libraries directly into the cluster. However, you cannot install PIP packages that are a different version of packages Ascend already uses. See the list of pre-installed packages.
Ephemeral Volume Size to Mount on Driver | 8 | Volume size in GB.
Ephemeral Volume Size to Mount on Executor | 8 | Volume size in GB.
Container Image | native | Ascend native container image, or "Container image URL" for a custom image.
Spark Runtime | Spark 3.4.0 | Spark version of the native image, or base version of a custom image.
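The settings above map directly onto a pool definition. The following sketch is illustrative Python only, not an official Ascend API payload; the key names are assumptions loosely modeled on the settings referenced later on this page (such as 'min_clusters_in_pool' and 'cluster_terminate_after'). It shows a standard pool and a local mode (driver-only) variant with both executor counts set to 0:

# Illustrative only: a plain Python mapping of the UI fields above.
# Key names are assumptions and do not represent an official Ascend schema.
standard_pool = {
    "name": "etl-medium",                  # hypothetical pool name
    "min_clusters_in_pool": 1,
    "max_clusters_in_pool": 1,
    "default_cluster_pool": False,
    "driver_size_vcpu": 15,
    "executor_size_vcpu": 15,
    "min_executors": 2,
    "max_executors": 4,
    "cluster_terminate_after": 5,          # minutes
    "pip_packages": [],                    # must not conflict with pre-installed package versions
    "driver_ephemeral_volume_gb": 8,
    "executor_ephemeral_volume_gb": 8,
    "container_image": "native",
    "spark_runtime": "Spark 3.4.0",
}

# Local mode (driver-only) pool: both executor counts must be 0.
local_mode_pool = {
    **standard_pool,
    "name": "driver-only",
    "min_executors": 0,
    "max_executors": 0,
}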

Additional fields for custom image option:

Field | Details
Image Name | Image URL stored in your container registry.
Registry Credentials | Create or select existing credentials used to access the image. Credentials should be given in Kubernetes' config.json format. For more information, see Kubernetes Images JSON Config and Pull an Image from a Private Registry.

Registry credentials format example:

{
  "auths": {
    "https://gcr.io/<my-company>/": {
      "username": "\_json_key",
      "password": "{\\n  \"type\": \"service_account\",\\n  \"project_id\": \"<my-company>\",...}\\n",
      "email": "[email protected]",
      "auth": "c3R...zE2"
    },
    "quay.io/<my-company>/": {
      "auth": "c3R...zE2"
    }
  }
}
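If you need to produce this config.json programmatically, note that the "auth" field is the base64 encoding of "<username>:<password>", which is the format Kubernetes expects for image pull credentials. Below is a minimal sketch in Python; the function name and arguments are hypothetical and not part of Ascend:

import base64
import json

def registry_config(registry, username, password, email=None):
    # Kubernetes expects "auth" to be base64("<username>:<password>").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "auth": auth}
    if email:
        entry["email"] = email
    return json.dumps({"auths": {registry: entry}}, indent=2)

# For GCR with a service-account key, the username is the literal "_json_key"
# and the password is the full key file contents, as in the example above.
# print(registry_config("https://gcr.io/<my-company>/", "_json_key", open("key.json").read()))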

Update a Cluster Pool

When updating cluster pools, any tasks currently using the cluster pool will finish before any updates take effect. You do not need to pause Dataflows before making updates.

  • Select the name of the Cluster Pool you want to update.
  • Adjust any of the fields.
  • Select UPDATE.

Delete a Cluster Pool

When deleting a cluster pool, any tasks currently using the cluster pool will finish before the deletion takes effect. A cluster pool referenced in any Data Service's data plane configuration cannot be deleted; to delete such a pool, first switch that configuration to the default cluster pool.

Alternatively, scale the pool down to zero by setting 'min_clusters_in_pool' to zero and adjusting 'cluster_terminate_after' to an appropriate value, eliminating the need for deletion, as sketched below.
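As a rough sketch (again in illustrative Python with assumed key names, not an official schema), the scale-to-zero alternative changes only two of the settings described earlier:

# Keep the pool defined but keep no clusters warm and let idle clusters terminate.
scaled_down_pool = {
    **standard_pool,                 # hypothetical pool definition from above
    "min_clusters_in_pool": 0,       # no clusters kept running when idle
    "cluster_terminate_after": 5,    # minutes of idle time before termination
}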

  • To delete a cluster pool, select the name of the cluster pool and then select DELETE.