Create a Spark Cluster Pool
Create, update, and delete a Spark Cluster Pool in Ascend.
Creating a cluster pool requires Site Admin permissions. Assigning cluster pools to Data Services and Dataflows requires Data Service and Dataflow Access.
Configure a Cluster Pool
- From the Dashboard, select Admin > Cluster Pools.
- Select NEW CLUSTER POOL.
- Add a Cluster Pool Name.
- Update each field to your preferred specifications. Refer to the table below for descriptions of each field.
- Select CREATE.
Cluster Pool Management Properties
Field | Default | Details |
---|---|---|
Min Number of Clusters in Pool | 1 | The minimum number of clusters within the pool. |
Max Number of Clusters in Pool | 1 | The maximum number of clusters within the pool. |
Default Cluster Pool | No | The default cluster pool is labeled as default, and that label cannot be removed. Additional cluster pools cannot be made the default. |
Driver Size in vCPU | 15 | Maximum number is 15. |
Executor Size in vCPU | 15 | Maximum number is 15. |
Min Number of Executors | 2 | The default minimum is 2. For a local mode cluster (driver-only), set both the minimum and maximum number of executors to 0. |
Max Number of Executors | 4 | The default maximum is 4. For a local mode cluster (driver-only), set both the minimum and maximum number of executors to 0. |
Cluster Terminates After | 5 | Value in minutes. The default is 5 minutes. |
PIP Packages to Install for Each Cluster | none | PIP packages are Python dependencies. Ascend can load PIP packages directly into the cluster as dependencies/libraries. However, you cannot install a PIP package at a version different from the one Ascend currently uses. See the list of pre-installed packages. |
Ephemeral Volume Size to Mount on Driver | 8 | Volume size in GB. |
Ephemeral Volume Size to Mount on Executor | 8 | Volume size in GB. |
Container Image | native | Ascend native container image or "Container image URL" (a custom image) |
Spark Runtime | Spark 3.4.0 | Spark version of native image or base version of custom image |
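The constraints in the table can be checked before filling in the form. The following is a minimal sketch only: the `ClusterPoolSpec` class, its field names, and the `validate` helper are hypothetical and not part of Ascend. The check that a minimum does not exceed its maximum is a sanity assumption, not a documented rule.

```python
from dataclasses import dataclass
from typing import List

MAX_VCPU = 15  # documented maximum for driver and executor size


@dataclass
class ClusterPoolSpec:
    """Hypothetical container for the table's fields, using the documented defaults."""
    min_clusters_in_pool: int = 1
    max_clusters_in_pool: int = 1
    driver_vcpu: int = 15
    executor_vcpu: int = 15
    min_executors: int = 2
    max_executors: int = 4
    cluster_terminate_after: int = 5  # minutes


def is_local_mode(spec: ClusterPoolSpec) -> bool:
    """Local mode (driver-only) means both executor bounds are 0."""
    return spec.min_executors == 0 and spec.max_executors == 0


def validate(spec: ClusterPoolSpec) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if spec.driver_vcpu > MAX_VCPU:
        problems.append("Driver size exceeds 15 vCPU")
    if spec.executor_vcpu > MAX_VCPU:
        problems.append("Executor size exceeds 15 vCPU")
    # Assumption: a minimum larger than its maximum is treated as an error.
    if spec.min_clusters_in_pool > spec.max_clusters_in_pool:
        problems.append("Min clusters is greater than max clusters")
    if spec.min_executors > spec.max_executors:
        problems.append("Min executors is greater than max executors")
    return problems


if __name__ == "__main__":
    local = ClusterPoolSpec(min_executors=0, max_executors=0)
    print(is_local_mode(local), validate(local))
```

A driver-only pool is expressed here by setting both executor bounds to 0, matching the local mode note in the table.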
Additional fields for custom image option:
Field | Details |
---|---|
Image Name | Image URL stored in your container registry |
Registry Credentials | Create credentials, or select existing ones, that will be used to access the image. Provide the credentials in the Kubernetes config.json format. For more information, see Kubernetes Images JSON Config and Pull an Image from a Private Registry. |
Registry credentials format example:
{
"auths": {
"https://gcr.io/<my-company>/": {
"username": "_json_key",
"password": "{\\n \"type\": \"service_account\",\\n \"project_id\": \"<my-company>\",...}\\n",
"email": "[email protected]",
"auth": "c3R...zE2"
},
"quay.io/<my-company>/": {
"auth": "c3R...zE2"
}
}
}
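In the standard Docker config.json convention, the `auth` value is the Base64 encoding of `username:password`. The sketch below builds an entry that way, assuming Ascend accepts the same convention. The helper names `registry_auth_entry` and `build_config`, and the sample registry and credentials, are hypothetical.

```python
import base64
import json


def registry_auth_entry(username: str, password: str) -> dict:
    """Build one registry entry; `auth` is Base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"username": username, "password": password, "auth": token}


def build_config(entries: dict) -> str:
    """Serialize entries under the top-level `auths` key.

    json.dumps escapes quotes and newlines inside the password, which matters
    when the password is itself a JSON service-account key.
    """
    return json.dumps({"auths": entries}, indent=2)


if __name__ == "__main__":
    # Placeholder registry URL and credentials, for illustration only.
    entry = registry_auth_entry("user", "pass")
    print(build_config({"https://registry.example.com/": entry}))
```

Generating the file with a JSON serializer, rather than escaping the service-account key by hand, avoids malformed escape sequences in the `password` field.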
Update a Cluster Pool
When updating cluster pools, any tasks currently using the cluster pool will finish before any updates take effect. You do not need to pause Dataflows before making updates.
- Select the name of the Cluster Pool you want to update.
- Adjust any of the fields.
- Select UPDATE.
Delete a Cluster Pool
When deleting a cluster pool, any tasks currently using the cluster pool will finish before the deletion takes effect. A cluster pool referenced in any Data Service's data plane configuration cannot be deleted. To delete it, first switch that configuration to the 'default' cluster pool.
Alternatively, skip deletion and scale the cluster pool down to zero: set 'min_clusters_in_pool' to zero and adjust 'cluster_terminate_after' to an appropriate value.
- To delete a cluster pool, select the name of the cluster pool and then select DELETE.
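The scale-to-zero alternative can be sketched as a settings change. Only the two key names, `min_clusters_in_pool` and `cluster_terminate_after`, come from the text above. The settings dictionary, the `scale_to_zero` helper, and how the result would be submitted to Ascend are assumptions for illustration.

```python
def scale_to_zero(settings: dict, terminate_after_minutes: int = 5) -> dict:
    """Return a copy of the pool settings that lets the pool idle at zero clusters."""
    updated = dict(settings)  # leave the caller's dictionary untouched
    updated["min_clusters_in_pool"] = 0
    updated["cluster_terminate_after"] = terminate_after_minutes  # minutes
    return updated


if __name__ == "__main__":
    current = {"min_clusters_in_pool": 1, "cluster_terminate_after": 30}
    print(scale_to_zero(current, terminate_after_minutes=10))
```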