"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

Create node pools

When you create or update a Managed Service for Apache Spark on GKE virtual cluster, you specify one or more node pools that the virtual cluster will use to run jobs. The virtual cluster is then referred to as the cluster "used by" or "associated with" the specified node pools. If a specified node pool does not exist on your GKE cluster, Managed Service for Apache Spark on GKE creates the node pool on the GKE cluster with the settings you specify. If the node pool exists and was created by Managed Service for Apache Spark, it is validated to confirm that its settings match the specified settings.

Managed Service for Apache Spark on GKE node pool settings

You can specify the following settings on node pools used by your Managed Service for Apache Spark on GKE virtual clusters (these settings are a subset of GKE node pool settings):

accelerators
  acceleratorCount
  acceleratorType
  gpuPartitionSize*
localSsdCount
machineType
minCpuPlatform
minNodeCount
maxNodeCount
preemptible
spot*

Notes on settings marked with an asterisk (*):

gpuPartitionSize can be set in the Managed Service for Apache Spark API GkeNodePoolAcceleratorConfig.
spot can be set in the Managed Service for Apache Spark API GkeNodeConfig.
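The gcloud CLI example on this page passes node pool settings as comma-separated KEY=VALUE pairs in a --pools flag. The following is a minimal sketch of composing such a value. The helper name pool_spec and the pool name exec-pool are hypothetical, and only keys that appear in that example (name, roles, min, max, machineType) are used; how the other settings listed above map to gcloud keys is not shown here.

```shell
# Build the value of one --pools flag from a pool name, a role, and
# optional extra KEY=VALUE settings (hypothetical helper, not part of gcloud).
pool_spec() {
  name="$1"
  roles="$2"
  shift 2
  spec="name=${name},roles=${roles}"
  # Append each remaining argument as an additional KEY=VALUE pair.
  for kv in "$@"; do
    spec="${spec},${kv}"
  done
  printf '%s\n' "$spec"
}

# Hypothetical pool name; keys are the ones used in the example on this page.
pool_spec "exec-pool" "spark-executor" "min=1" "max=10" "machineType=n2-standard-8"
```

The resulting string could then be passed as --pools="$(pool_spec ...)" when creating the virtual cluster.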

Node pool deletion

When a Managed Service for Apache Spark on GKE cluster is deleted, the node pools used by the cluster are not deleted. To delete node pools that are no longer in use by Managed Service for Apache Spark on GKE clusters, see Delete a node pool.
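Because node pools outlive the virtual cluster, cleanup is a separate GKE operation. The sketch below only prints the gcloud container node-pools delete command so that it can be reviewed before it is run. It assumes a regional GKE cluster (a zonal cluster would use --zone instead of --region); the helper name delete_pool_cmd and the fallback values are hypothetical.

```shell
# Print (do not run) the GKE command that deletes one node pool.
# Arguments: pool name, GKE cluster name, GKE cluster region.
delete_pool_cmd() {
  printf 'gcloud container node-pools delete %s --cluster=%s --region=%s\n' \
    "$1" "$2" "$3"
}

# Fallback values after ":-" are placeholders for illustration only.
delete_pool_cmd "${DP_EXEC_POOLNAME:-exec-pool}" \
  "${GKE_CLUSTER:-my-gke-cluster}" \
  "${REGION:-us-central1}"
```

Confirm that no other virtual cluster still uses the node pool before running the printed command.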

Node pool location

You can specify the zone location of node pools associated with your Managed Service for Apache Spark on GKE virtual cluster when you create or update the virtual cluster. The node pool zones must be located in the region of the associated virtual cluster.
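The region constraint can be checked locally before a create or update request is sent. This is a minimal sketch that relies on the Compute Engine naming convention in which a zone name is the region name followed by a hyphen and a zone suffix (for example, us-central1-a is in us-central1). The helper name zone_in_region is hypothetical.

```shell
# Succeeds when zone $1 belongs to region $2.
# Strips the final "-<suffix>" from the zone name and compares the rest.
zone_in_region() {
  [ "${1%-*}" = "$2" ]
}

zone_in_region "us-central1-a" "us-central1" && echo "zone is in the region"
zone_in_region "us-east1-b" "us-central1" || echo "zone is outside the region"
```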

Role to node pool mapping

Node pool roles are defined for Spark driver and Spark executor work; a default role covers all types of work run by a node pool. Managed Service for Apache Spark on GKE clusters must have at least one node pool that is assigned the default role. Assigning other roles is optional.

Recommendation: Create separate node pools for each role type, with node type and size based on role requirements.

gcloud CLI virtual cluster creation example:

gcloud dataproc clusters gke create "${DP_CLUSTER}" \
  --region=${REGION} \
  --gke-cluster=${GKE_CLUSTER} \
  --spark-engine-version=latest \
  --staging-bucket=${BUCKET} \
  --pools="name=${DP_POOLNAME},roles=default" \
  --setup-workload-identity \
  --pools="name=${DP_CTRL_POOLNAME},roles=default,machineType=e2-standard-4" \
  --pools="name=${DP_DRIVER_POOLNAME},min=1,max=3,roles=spark-driver,machineType=n2-standard-4" \
  --pools="name=${DP_EXEC_POOLNAME},min=1,max=10,roles=spark-executor,machineType=n2-standard-8"
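The requirement that at least one node pool has the default role can be checked before the command above is run. The following is a minimal sketch under the assumption that each pool value lists a single role, as in the example; the helper name has_default_pool and the pool names are hypothetical.

```shell
# Succeeds when at least one --pools value assigns the default role.
# Each argument is one comma-separated KEY=VALUE pool specification.
has_default_pool() {
  for spec in "$@"; do
    # Wrap in commas so "roles=default" matches only as a whole pair.
    case ",${spec}," in
      *,roles=default,*) return 0 ;;
    esac
  done
  return 1
}

has_default_pool \
  "name=ctrl-pool,roles=default,machineType=e2-standard-4" \
  "name=exec-pool,min=1,max=10,roles=spark-executor,machineType=n2-standard-8" \
  && echo "default role is assigned"
```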
