Grant service account roles for Managed Service for Apache Spark

This page describes how to grant the Cloud Data Fusion Service Agent the Service Account User role on the Managed Service for Apache Spark service account, so that Cloud Data Fusion can provision and run pipelines on Managed Service for Apache Spark clusters.

For service accounts that are used by Managed Service for Apache Spark, you must also grant the datafusion.instances.runtime permission so that they can access Cloud Data Fusion runtime resources.

Whether you use a user-managed service account or the default Compute Engine service account on the virtual machines in a cluster, you must grant Cloud Data Fusion the Service Account User role on that service account. Otherwise, Cloud Data Fusion cannot provision a Managed Service for Apache Spark cluster, and the following error appears when you run a data pipeline:

PROVISION task failed in REQUESTING_CREATE state for program run [pipeline-name] due to Managed Service for Apache Spark operation failure: INVALID_ARGUMENT: User not authorized to act as service account '[service-account-name]'

Get the service account name

  1. In the Google Cloud console, go to the Identity and Access Management page.
    Go to the IAM page
  2. From the project selector at the top of the page, choose the project, folder, or organization to which the Cloud Data Fusion instance belongs.
  3. Find and copy the Cloud Data Fusion service account name. The name has the following format: service-[project-number]@gcp-sa-datafusion.iam.gserviceaccount.com.
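
If you already know the project number, the name can also be derived from the format shown in step 3. The following is a minimal Python sketch, not an official tool: the function name data_fusion_service_agent and the project number 123456789012 are hypothetical and used only for illustration.

```python
# Format copied from step 3; only the project number varies.
AGENT_TEMPLATE = "service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def data_fusion_service_agent(project_number: str) -> str:
    """Return the Cloud Data Fusion service account name for a project number."""
    number = str(project_number).strip()
    # Project numbers are numeric; reject project IDs such as "my-project".
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"project number must be numeric, got {project_number!r}")
    return AGENT_TEMPLATE.format(project_number=number)


# Placeholder project number, for illustration only.
print(data_fusion_service_agent("123456789012"))
```

Compare the result with the name listed on the IAM page before you use it.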

Grant the Service Account User role

  1. In the Google Cloud console, go to the Service Accounts page.
    Go to the Service Accounts page
  2. Click Select a project, choose the project that contains the service account you want to use for the Managed Service for Apache Spark cluster, and then click Open.
  3. Click the email address of the Managed Service for Apache Spark service account.

  4. Click the Principals with access tab. The page displays a list of principals that have been granted roles on the service account.

  5. Click Grant access.

  6. In the New principals field, paste the Cloud Data Fusion service account name that you previously copied.

  7. Select the Service Account User role.

  8. Click Save.
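
In IAM policy terms, the steps above add the Cloud Data Fusion service account as a member of a binding for roles/iam.serviceAccountUser on the Managed Service for Apache Spark service account. The following Python sketch shows that change on a plain dictionary that follows the IAM policy layout (a bindings list of role and members entries). It only edits a local copy: reading the policy from the service account and writing it back are not shown, and the helper name add_binding and the project number are hypothetical.

```python
import copy

SERVICE_ACCOUNT_USER = "roles/iam.serviceAccountUser"


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with `member` added to `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the role if there is one.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No matching binding yet, so create one.
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder Cloud Data Fusion service account, for illustration only.
agent = "serviceAccount:service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com"
policy = add_binding({"bindings": []}, SERVICE_ACCOUNT_USER, agent)
print(policy)
```

Calling add_binding again with the same arguments leaves the policy unchanged, which makes the grant safe to repeat.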

Grant roles to Managed Service for Apache Spark service accounts

Grant runner role permission

Grant the Cloud Data Fusion runner role (roles/datafusion.runner) to service accounts that are used by Managed Service for Apache Spark. This authorizes the Managed Service for Apache Spark service account to run Cloud Data Fusion pipelines in your project. For more information, see Requiring permission to attach service accounts to resources.
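
As an illustration, the grant can be expressed as a gcloud projects add-iam-policy-binding command. The Python sketch below only builds and prints the command; it does not run it. It assumes the gcloud CLI is installed and that the role is granted at the project level. The project ID my-project, the service account spark-runner@my-project.iam.gserviceaccount.com, and the helper name grant_project_role_command are hypothetical.

```python
import shlex
from typing import List

RUNNER_ROLE = "roles/datafusion.runner"


def grant_project_role_command(project_id: str, service_account_email: str, role: str) -> List[str]:
    """Build (but do not run) a gcloud command that grants a project-level role."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        f"--role={role}",
    ]


# Placeholder project ID and service account, for illustration only.
cmd = grant_project_role_command(
    "my-project",
    "spark-runner@my-project.iam.gserviceaccount.com",
    RUNNER_ROLE,
)
print(shlex.join(cmd))
```

Review the printed command, then run it in a shell where you are authenticated with permission to change the project's IAM policy.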

Grant Cloud Storage admin permission

In Cloud Data Fusion versions 6.2.0 and later, grant the Cloud Storage admin role (roles/storage.admin) to service accounts that are used by Managed Service for Apache Spark in your project.
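
The version rule can be sketched as a small check that lists the roles this page asks for, given a Cloud Data Fusion version. This is a hedged illustration: it assumes the version is a plain dotted number such as 6.2.0 (no suffixes), and the function names are hypothetical.

```python
from typing import List, Tuple

RUNNER_ROLE = "roles/datafusion.runner"
STORAGE_ADMIN_ROLE = "roles/storage.admin"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version, padding to three parts ("6.2" -> (6, 2, 0))."""
    parts = [int(part) for part in version.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def roles_for_spark_service_account(data_fusion_version: str) -> List[str]:
    """Return the roles described on this page for the given version."""
    roles = [RUNNER_ROLE]
    # The Cloud Storage admin role applies to versions 6.2.0 and later.
    if parse_version(data_fusion_version) >= (6, 2, 0):
        roles.append(STORAGE_ADMIN_ROLE)
    return roles


print(roles_for_spark_service_account("6.1.4"))
print(roles_for_spark_service_account("6.2.0"))
```

Comparing parsed integer tuples rather than raw strings keeps versions such as 6.10.1 ordered after 6.2.0.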

What's next