Read BigLake tables for Apache Iceberg in BigQuery with Apache Spark

The following sections describe how to read managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery) with Apache Spark.

Before you begin

  • Understand the different types of BigLake tables and the implications of using them, in the BigLake table overview.

  • Before reading BigLake Iceberg tables in BigQuery with Apache Spark, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.
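As a sketch of the connection setup, a Cloud resource connection can be created with the bq command-line tool and its service account granted write access on the bucket. The connection ID `my_connection`, location `US`, and bucket `gs://mybucket` below are placeholder values, not names from this guide:

```shell
# Create a Cloud resource connection (placeholders: PROJECT_ID, US, my_connection).
bq mk --connection \
  --location=US \
  --project_id=PROJECT_ID \
  --connection_type=CLOUD_RESOURCE \
  my_connection

# Show the connection to find its service account ID.
bq show --connection PROJECT_ID.US.my_connection

# Grant that service account write access on the bucket
# (placeholder bucket and service account).
gcloud storage buckets add-iam-policy-binding gs://mybucket \
  --member='serviceAccount:CONNECTION_SERVICE_ACCOUNT' \
  --role='roles/storage.objectAdmin'
```

The exact role to grant depends on your setup; see the Required roles section that follows for the individual permissions the connection needs.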

Required roles

To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you IAM roles on your project and on the storage bucket:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to let BigQuery manage tables in your project:

  • bigquery.connections.delegate on your project
  • bigquery.jobs.create on your project
  • bigquery.readsessions.create on your project
  • bigquery.tables.get on your project
  • bigquery.tables.getData on your project
  • storage.buckets.get on your bucket
  • storage.objects.create on your bucket
  • storage.objects.delete on your bucket
  • storage.objects.get on your bucket
  • storage.objects.list on your bucket

You might also be able to get these permissions with custom roles or other predefined roles.
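One way to bundle the project-level permissions above is a custom role, as the preceding sentence suggests. The role ID `bigLakeIcebergReader` and `PROJECT_ID` below are placeholders; the bucket-level `storage.*` permissions are granted on the bucket separately:

```shell
# Sketch: create a custom role carrying the project-level permissions
# listed above (role ID and project ID are placeholders).
gcloud iam roles create bigLakeIcebergReader \
  --project=PROJECT_ID \
  --title="BigLake Iceberg reader" \
  --permissions=bigquery.connections.delegate,bigquery.jobs.create,bigquery.readsessions.create,bigquery.tables.get,bigquery.tables.getData
```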

Read BigLake Iceberg tables in BigQuery with Apache Spark

The following sample sets up your environment to use Spark SQL with Apache Iceberg, and then executes a query to fetch data from a specified BigLake Iceberg table in BigQuery.

spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-ICEBERG_VERSION_NUMBER \
  --conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.CATALOG_NAME.type=hadoop \
  --conf spark.sql.catalog.CATALOG_NAME.warehouse='BUCKET_PATH'

-- Query the table
SELECT * FROM CATALOG_NAME.FOLDER_NAME;

Replace the following:

  • ICEBERG_VERSION_NUMBER: the version of the Apache Iceberg Spark runtime to use. Download the latest version from the Apache Iceberg releases page.
  • CATALOG_NAME: the catalog to reference your BigLake Iceberg table in BigQuery.
  • BUCKET_PATH: the path to the bucket containing the table files. For example, gs://mybucket/.
  • FOLDER_NAME: the folder containing the table files. For example, myfolder.
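The same read can be sketched in PySpark instead of the spark-sql shell. Each session setting below mirrors one of the --conf flags in the sample above, with the same placeholder values to replace:

```python
from pyspark.sql import SparkSession

# Placeholders (ICEBERG_VERSION_NUMBER, CATALOG_NAME, BUCKET_PATH,
# FOLDER_NAME) mirror the spark-sql flags above; replace before running.
spark = (
    SparkSession.builder
    .appName("read-biglake-iceberg")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-ICEBERG_VERSION_NUMBER")
    .config("spark.sql.catalog.CATALOG_NAME",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.CATALOG_NAME.type", "hadoop")
    .config("spark.sql.catalog.CATALOG_NAME.warehouse", "BUCKET_PATH")
    .getOrCreate()
)

# Query the table, as in the spark-sql sample.
df = spark.sql("SELECT * FROM CATALOG_NAME.FOLDER_NAME")
df.show()
```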

What's next