Read BigLake tables for Apache Iceberg in BigQuery with Apache Spark
The following sections describe how to read managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery) with Apache Spark.
Before you begin
To understand the different types of BigLake tables and the implications of using them, see the BigLake table overview.
Before reading BigLake Iceberg tables in BigQuery with Apache Spark, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.
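If you have not yet created a Cloud resource connection, you can create one with the bq command-line tool. The location, project ID, and connection ID below are illustrative placeholders; substitute your own values:

```shell
# Create a Cloud resource connection (placeholder location, project, and ID).
bq mk --connection \
  --location=US \
  --project_id=my-project \
  --connection_type=CLOUD_RESOURCE \
  my-connection

# Show the connection to find its service account, which must hold
# the storage roles described in the Required roles section.
bq show --connection my-project.US.my-connection
```

The service account reported by bq show is the identity that needs write access to the storage bucket.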
Required roles
To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you the following IAM roles:
- To query data:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on your project
  - BigQuery User (roles/bigquery.user) on your project
- Grant the connection service account the following roles so it can read and write data in Cloud Storage:
  - Storage Object User (roles/storage.objectUser) on the bucket
  - Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) on the bucket
For more information about granting roles, see Manage access to projects, folders, and organizations.
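As a sketch, the grants listed above can be made with gcloud. The project ID, user, bucket name, and service account address are placeholders; replace them with your own values:

```shell
# Grant the querying user the BigQuery roles on the project
# (placeholder project and user).
gcloud projects add-iam-policy-binding my-project \
  --member="user:analyst@example.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding my-project \
  --member="user:analyst@example.com" \
  --role="roles/bigquery.user"

# Grant the connection's service account the storage roles on the bucket
# (placeholder bucket and service account address).
gcloud storage buckets add-iam-policy-binding gs://mybucket \
  --member="serviceAccount:connection-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectUser"
gcloud storage buckets add-iam-policy-binding gs://mybucket \
  --member="serviceAccount:connection-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.legacyBucketReader"
```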
These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to let BigQuery manage tables in your project:
- bigquery.connections.delegate on your project
- bigquery.jobs.create on your project
- bigquery.readsessions.create on your project
- bigquery.tables.get on your project
- bigquery.tables.getData on your project
- storage.buckets.get on your bucket
- storage.objects.create on your bucket
- storage.objects.delete on your bucket
- storage.objects.get on your bucket
- storage.objects.list on your bucket
You might also be able to get these permissions with custom roles or other predefined roles.
Read BigLake Iceberg tables in BigQuery with Apache Spark
The following sample sets up your environment to use Spark SQL with Apache Iceberg, and then executes a query to fetch data from a specified BigLake Iceberg table in BigQuery.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-ICEBERG_VERSION_NUMBER \
  --conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.CATALOG_NAME.type=hadoop \
  --conf spark.sql.catalog.CATALOG_NAME.warehouse='BUCKET_PATH'

# Query the table
SELECT * FROM CATALOG_NAME.FOLDER_NAME;
Replace the following:
- ICEBERG_VERSION_NUMBER: the current version of the Apache Iceberg Spark runtime. Download the latest version from Spark Releases.
- CATALOG_NAME: the catalog to reference your BigLake Iceberg table in BigQuery.
- BUCKET_PATH: the path to the bucket containing the table files. For example, gs://mybucket/.
- FOLDER_NAME: the folder containing the table files. For example, myfolder.
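The same read can also be expressed programmatically. The following PySpark sketch mirrors the spark-sql flags above; the runtime coordinates, catalog name, bucket path, and table folder are illustrative placeholders, and running it requires a Spark environment with the Iceberg runtime available and access to Cloud Storage:

```python
from pyspark.sql import SparkSession

# Mirror the spark-sql configuration: register a Hadoop-type Iceberg
# catalog whose warehouse is the Cloud Storage bucket (placeholder values).
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.catalog.my_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    .config("spark.sql.catalog.my_catalog.warehouse", "gs://mybucket/")
    .getOrCreate()
)

# Query the table, exactly as in the spark-sql session.
spark.sql("SELECT * FROM my_catalog.myfolder").show()
```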