Manage BigLake tables for Apache Iceberg in BigQuery metadata snapshots
The following sections describe how to manage metadata snapshots for managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery).
Before you begin
Understand the different types of BigLake tables and the implications of using them by reading the BigLake table overview.
Before managing BigLake Iceberg tables in BigQuery metadata snapshots, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.
Required roles
To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you the following IAM roles:
- To query data:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on your project
  - BigQuery User (roles/bigquery.user) on your project
- Grant the connection service account the following roles so it can read and write data in Cloud Storage:
  - Storage Object User (roles/storage.objectUser) on the bucket
  - Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) on the bucket
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to let BigQuery manage tables in your project:
- bigquery.connections.delegate on your project
- bigquery.jobs.create on your project
- bigquery.readsessions.create on your project
- bigquery.tables.get on your project
- bigquery.tables.getData on your project
- storage.buckets.get on your bucket
- storage.objects.create on your bucket
- storage.objects.delete on your bucket
- storage.objects.get on your bucket
- storage.objects.list on your bucket
You might also be able to get these permissions with custom roles or other predefined roles.
Create BigLake Iceberg table in BigQuery metadata snapshots
To create a BigLake Iceberg table in BigQuery metadata snapshot, follow these steps:
1. Export the metadata into the Iceberg V2 format with the EXPORT TABLE METADATA SQL statement.
2. Optional: Schedule Iceberg metadata snapshot refreshes. To refresh an Iceberg metadata snapshot based on a set time interval, use a scheduled query.
3. Optional: Enable metadata auto-refresh for your project to automatically update your Iceberg table metadata snapshot on each table mutation. To enable metadata auto-refresh, contact bigquery-tables-for-apache-iceberg-help@google.com.
EXPORT METADATA costs are applied on each refresh operation.
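A one-time refresh can also be run directly. The following statement exports the metadata snapshot for a table, where mydataset.test is a placeholder for your own dataset and table:

```sql
-- Export the table's metadata as an Iceberg V2 snapshot to the
-- table's Cloud Storage location. mydataset.test is a placeholder.
EXPORT TABLE METADATA FROM mydataset.test;
```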
The following example creates a scheduled query named My Scheduled Snapshot Refresh Query
using the DDL statement EXPORT TABLE METADATA FROM mydataset.test. The DDL
statement runs every 24 hours.
bq query \
    --use_legacy_sql=false \
    --display_name='My Scheduled Snapshot Refresh Query' \
    --schedule='every 24 hours' \
    'EXPORT TABLE METADATA FROM mydataset.test'
View BigLake Iceberg table in BigQuery metadata snapshot
After you refresh the BigLake Iceberg table in BigQuery metadata snapshot, you can
find the snapshot in the Cloud Storage URI
that the BigLake Iceberg table in BigQuery was originally created in. The /data
folder contains the Parquet file data shards, and the /metadata folder contains
the BigLake Iceberg table in BigQuery metadata snapshot. To find a table's
Cloud Storage URI, query the storage_uri option in the table's DDL from
INFORMATION_SCHEMA.TABLES:
SELECT table_name, REGEXP_EXTRACT(ddl, r"storage_uri\s*=\s*\"([^\"]+)\"") AS storage_uri
FROM `mydataset`.INFORMATION_SCHEMA.TABLES;
Note that mydataset is a placeholder for your actual dataset; the storage_uri column in the result shows where each table's data and metadata snapshot are stored.
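To look up the storage URI for a single table, you can narrow the same query with a WHERE clause; here mydataset and test are placeholders for your dataset and table:

```sql
-- Return the Cloud Storage URI for one table only.
-- mydataset and 'test' are placeholders.
SELECT REGEXP_EXTRACT(ddl, r"storage_uri\s*=\s*\"([^\"]+)\"") AS storage_uri
FROM `mydataset`.INFORMATION_SCHEMA.TABLES
WHERE table_name = 'test';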