As of April 20, 2026, BigLake is called Lakehouse for Apache Iceberg, and BigLake metastore is called the Lakehouse runtime catalog. Lakehouse APIs, client libraries, CLI commands, and IAM names remain unchanged and still reference BigLake.

About cross-cloud Lakehouse

Cross-cloud Lakehouse lets you query data stored in other cloud providers directly from Google Cloud without migrating files or building complex ETL pipelines.

As part of Lakehouse, this capability lets you perform unified analytics and apply AI across your distributed datasets using BigQuery, standalone Apache Spark environments, or Managed Service for Apache Spark.

In addition to analytical queries, you can use your federated data for AI-driven insights and governance:

Conversational Analytics: Build specialized agents grounded in your exact data sources, including cross-cloud tables, to analyze data across clouds from a single conversation.
Knowledge Catalog: Use Knowledge Catalog features for data profiling and insights with federated data sources.

Use cases

Cross-cloud Lakehouse supports several key use cases for accessing data across multiple cloud providers:

Reduced data movement lets you query data stored in other cloud environments directly, simplifying data access and processing.
Unified analytics lets you perform advanced analytics with consistent features and hardware optimization across all your data, regardless of where it resides.
Cross-cloud AI and ML lets you apply AI models, autonomous agents, and machine learning directly to your remote data without migrating it.

How cross-cloud Lakehouse works

Cross-cloud Lakehouse queries remote data using the following process:

Metadata discovery: Lakehouse connects to remote Apache Iceberg REST catalogs, such as Databricks Unity Catalog or AWS Glue, and discovers the data without copying any files. Depending on the remote catalog provider, Lakehouse authenticates through Secret Manager or through OpenID Connect (OIDC) token federation with Google as the identity provider.
Secure transport: Traffic travels over the public internet by default. Routing it over a private interconnect instead (for example, Dedicated CCI or Partner Interconnect) keeps data off the public internet, can reduce data transfer costs, and makes latency more predictable.
Optimized execution: As queries read data from remote clouds, Lakehouse temporarily caches those data segments locally within Google Cloud on specialized storage. Subsequent queries use the local cache, which avoids a significant portion of cross-cloud egress charges.
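
The caching step above can be pictured as a read-through cache. The following is a minimal sketch of that idea, not Lakehouse's actual implementation: the `SegmentCache` class, the `fake_remote_read` function, and the bucket path are all hypothetical, and expiry of the temporary cache is not modeled.

```python
class SegmentCache:
    """Toy read-through cache keyed by (object path, offset, length)."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable that reads from the remote cloud
        self._segments = {}
        self.egress_bytes = 0  # bytes actually pulled across clouds

    def read(self, path, start, length):
        key = (path, start, length)
        if key not in self._segments:
            # Cache miss: fetch from the remote cloud and count the egress.
            data = self._fetch_remote(path, start, length)
            self.egress_bytes += len(data)
            self._segments[key] = data
        # Cache hit (or freshly filled entry): served locally.
        return self._segments[key]


def fake_remote_read(path, start, length):
    # Hypothetical stand-in for a remote object storage range read.
    return bytes(length)


cache = SegmentCache(fake_remote_read)
cache.read("s3://example-bucket/data/file-0.parquet", 0, 1024)  # first query: remote read
cache.read("s3://example-bucket/data/file-0.parquet", 0, 1024)  # second query: local cache
print(cache.egress_bytes)
```

In this sketch the second read adds nothing to `egress_bytes`, which is the effect the step describes: repeated queries over the same segments don't pay for cross-cloud transfer again.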

Supported catalogs

Cross-cloud Lakehouse supports querying data from the following remote catalog providers:

Databricks Unity Catalog: Supported on Amazon Web Services (AWS) and Google Cloud.
AWS Glue: Supported on Amazon Web Services (AWS).
Snowflake: Supported on Amazon Web Services (AWS) and Google Cloud.
SAP Business Data Cloud (BDC): Supported using the SAP BDC connector.

Core concepts

This section describes the key components essential to using cross-cloud Lakehouse.

Remote Apache Iceberg REST catalogs

Remote Apache Iceberg REST catalogs form the metadata layer. You connect Lakehouse to a remote catalog, and Lakehouse discovers the data without copying any files. Lakehouse authenticates through OIDC token federation or OAuth credentials, so it doesn't require long-lived access keys.

Cross-cloud Lakehouse federated catalogs synchronize metadata from remote Apache Iceberg REST catalogs based on a refresh interval. A catalog's background metadata refresh can take longer as the number of namespaces and tables grows. If a refresh is still running when the next one is due, that next refresh is skipped, and refreshing resumes at the following interval.
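
The skip behavior can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the service's scheduler: the `simulate_refreshes` function, the 10-unit interval, and the refresh durations are all made up for illustration.

```python
def simulate_refreshes(interval, durations, ticks):
    """Simulate fixed-interval refreshes; skip a tick if the previous run is still going.

    durations lists how long each refresh that actually starts takes, in order.
    Returns a list of (tick_time, "run" | "skipped") pairs.
    """
    events = []
    busy_until = 0
    remaining = iter(durations)
    for i in range(ticks):
        now = i * interval
        if now < busy_until:
            # The previous refresh overran this tick, so this refresh is skipped.
            events.append((now, "skipped"))
            continue
        # Start a refresh; default to 0 if no more durations were supplied.
        busy_until = now + next(remaining, 0)
        events.append((now, "run"))
    return events


# Hypothetical numbers: the second refresh takes 15 units against a 10-unit interval.
events = simulate_refreshes(interval=10, durations=[4, 15, 3], ticks=4)
for tick_time, outcome in events:
    print(tick_time, outcome)
```

With these inputs, the refresh that starts at time 10 is still running at time 20, so the refresh due at 20 is skipped and the next one starts at 30.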

Transport layer

The transport layer determines how query traffic reaches your remote cloud provider. You can configure Lakehouse to query remote data over either the public internet or a dedicated private interconnect. This section doesn't apply to SAP Business Data Cloud (BDC) connections.

Select the transport method that matches your architectural and security requirements:

Customer-owned (CCI)

You can configure BigQuery to query data stored in Amazon S3 buckets on Amazon Web Services (AWS) over a private Cross-Cloud Interconnect (CCI), using either Dedicated Cross-Cloud Interconnect or Partner Cross-Cloud Interconnect.

Using a private interconnect provides the following benefits:

Enhanced security: Data travels across a private network connection between Google Cloud and AWS, avoiding the public internet.
Reduced costs: Potentially lower egress charges from AWS compared to internet egress, especially when combined with your private interconnect capacity.
Consistent performance: More predictable network latency and bandwidth compared to the public internet.

Architecture overview

To enable private querying, you configure a path from BigQuery to your Amazon S3 bucket through your private interconnect. A key component in the Google Cloud Virtual Private Cloud (VPC) is an Internal Load Balancer (ILB). The ILB distributes requests from BigQuery to the private endpoints for Amazon S3 within your AWS VPC, which are provisioned using AWS PrivateLink.

Using an ILB with multiple Elastic Network Interfaces (ENIs) as backends is essential for load balancing, scalability, and high availability. This applies whether you use Dedicated CCI or Partner Interconnect.

The private query workflow follows this process:

BigQuery uses a connection configured with a Service Directory service.
Service Directory resolves the service name to the internal IP address of the Google Cloud ILB.
The ILB receives the requests from BigQuery and distributes them to configured backends.
The ILB backends are Hybrid Connectivity Network Endpoint Groups (NEGs), each pointing to the private IP address of an ENI in your AWS VPC.
Traffic flows from the ILB, through the NEGs, across the private interconnect, to the AWS ENIs.
The AWS ENIs, part of an Amazon S3 VPC Interface Endpoint (AWS PrivateLink), provide private access to the Amazon S3 service.
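
The hops in this workflow can be traced with a small model. It is only a sketch: every name and IP address is hypothetical, and the round-robin backend selection is an illustrative assumption, since the ILB's actual distribution policy isn't described here.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class HybridNeg:
    """Hybrid connectivity NEG pointing at one ENI's private IP in the AWS VPC."""
    name: str
    eni_ip: str


class InternalLoadBalancer:
    def __init__(self, ip, negs):
        self.ip = ip
        # Round-robin is an illustrative assumption, not the documented policy.
        self._negs = cycle(negs)

    def pick_backend(self):
        return next(self._negs)


def trace_request(service_name, service_directory, load_balancers):
    """Return the ordered hops one request takes, from BigQuery to Amazon S3."""
    ilb_ip = service_directory[service_name]  # Service Directory resolves the name
    ilb = load_balancers[ilb_ip]              # ILB that owns that internal IP
    neg = ilb.pick_backend()                  # ILB distributes to a backend NEG
    return [
        "BigQuery connection",
        f"Service Directory: {service_name}",
        f"ILB {ilb.ip}",
        f"NEG {neg.name}",
        "private interconnect",
        f"AWS ENI {neg.eni_ip}",
        "Amazon S3 interface endpoint (AWS PrivateLink)",
    ]


# All names and addresses below are hypothetical.
negs = [HybridNeg("neg-a", "172.31.1.10"), HybridNeg("neg-b", "172.31.2.10")]
ilb = InternalLoadBalancer("10.0.0.10", negs)
directory = {"s3-private": "10.0.0.10"}
balancers = {ilb.ip: ilb}

first = trace_request("s3-private", directory, balancers)
second = trace_request("s3-private", directory, balancers)
print(" -> ".join(first))
print(second[5])  # the second request lands on the other ENI
```

Spreading requests across several ENIs, as the two traces do, is what gives the path its load balancing and availability properties: losing one ENI still leaves a working backend.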

Public internet (no CCI)

If you do not configure a private interconnect, queries to your remote catalog travel over the public internet by default.

When querying data over the public internet, consider the following implications:

Standard encryption: Data access requests and data transfers are encrypted in transit using standard TLS protocols across the public internet.
Egress costs: Data transfer incurs standard internet egress charges from your remote cloud provider (for example, AWS), which are typically higher than private interconnect egress rates.
Variable latency: Network performance, bandwidth, and latency depend on public internet routing and congestion, resulting in less predictable query execution times compared to a dedicated private interconnect.
Simplified setup: Requires no additional networking infrastructure, VPC peering, or Service Directory configuration in Google Cloud or your remote cloud provider.

Architecture overview

When querying data over the public internet, Lakehouse connects directly to your remote catalog and object storage endpoints without requiring private Google Cloud or remote cloud networking infrastructure.

The public internet query workflow follows this process:

BigQuery initiates a query against a federated table defined in your Lakehouse catalog.
Lakehouse authenticates securely with your remote Apache Iceberg catalog using credentials stored in Secret Manager or OIDC token federation.
Lakehouse retrieves the table metadata and manifest files across the public internet to identify the relevant underlying data files (for example, in AWS Amazon S3).
Data access requests for the underlying objects are sent directly from Google Cloud over the public internet using standard TLS encryption.
The remote storage service verifies the request using temporary, scoped credentials vended by Lakehouse and returns the requested data blocks across the public internet to Google Cloud.
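
The last step relies on credentials that are both temporary and scoped. The sketch below models only those two properties; the `ScopedCredential` type, the `storage_allows` check, and the paths are hypothetical, and real storage services verify signed tokens and policies rather than comparing prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    allowed_prefix: str  # object path prefix the credential may read
    expires_at: float    # expiry, in seconds on an arbitrary clock


def storage_allows(cred, object_path, now):
    """Return True if the request is inside the credential's scope and lifetime."""
    return now < cred.expires_at and object_path.startswith(cred.allowed_prefix)


# Hypothetical credential: valid for one table's prefix until t=900.
cred = ScopedCredential("s3://example-bucket/warehouse/sales/", expires_at=900.0)

in_scope = storage_allows(
    cred, "s3://example-bucket/warehouse/sales/part-0.parquet", now=10.0)
out_of_scope = storage_allows(
    cred, "s3://example-bucket/warehouse/hr/part-0.parquet", now=10.0)
expired = storage_allows(
    cred, "s3://example-bucket/warehouse/sales/part-0.parquet", now=901.0)

print(in_scope, out_of_scope, expired)
```

Only the first request passes in this sketch: the second targets a path outside the credential's prefix, and the third arrives after the credential has expired.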
