The Lakehouse runtime catalog is a serverless, unified metastore for Lakehouse for Apache Iceberg that simplifies managing self-hosted Hive Metastores. This single, fully managed metadata layer eliminates the need for separate metadata stores for open-source workloads. It lets you seamlessly share data across Apache Spark, Apache Hive, and BigQuery.
Optimized for Apache Spark ExternalCatalog compatibility, this integration
supports a subset of the Hive Metastore interface. To see if your workloads
depend on unsupported features like transactions, compactions, or Kerberos,
review the feature comparison and limitations.
How Hive integrates with the Lakehouse runtime catalog
Managed Service for Apache Spark images are preconfigured with the necessary custom
IMetastoreClient and other required dependencies to simplify using the
Lakehouse runtime catalog with your Spark jobs. The following sequence
describes how Spark connects to the Lakehouse runtime catalog.
- Apache Spark connects to external metadata catalogs by using the Apache Hive IMetastoreClient interface.
- The Lakehouse runtime catalog uses a custom IMetastoreClient to provide a managed metastore service for Spark and Hive metadata.
- Managed Service for Apache Spark images include the required client and dependencies to integrate with the Lakehouse runtime catalog.
After setup, you can query a subset of tables created from Spark in BigQuery. The integration supports specific data type mappings between Spark and BigQuery, and several storage formats, such as Parquet, ORC, and Avro.
Feature comparison with Hive Metastore
The following table compares entities and operations in Hive Metastore and the Lakehouse runtime catalog.
| Entity or operation | Hive Metastore | Lakehouse runtime catalog |
|---|---|---|
| Catalog | ✅ | ✅ |
| Database (create, delete, update) | ✅ | ✅ |
| Table (create, delete, update) | ✅ | ✅ |
| Partition (add, drop, update) | ✅ | ✅ |
| User-defined functions | ✅ | ❌ |
| Bucketing columns | ✅ | ❌ |
| Skewed columns | ✅ | ❌ |
| Table column stats | ✅ | ❌ |
| Partition column stats | ✅ | ❌ |
| Key constraints | ✅ | ❌ |
| Primary keys | ✅ | ❌ |
| Master keys | ✅ | ❌ |
| Delegation tokens (Kerberos) | ✅ | ❌ |
| Workload manager resource plans | ✅ | ❌ |
| Transactions and compactions | ✅ | ❌ |
| Table privileges | ✅ | ✅ (through Identity and Access Management (IAM)) |
| Column privileges | ✅ | ❌ |
| Partition privileges | ✅ | ✅ (through IAM) |
| Roles | ✅ | ❌ |
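
When auditing a workload before migration, it can help to encode the comparison table as data and check a list of required features against it. The following is a minimal sketch, not part of any Lakehouse or Hive API: the `CATALOG_SUPPORT` dictionary and the `unsupported_features` helper are hypothetical names introduced for illustration, and the feature labels are lowercased copies of the table rows above.

```python
# Hypothetical support matrix transcribed from the feature comparison table.
# True means the Lakehouse runtime catalog supports the entity or operation.
CATALOG_SUPPORT = {
    "catalog": True,
    "database": True,
    "table": True,
    "partition": True,
    "user-defined functions": False,
    "bucketing columns": False,
    "skewed columns": False,
    "table column stats": False,
    "partition column stats": False,
    "key constraints": False,
    "primary keys": False,
    "master keys": False,
    "delegation tokens (kerberos)": False,
    "workload manager resource plans": False,
    "transactions and compactions": False,
    "table privileges": True,      # through IAM
    "column privileges": False,
    "partition privileges": True,  # through IAM
    "roles": False,
}


def unsupported_features(required):
    """Return the required features the catalog does not support, sorted by name.

    Raises ValueError for a name that does not appear in the comparison table,
    so that typos are not silently treated as supported or unsupported.
    """
    missing = []
    for name in required:
        key = name.strip().lower()
        if key not in CATALOG_SUPPORT:
            raise ValueError(f"unknown feature: {name!r}")
        if not CATALOG_SUPPORT[key]:
            missing.append(key)
    return sorted(set(missing))


if __name__ == "__main__":
    # Example workload: the feature list here is an assumption for illustration.
    workload = ["Table", "Partition", "Transactions and compactions", "Roles"]
    print(unsupported_features(workload))
```

An empty result means that every listed feature has a ✅ in the Lakehouse runtime catalog column. A non-empty result names the features to review before moving the workload.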