Engines & Integrations | Last updated: May 14, 2026

Hive and Apache Iceberg

Apache Hive 4.x has native Iceberg support, enabling Hive SQL to read and write Iceberg tables as first-class objects while the Hive Metastore continues to serve as the Iceberg catalog for organizations with existing HMS infrastructure.


Apache Hive and Apache Iceberg

Apache Hive — historically the dominant SQL query engine for Hadoop-based data lakes — has evolved to support Apache Iceberg as a first-class table format. Hive 4.x and later treat Iceberg tables as native objects, enabling full Hive SQL DML operations against Iceberg tables while maintaining backward compatibility with legacy Hive SerDe tables.

For organizations with large Hive investments (Hive warehouses, Hive SQL workflows, HiveServer2 infrastructure), native Iceberg support provides an upgrade path to modern lakehouse capabilities without immediately replacing all Hive infrastructure.

Hive 4.x Iceberg Support

Hive 4.0 (generally available in 2024) introduced comprehensive Iceberg support:

Creating Iceberg Tables in Hive

-- Create an Iceberg table via Hive SQL
CREATE TABLE orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    total        DOUBLE,
    order_date   DATE,
    status       STRING
)
PARTITIONED BY SPEC (month(order_date))    -- Iceberg partition transform
STORED BY ICEBERG                          -- Iceberg storage handler
STORED AS PARQUET
LOCATION 'hdfs:///warehouse/orders/'
TBLPROPERTIES (
    'format-version' = '2'    -- spec v2 enables row-level deletes/updates
);

Full DML in Hive

-- INSERT
INSERT INTO orders VALUES (1001, 42, 150.00, DATE'2026-05-14', 'pending');

-- UPDATE (row-level, MoR)
UPDATE orders SET status = 'shipped' WHERE order_id = 1001;

-- DELETE
DELETE FROM orders WHERE order_date < DATE'2020-01-01';

-- MERGE INTO (upsert)
MERGE INTO orders AS target
USING updates AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET status = source.status, total = source.total
WHEN NOT MATCHED THEN INSERT VALUES (
    source.order_id, source.customer_id, source.total,
    source.order_date, source.status
);

Time Travel in Hive

-- Query as of a specific snapshot ID
SELECT * FROM orders FOR SYSTEM_VERSION AS OF 8027658604211071520;

-- Query as of a timestamp
SELECT * FROM orders FOR SYSTEM_TIME AS OF '2026-05-14 10:00:00';

-- View snapshot history (Hive addresses Iceberg metadata tables as db.table.metadata_table)
SELECT * FROM default.orders.snapshots;

Hive Metastore as Iceberg Catalog

The Hive Metastore (HMS) serves as the Iceberg catalog for Hive-managed Iceberg tables. HMS stores the mapping from table name to current Iceberg metadata file location, updated on each commit.

This is the same HMS used by Spark, Flink, and Trino for Iceberg tables when configured with the hive catalog type — enabling multi-engine access to Hive-registered Iceberg tables.
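The pointer swap can be observed directly from Hive. A minimal check, assuming the orders table from the earlier examples (the exact property listing varies by Hive version):

-- Show the table's storage handler and table parameters, including the
-- Iceberg metadata pointer that HMS updates on each commit.
DESCRIBE FORMATTED orders;

-- Among the table parameters, look for an entry of the form:
--   metadata_location   hdfs:///warehouse/orders/metadata/....metadata.json
-- Each successful commit writes a new metadata file and atomically swaps
-- this pointer in HMS, which is what makes multi-engine access consistent.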

Hive + Iceberg in the Migration Context

For Hive shop migrations to a modern lakehouse:

Step 1: Migrate existing Hive tables to Iceberg format (in-place, no data copy).
Step 2: Continue using Hive SQL for existing workflows — they now run against Iceberg.
Step 3: Gradually add Spark, Dremio, or Trino for workloads where Hive is slow.
Step 4: Optionally migrate the catalog from HMS to Apache Polaris when ready.
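Step 1 can be performed from Hive itself by switching a table's storage handler, which registers the existing data files with Iceberg rather than rewriting them. A sketch, assuming a hypothetical legacy Parquet table named legacy_orders (verify the exact syntax against your Hive version's Iceberg documentation):

-- In-place migration of an existing Hive table to Iceberg.
-- Existing Parquet data files are adopted into Iceberg metadata; no data is copied.
ALTER TABLE legacy_orders
SET TBLPROPERTIES (
    'storage_handler' = 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
);

-- Afterwards the table behaves as an Iceberg table and supports the DML above:
DESCRIBE FORMATTED legacy_orders;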

This incremental approach allows organizations to adopt Iceberg’s benefits without a “rip and replace” migration.

Hive vs. Modern Engines for Iceberg

| Aspect | Hive 4.x | Spark | Dremio |
|---|---|---|---|
| Iceberg support | Full (4.0+) | Full | Full |
| SQL compatibility | HiveQL | Spark SQL | ANSI SQL / DremioSQL |
| Interactive queries | Slow (batch-oriented) | Moderate | Excellent (sub-second) |
| AI integration | No | No | Yes (AI Semantic Layer) |
| Best for | Legacy migration, batch ETL | Large-scale ETL/ML | Analytics, AI, BI |

Hive’s role in the modern lakehouse is as a migration bridge — enabling organizations to adopt Iceberg incrementally without abandoning their Hive investments, while transitioning analytics workloads to faster, more capable engines over time.

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
