Apache Iceberg vs. Delta Lake vs. Apache Hudi
The open table format landscape is dominated by three projects: Apache Iceberg, Delta Lake (primarily maintained by Databricks), and Apache Hudi (primarily maintained by Onehouse/Uber). Each emerged from different companies to solve the same core problems with raw data lake storage, but with different design choices, governance models, and trade-offs.
Feature Comparison
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| Governance | Apache Foundation (ASF) | Linux Foundation | Apache Foundation (ASF) |
| Primary backer | Broad ecosystem (Netflix origin) | Databricks | Onehouse (Uber origin) |
| ACID transactions | Yes | Yes | Yes |
| Schema evolution | Yes | Yes | Partial |
| Time travel | Yes | Yes | Yes |
| Hidden partitioning | Yes | No | No |
| Partition evolution | Yes | No | No |
| Row-level deletes | Yes (Spec v2) | Yes | Yes |
| Upserts (MERGE) | Yes | Yes | Yes (native) |
| Streaming write | Yes (Flink/Spark) | Yes (Spark) | Yes |
| Engine neutrality | Excellent | Good (Spark-centric) | Good |
| REST Catalog spec | Yes | No | No |
| Credential vending | Yes (via REST Catalog) | No | No |
| Branching/tagging | Yes (table-level) | No | No |
| Python client | PyIceberg | delta-rs | PyHudi |
| DuckDB support | Yes | Yes | Limited |
Governance: The Critical Difference
The single most important strategic difference between these formats is governance:
- Apache Iceberg is governed by the Apache Software Foundation under an open community model. No single company controls the specification.
- Delta Lake is governed by the Linux Foundation but is primarily developed by Databricks. The practical reality is that Databricks drives most design decisions.
- Apache Hudi is governed by the Apache Software Foundation, with Onehouse (the company founded by Hudi’s creator) as the primary commercial backer.
For organizations concerned about vendor lock-in, Apache Iceberg and Apache Hudi’s ASF governance provides stronger neutrality guarantees than Delta Lake’s de-facto Databricks control.
Ecosystem and Engine Support
Apache Iceberg has the broadest multi-engine support:
- Dremio, Spark, Flink, Trino, Presto, DuckDB, PyIceberg, Hive, AWS Athena, BigQuery, Snowflake, and more.
- The Iceberg REST Catalog standard enables engine-agnostic catalog access — any REST-compliant engine works with any REST-compliant catalog.
- Apache Polaris (co-created by Dremio and Snowflake, donated to Apache) provides a neutral reference catalog implementation.
Delta Lake has excellent Spark support and increasingly good support for other engines via the Delta Kernel and delta-rs, but Databricks remains the primary optimized runtime.
Apache Hudi has strong streaming upsert capabilities (native MOR streaming with Flink) and good Spark support, with growing support for other engines.
Why Apache Iceberg Won (and Is Winning) the Format Wars
By 2024–2026, Apache Iceberg has emerged as the clear ecosystem standard:
- Snowflake adopted Iceberg: Snowflake Open Catalog and native Iceberg table support.
- Apache Polaris: Co-created by Dremio and Snowflake, donated to Apache — the neutral reference catalog.
- AWS native support: Amazon Athena, S3 Tables, AWS Glue all have native Iceberg support.
- Google BigQuery Iceberg: BigQuery can read and write Iceberg tables.
- DuckDB + PyIceberg: The Python and data science ecosystem chose Iceberg.
- The REST Catalog standard: Iceberg’s catalog interoperability story has no equivalent in Delta Lake or Hudi.
While Delta Lake remains popular in Databricks-centric deployments and Hudi has a loyal following in streaming-first architectures, Apache Iceberg has become the format of choice for multi-engine, multi-cloud lakehouses.
Migration Between Formats
Migrating between table formats is possible but non-trivial:
- Hive → Iceberg: Use Spark’s
CALL system.migrate()for in-place migration. - Delta Lake → Iceberg: Convert via Parquet (rewrite as Iceberg from the Parquet files).
- Hudi → Iceberg: Convert via Parquet export + Iceberg import.
For new projects in 2025+, Apache Iceberg is the clear default choice for multi-engine, vendor-neutral lakehouses.