Operations & Optimization · Last updated: May 14, 2026

Merge-on-Read (MoR) in Iceberg

Merge-on-Read (MoR) is an Iceberg write strategy in which UPDATE and DELETE operations write small delete files instead of rewriting data files. Writes are fast, but readers must apply the accumulated deletes, making MoR ideal for high-frequency streaming workloads.


Merge-on-Read (MoR) in Apache Iceberg

Merge-on-Read (MoR) is one of two Iceberg write strategies for UPDATE, DELETE, and MERGE INTO operations. Unlike Copy-on-Write, MoR does not rewrite existing data files when rows are deleted or updated. Instead, it writes small delete files that record which rows are deleted, and the actual merging of deletes happens at read time.

How Merge-on-Read Works

Consider a data file part-001.parquet with 1,000,000 rows, and a DELETE statement that removes 100 rows:

MoR Behavior:

  1. The engine identifies which rows match the DELETE predicate.
  2. It writes a small positional delete file (or equality delete file) listing the 100 deleted rows.
  3. The new snapshot references both the original data file AND the new delete file.
  4. part-001.parquet is NOT rewritten — it still contains all 1,000,000 rows.

When a subsequent query reads the table:

  1. The engine reads part-001.parquet.
  2. The engine applies the delete file to filter out the 100 deleted rows.
  3. The query sees 999,900 rows.
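The write and read steps above can be sketched in plain Python. This is an illustrative model, not Iceberg's actual implementation: the data file is a list of rows, and the positional delete file is a set of (file_path, row_position) pairs, scaled down to 1,000 rows with 100 deletes.

```python
# Toy model of a MoR read path: a data file plus a positional delete file.
# (Illustrative only -- real Iceberg readers work on Parquet/ORC/Avro files.)

# "part-001.parquet" modeled as 1,000 rows.
data_rows = list(range(1_000))

# Positional delete file: (file_path, row_position) pairs for 100 deleted rows.
positional_deletes = {("part-001.parquet", pos) for pos in range(100)}

def read_with_deletes(file_path, rows, deletes):
    """Yield only rows whose (file_path, position) is not in a delete file."""
    for pos, row in enumerate(rows):
        if (file_path, pos) not in deletes:
            yield row

visible = list(read_with_deletes("part-001.parquet", data_rows, positional_deletes))
print(len(visible))  # 900 -- the data file itself was never rewritten
```

Note that `data_rows` is untouched throughout: as in step 4 above, the original file keeps all its rows, and filtering happens only at scan time.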

When to Use Merge-on-Read

MoR is optimal when:

  1. Writes are frequent and latency-sensitive, as in streaming ingestion or CDC pipelines.
  2. Each UPDATE or DELETE touches a small fraction of the rows in large data files, so rewriting whole files would be wasteful.
  3. Regular compaction can be scheduled to keep read performance in check.

Types of Delete Files in MoR

Positional Delete Files

Record exact (file_path, row_position) pairs. Used when the engine knows the physical location of deleted rows (e.g., Flink streaming with row tracking).

Equality Delete Files

Record column values identifying deleted rows (e.g., WHERE id = 12345). More general but requires a join-like scan during reads.

See Iceberg Delete Files for the full comparison.
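The difference between the two delete-file flavors can be sketched on the same toy rows. This is an illustrative comparison, not Iceberg internals: a positional delete is a cheap set-membership check per row, while an equality delete behaves like an anti-join on the delete's key columns.

```python
# Toy comparison of positional vs. equality deletes (illustrative only).
rows = [
    {"id": 12344, "status": "open"},
    {"id": 12345, "status": "open"},
    {"id": 12346, "status": "closed"},
]

# Positional delete: exact (file_path, row_position) pairs.
positional = {("part-001.parquet", 1)}
by_position = [r for pos, r in enumerate(rows)
               if ("part-001.parquet", pos) not in positional]

# Equality delete: column values identifying deleted rows (WHERE id = 12345).
# Every row must be compared against the delete's key columns -- a
# join-like scan rather than a positional lookup.
equality_deletes = [{"id": 12345}]

def matches(row, delete):
    return all(row[key] == value for key, value in delete.items())

by_equality = [r for r in rows
               if not any(matches(r, d) for d in equality_deletes)]

print(by_position == by_equality)  # True -- both remove the id 12345 row
```

Both paths produce the same surviving rows here; the cost difference only shows at scale, where the equality path must evaluate its predicate against every row read.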

MoR SQL Configuration

The delete, update, and merge write modes are set independently as table properties (Spark SQL shown here):

CREATE TABLE orders (order_id BIGINT, status STRING)
USING iceberg
TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read'
);

Read Performance Degradation Over Time

The key downside of MoR is that read performance degrades as delete files accumulate: each new delete file is another layer the engine must apply during reads.

This is why compaction is mandatory for MoR tables used in production. Compaction reads the delete files, applies them to the data, and writes new clean data files, resetting the delete-file count to zero.
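A compaction pass can be sketched with the same toy model. This is an illustrative sketch of what compaction accomplishes, not Iceberg's rewrite procedure: surviving rows are materialized into new files, after which the delete files are dropped.

```python
# Toy model of compacting a MoR table (illustrative only).
table = {
    "data_files": {"part-001": [10, 20, 30, 40]},        # toy rows
    "delete_files": [("part-001", 1), ("part-001", 3)],  # positions 1 and 3
}

def compact(table):
    """Apply all delete files to the data and write clean replacement files."""
    deleted = set(table["delete_files"])
    new_data_files = {}
    for name, rows in table["data_files"].items():
        survivors = [row for pos, row in enumerate(rows)
                     if (name, pos) not in deleted]
        new_data_files[name + "-compacted"] = survivors
    # The new snapshot references clean data files and zero delete files.
    return {"data_files": new_data_files, "delete_files": []}

compacted = compact(table)
print(compacted["data_files"])        # {'part-001-compacted': [10, 30]}
print(len(compacted["delete_files"]))  # 0
```

In a real deployment this is what Spark's table-maintenance procedures or Dremio's OPTIMIZE TABLE (discussed below) perform on a schedule.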

V1 vs. V2 MoR Support

MoR (via delete files) requires Iceberg Spec v2. Spec v1 only supports Copy-on-Write for DML operations. All modern Iceberg engines default to creating v2 tables.

MoR and Apache Flink

Apache Flink is the engine most naturally suited to MoR, because:

  1. Flink’s streaming CDC pipelines produce exactly the right access pattern for MoR: high-frequency, targeted deletes/updates on specific rows.
  2. Flink knows the exact position of each row it processes, making positional delete files (the most efficient delete type) the natural output.
  3. Flink + Iceberg MoR + periodic Spark compaction is the standard architecture for streaming lakehouses.

MoR in the Context of Dremio

Dremio’s Intelligent Query Engine handles both MoR and CoW tables seamlessly. For tables with pending delete files, Dremio applies deletes efficiently during query execution. Dremio’s OPTIMIZE TABLE command can be used to compact MoR tables into clean CoW-equivalent state, and Dremio Cloud supports automatic background optimization to keep MoR tables performant without manual intervention.

