Iceberg Table Compaction
Compaction is the most critical ongoing maintenance operation for Apache Iceberg tables. Over time, high-frequency streaming writes and micro-batch jobs create many small data files, while UPDATE and DELETE operations accumulate delete files. Without compaction, query performance degrades and storage costs increase.
Compaction is the process of:
- Rewriting small data files into optimally sized files (typically 128MB–512MB).
- Applying accumulated delete files into the rewritten data files, producing clean files with no pending deletes.
- Rewriting manifests to reduce manifest file count and metadata overhead.
After compaction, the same data exists in fewer, larger, cleaner files — dramatically improving both query performance and metadata efficiency.
Why Compaction is Necessary
The Small File Problem
Every Iceberg write transaction produces at least one new data file. A streaming pipeline writing micro-batches every 30 seconds produces 2,880 files per day per partition it touches (86,400 seconds in a day ÷ 30-second batches). Even a moderately partitioned table accumulates tens of thousands of small files quickly.
Small files harm performance because:
- Each file requires an open/read-footer/close cycle
- Column statistics in small files are less selective for data skipping
- More manifest entries must be evaluated during query planning
- Object storage costs increase (per-request fees, per-object storage minimums)
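A quick way to quantify the small file problem is to query the table's files metadata table. A minimal sketch, assuming Spark with an Iceberg catalog and the illustrative db.orders table used later in this article:
-- Spark: count small data files and their average size
SELECT
  count(*) AS small_file_count,
  round(avg(file_size_in_bytes) / 1048576, 1) AS avg_size_mb
FROM db.orders.files
WHERE file_size_in_bytes < 67108864; -- files under 64MB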
Delete File Accumulation
UPDATE and DELETE operations (in Merge-on-Read mode) write small delete files that are applied at read time. As delete files accumulate, every read must apply more and more deletes — a join-like operation on top of every scan. Without compaction, reads degrade proportionally with delete file count.
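The same metadata tables reveal how many delete files a table is carrying. A sketch, assuming Spark and a recent Iceberg release that exposes the delete_files metadata table:
-- Spark: count pending delete files per partition
SELECT
  partition,
  count(*) AS delete_file_count,
  sum(record_count) AS delete_records
FROM db.orders.delete_files
GROUP BY partition
ORDER BY delete_file_count DESC;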
Compaction Operations
Data File Rewriting
The core compaction operation combines multiple small files into larger ones:
-- Spark: rewrite small data files
CALL system.rewrite_data_files(
table => 'db.orders',
strategy => 'sort',
sort_order => 'zorder(customer_id, order_date)',
options => map(
'min-file-size-bytes', '67108864', -- 64MB
'max-file-size-bytes', '536870912', -- 512MB
'target-file-size-bytes', '268435456' -- 256MB target
)
);
Options include:
- binpack strategy: Pack small files into target-sized files without sorting.
- sort strategy: Sort data during compaction (linear sort or Z-order) to improve column statistics selectivity.
- Partial rewrites: Only rewrite files below a minimum size threshold, leaving already-optimal files untouched (see the sketch after this list).
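As an illustration of combining these options, the following sketch restricts a bin-pack rewrite to recent partitions and only picks up undersized files; the predicate and thresholds are illustrative, not recommendations:
-- Spark: partial bin-pack compaction limited to recent data
CALL system.rewrite_data_files(
  table => 'db.orders',
  strategy => 'binpack',
  where => 'order_date >= "2024-01-01"',
  options => map(
    'min-file-size-bytes', '67108864', -- files under 64MB are candidates for rewriting
    'target-file-size-bytes', '268435456' -- 256MB target
  )
);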
Manifest Rewriting
Compaction also includes rewriting manifests to reduce manifest count and update partition statistics:
-- Spark: rewrite manifests for reduced metadata overhead
CALL system.rewrite_manifests(
table => 'db.orders',
use_caching => true
);
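To judge whether manifest rewriting is worthwhile, the manifests metadata table shows how many manifests the current snapshot tracks. A sketch against the same illustrative table:
-- Spark: inspect manifest count and total size for the current snapshot
SELECT
  count(*) AS manifest_count,
  round(sum(length) / 1048576, 1) AS total_manifest_mb
FROM db.orders.manifests;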
Delete File Application (Positional Delete Removal)
When data files are rewritten during compaction (with either the bin-pack or sort strategy), Iceberg applies all pending positional and equality delete files to the rewritten files, producing clean output with no pending deletes.
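When the goal is specifically to clear accumulated deletes, rewrite_data_files can target data files that already have delete files attached. A sketch, assuming an Iceberg release where the delete-file-threshold option is available:
-- Spark: rewrite any data file referenced by at least one delete file
CALL system.rewrite_data_files(
  table => 'db.orders',
  options => map(
    'delete-file-threshold', '1' -- data files with 1+ associated delete files are rewritten
  )
);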
Compaction in Dremio
Dremio provides automatic table optimization as part of its Agentic Lakehouse platform. Dremio’s OPTIMIZE TABLE command handles compaction intelligently:
-- Dremio: optimize an Iceberg table
OPTIMIZE TABLE db.orders;
-- With options
OPTIMIZE TABLE db.orders
REWRITE DATA USING BIN_PACK
(TARGET_FILE_SIZE_MB = 256, MIN_FILE_SIZE_MB = 64, MAX_FILE_SIZE_MB = 512);
Dremio can also be configured for automatic background optimization, eliminating the need for manual maintenance schedules.
Compaction Strategies
| Strategy | Description | When to Use |
|---|---|---|
| Bin-pack | Combine files to hit target size | General purpose, fastest |
| Sort | Sort by columns + combine | When query patterns filter on sortable columns |
| Z-Order | Multi-column sort (Z-curve) | Multi-column filter predicates |
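For the plain sort strategy, the sort order is a column list rather than a zorder(...) expression. A sketch for a table that is mostly filtered on a single column (the column choice is illustrative):
-- Spark: compact with a linear sort on the dominant filter column
CALL system.rewrite_data_files(
  table => 'db.orders',
  strategy => 'sort',
  sort_order => 'customer_id ASC NULLS LAST'
);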
Scheduling Compaction
For production tables, compaction should run on a regular schedule. Common approaches:
- Airflow DAG: Schedule a Spark job that calls rewrite_data_files nightly.
- Dremio auto-optimization: Enable automatic background compaction in Dremio Cloud/Enterprise.
- Flink pipeline: Run continuous compaction as a Flink streaming job alongside the ingestion pipeline.
The right compaction frequency depends on write volume and query SLAs. Tables with very high write frequency may need hourly compaction; batch-loaded tables may only need daily or weekly compaction.
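Whatever the schedule, the job itself can stay simple. A sketch of a nightly maintenance script, combining the two procedures shown earlier, that an orchestrator such as Airflow or cron could submit via Spark SQL:
-- Spark: nightly Iceberg maintenance for db.orders
-- 1) Compact small data files and apply pending deletes
CALL system.rewrite_data_files(
  table => 'db.orders',
  options => map('target-file-size-bytes', '268435456') -- 256MB target
);
-- 2) Consolidate manifests to keep planning metadata small
CALL system.rewrite_manifests(table => 'db.orders');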