Iceberg Positional Delete Files
Positional delete files are one of the two types of delete files introduced in Apache Iceberg Spec v2. They enable row-level deletion in Merge-on-Read mode by recording the exact file path and row position (row index within the file) of every deleted row, allowing the query engine to skip those specific rows during reads without rewriting the original data files.
Structure of a Positional Delete File
A positional delete file is a regular data file (typically Parquet, though any of the table's supported file formats is allowed) with two required columns and one optional column:

| Column | Type | Description |
|---|---|---|
| `file_path` | string | The full URI of the data file containing the deleted row |
| `pos` | long | The 0-based position (row index) of the deleted row within that data file |
| `row` (optional) | struct | The deleted row's column values; often omitted in practice |

The spec requires entries to be sorted by `file_path` and then `pos`, which lets readers merge them against a data file in a single pass.
Example content:
```text
file_path                                   pos
s3://bucket/data/orders/part-00001.parquet  42
s3://bucket/data/orders/part-00001.parquet  10019
s3://bucket/data/orders/part-00002.parquet  7
s3://bucket/data/orders/part-00003.parquet  15887
```
These entries mean: “When reading part-00001.parquet, skip row 42 and row 10019. When reading part-00002.parquet, skip row 7.”
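To make the layout concrete, here is a minimal sketch that writes such a file with PyArrow (the paths and output file name are hypothetical, and real writers also attach Iceberg field IDs and metadata, omitted here):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# The two required columns of a positional delete file.
schema = pa.schema([
    ("file_path", pa.string()),
    ("pos", pa.int64()),
])

# Entries sorted by (file_path, pos), as the spec requires.
deletes = pa.table(
    {
        "file_path": [
            "s3://bucket/data/orders/part-00001.parquet",
            "s3://bucket/data/orders/part-00001.parquet",
            "s3://bucket/data/orders/part-00002.parquet",
        ],
        "pos": [42, 10019, 7],
    },
    schema=schema,
)

pq.write_table(deletes, "positional-deletes.parquet")
```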
How Query Engines Apply Positional Deletes
When reading an Iceberg table with pending positional delete files:
- The engine identifies which positional delete files apply to each data file being scanned, using the partition values and `file_path` column bounds recorded in the delete manifests (a positional delete applies only to data files whose data sequence number is less than or equal to the delete file's).
- For each data file, the engine loads the corresponding positional delete entries.
- As the engine scans each row group within the data file, it skips rows whose positions match a delete entry.
- The deleted rows are never returned to the query result.
The skip operation happens at the row level, not at the row-group or file level, so the engine must still open and scan the data file; it simply omits the deleted rows.
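A minimal sketch of that row-level skip using PyArrow (file names are hypothetical, and materializing whole tables is a simplification; real engines stream record batches and merge sorted delete positions):

```python
import pyarrow.parquet as pq

def read_with_positional_deletes(data_file: str, delete_file: str):
    """Return the rows of data_file, minus positions listed in delete_file."""
    deletes = pq.read_table(delete_file)

    # Collect the 0-based positions deleted from *this* data file.
    # data_file must match the URI recorded in the delete entries.
    deleted_positions = {
        pos.as_py()
        for path, pos in zip(deletes["file_path"], deletes["pos"])
        if path.as_py() == data_file
    }

    data = pq.read_table(data_file)
    keep = [i for i in range(data.num_rows) if i not in deleted_positions]
    return data.take(keep)
```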
Positional Deletes vs. Equality Deletes
| Aspect | Positional Deletes | Equality Deletes |
|---|---|---|
| What is recorded | File path + row position | Column values |
| How applied | Skip specific positions | Filter rows by value match |
| Requires knowing row position | Yes | No |
| Efficiency | Very efficient | Less efficient (join-like scan) |
| Best generated by | Batch row-level DML (Spark MoR DELETE/UPDATE/MERGE) | Streaming CDC upserts (Flink) |
| Use case | Deletes planned by the engine, where row positions are known at write time | High-throughput streaming deletes/upserts keyed by ID |
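A toy PyArrow contrast of the two application strategies (the data is made up; this is not Iceberg's reader code):

```python
import pyarrow as pa
import pyarrow.compute as pc

# Rows as read from one data file; a row's position is its scan-order index.
data = pa.table({"id": [1, 2, 3, 4], "amount": [10, 20, 30, 40]})

# Positional delete: "skip position 1" (the row with id=2). O(1) test per row.
deleted_positions = {1}
keep = [i for i in range(data.num_rows) if i not in deleted_positions]
data = data.take(keep)

# Equality delete: "drop every row where id == 3", an anti-join-style filter.
equality_delete_values = pa.array([3])
data = data.filter(pc.invert(pc.is_in(data["id"], value_set=equality_delete_values)))

print(data.to_pydict())  # {'id': [1, 4], 'amount': [10, 40]}
```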
When Positional Deletes Are Generated
Positional deletes are generated by write engines that know, at write time, the exact position of each row being deleted:

- Apache Spark (CoW by default, MoR when configured): with `write.delete.mode=merge-on-read` (and the matching `write.update.mode` / `write.merge.mode`), Spark's row-level DELETE/UPDATE/MERGE operations scan the table, record each matching row's file path and position, and write positional delete files instead of rewriting the affected data files.
- Apache Flink: in upsert mode, Flink's Iceberg sink writes positional deletes only for rows added and then replaced within the same checkpoint, where positions are known to the writer; deletes against rows committed in earlier checkpoints are written as equality deletes, since their positions are not tracked.
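A PySpark sketch of that Spark setup (the table `db.orders` and the `status` column are placeholders; the session is assumed to have the Iceberg extensions and a catalog configured):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session

# Switch row-level operations on this table to merge-on-read, so they write
# positional delete files instead of rewriting the affected data files.
spark.sql("""
    ALTER TABLE db.orders SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'merge-on-read'
    )
""")

# This DELETE now produces positional delete files.
spark.sql("DELETE FROM db.orders WHERE status = 'cancelled'")
```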
Positional Delete File Scope
Positional delete files are scoped to specific data files via the lower/upper bounds of their `file_path` column (together with partition information) stored in their manifest entries. This allows query planning to skip positional delete files that don't apply to the data files being scanned:

- A positional delete file whose entries all reference `part-00001.parquet` is loaded only when `part-00001.parquet` is being scanned.
- Queries scanning only `part-00002.parquet` never load the delete file for `part-00001.parquet`.
This scoping is what prevents positional delete files from becoming a global performance bottleneck.
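The planner's pruning check, in rough form (the field names here are illustrative, not the actual Iceberg manifest schema):

```python
def positional_deletes_may_apply(delete_entry: dict, data_file_path: str) -> bool:
    """Check a delete manifest entry's file_path bounds against a data file.

    A delete file is loaded for a scan task only if the scanned data file's
    path falls within the min/max bounds of the delete file's file_path column.
    """
    return (
        delete_entry["file_path_lower_bound"]
        <= data_file_path
        <= delete_entry["file_path_upper_bound"]
    )

entry = {  # a delete file targeting exactly one data file: bounds are equal
    "file_path_lower_bound": "s3://bucket/data/orders/part-00001.parquet",
    "file_path_upper_bound": "s3://bucket/data/orders/part-00001.parquet",
}
assert positional_deletes_may_apply(entry, "s3://bucket/data/orders/part-00001.parquet")
assert not positional_deletes_may_apply(entry, "s3://bucket/data/orders/part-00002.parquet")
```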
Compaction: Applying Positional Deletes
Positional delete files accumulate over time and must be applied via compaction to restore full read performance:
```sql
-- Spark: rewrite data files; pending positional deletes are applied during the rewrite
CALL system.rewrite_data_files(
  table => 'db.orders',
  strategy => 'binpack'  -- 'sort' also works if the table defines a sort order
);
```
After compaction, the rewritten data files contain no deleted rows, and the new snapshot no longer references the positional delete files, reducing read overhead to zero. (The delete files are physically removed once the older snapshots that reference them are expired.)
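Recent Iceberg releases also ship a Spark procedure, `rewrite_position_delete_files`, that compacts the positional delete files themselves and drops dangling entries without rewriting data files. A minimal PySpark invocation (table name is a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session

# Compact positional delete files only; data files are left untouched, and
# entries pointing at data files that no longer exist are dropped as dangling.
spark.sql("CALL system.rewrite_position_delete_files(table => 'db.orders')")
```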