Patterns & Architecture · Last updated: May 14, 2026

Medallion Architecture with Apache Iceberg

The Medallion Architecture (Bronze/Silver/Gold) is a multi-layer data organization pattern where raw data flows through progressive refinement stages, with Apache Iceberg providing ACID-safe writes, schema evolution, and time travel at each layer for reliable, governed lakehouse pipelines.



The Medallion Architecture is a data design pattern that organizes a data lakehouse into three progressive refinement layers — Bronze (raw), Silver (cleaned/normalized), and Gold (business-ready) — with each layer transforming data into increasingly trusted, performant, and use-case-specific tables.

Apache Iceberg fits the Medallion Architecture well. Its ACID transactions, schema evolution, and time travel give every layer of the pipeline reliable writes and an inspectable history.

The Three Layers

Bronze Layer: Raw Ingestion

The Bronze layer captures raw, unmodified data exactly as it arrives from source systems. No transformations, no cleaning, no filtering.

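Characteristics: append-only loads that keep every source record exactly as received, tagged with the time it was ingested and the system it came from.

Iceberg properties at Bronze: merge-on-read (MoR) tables partitioned by ingestion day. Each append commits an immutable snapshot that can be replayed with time travel.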

-- Bronze table: raw orders
CREATE TABLE bronze.orders (
    raw_payload STRING,  -- raw JSON or entire source row
    ingested_at TIMESTAMP,
    source_system STRING
) USING iceberg
PARTITIONED BY (days(ingested_at));
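
The following Python sketch illustrates what the Bronze DDL stores. It uses only the standard library and is not an Iceberg API. It builds one Bronze row and computes the partition values that row would fall into. The helpers to_bronze_row, day_partition, and month_partition are hypothetical. The arithmetic follows the Iceberg table spec: the days transform records days since 1970-01-01, and the months transform (used by the Silver and Gold tables) records months since 1970-01.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)


def to_bronze_row(raw_line: str, source_system: str, ingested_at: datetime) -> dict:
    """Wrap a source record for Bronze without parsing, cleaning, or filtering it."""
    return {
        "raw_payload": raw_line,  # kept exactly as received
        "ingested_at": ingested_at,
        "source_system": source_system,
    }


def day_partition(ts: datetime) -> int:
    """Days since 1970-01-01 in UTC, the value days(ingested_at) partitions on."""
    if ts.tzinfo is None:
        raise ValueError("ingested_at must be timezone-aware")
    return (ts.astimezone(timezone.utc).date() - EPOCH_DATE).days


def month_partition(d: date) -> int:
    """Months since 1970-01, the value months(order_date) partitions on."""
    return (d.year - 1970) * 12 + (d.month - 1)


row = to_bronze_row('{"order_id": "42", "total": "19.99"}', "orders_db",
                    datetime(2026, 5, 14, 23, 30, tzinfo=timezone.utc))
print(day_partition(row["ingested_at"]))  # 20587
print(month_partition(date(2026, 5, 14)))  # 676
```

All rows ingested on the same UTC day share one partition value. A query that filters on ingested_at can therefore skip whole days of raw data.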

Silver Layer: Cleaned and Normalized

The Silver layer applies data quality rules, type casting, deduplication, and normalization to produce a clean, queryable representation of each entity.
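
The cleaning steps named above can be sketched in plain Python. This is an illustration, not the pipeline's actual code. It assumes Bronze payloads are JSON objects whose keys match the columns of the silver.orders table defined below. The specific rules are hypothetical examples:

- drop records without an order_id
- trim and upper-case status and region
- keep the newest updated_at per order

```python
import json
from datetime import date, datetime
from decimal import Decimal


def normalize(raw_payload: str):
    """Parse one Bronze payload and cast it to the silver.orders column types.

    Returns None when the record fails the NOT NULL rule on order_id.
    """
    rec = json.loads(raw_payload)
    if rec.get("order_id") in (None, ""):
        return None  # a real pipeline might quarantine the record instead
    customer_id = rec.get("customer_id")
    return {
        "order_id": int(rec["order_id"]),
        "customer_id": int(customer_id) if customer_id not in (None, "") else None,
        "order_date": date.fromisoformat(rec["order_date"]),
        # DECIMAL(10,2): parse via str to avoid float artifacts, then fix the scale
        "total": Decimal(str(rec["total"])).quantize(Decimal("0.01")),
        # hypothetical normalization rule: trim and upper-case code values
        "status": (rec.get("status") or "").strip().upper() or None,
        "region": (rec.get("region") or "").strip().upper() or None,
        "updated_at": datetime.fromisoformat(rec["updated_at"]),
    }


def dedupe_latest(rows):
    """Keep only the newest version (highest updated_at) of each order_id."""
    latest = {}
    for row in rows:
        current = latest.get(row["order_id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    return list(latest.values())


payloads = [
    '{"order_id": "42", "order_date": "2026-05-14", "total": "19.99", '
    '"status": " placed ", "region": "emea", "updated_at": "2026-05-14T10:00:00"}',
    '{"order_id": "42", "order_date": "2026-05-14", "total": "19.99", '
    '"status": "SHIPPED", "region": "EMEA", "updated_at": "2026-05-15T08:00:00"}',
    '{"order_id": null, "total": "5.00"}',
]
clean = [r for r in map(normalize, payloads) if r is not None]
silver_source = dedupe_latest(clean)  # one row for order 42, status "SHIPPED"
```

Deduplicating before the merge matters. Spark's MERGE INTO on Iceberg tables fails when a single target row is matched by more than one source row. Whatever feeds the Silver MERGE (bronze_normalized in the statement below) should therefore hold at most one row per order_id.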

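Characteristics: one cleaned, type-cast, deduplicated row per entity key (here, order_id), kept current by periodic CDC upserts from Bronze.

Iceberg properties at Silver:
- An enforced schema.
- MERGE INTO upserts with merge-on-read (MoR), so changes are written as delete files plus new data instead of full file rewrites.
- Monthly partitioning on order_date.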

-- Silver table: cleaned orders (CDC upsert target)
CREATE TABLE silver.orders (
    order_id     BIGINT NOT NULL,
    customer_id  BIGINT,
    order_date   DATE,
    total        DECIMAL(10,2),
    status       STRING,
    region       STRING,
    updated_at   TIMESTAMP
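) USING iceberg
PARTITIONED BY (months(order_date))
-- MoR upserts need row-level deletes (format v2); Iceberg's default merge mode is copy-on-write
TBLPROPERTIES (
    'format-version' = '2',
    'write.merge.mode' = 'merge-on-read'
);

-- Periodic Bronze → Silver CDC apply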
MERGE INTO silver.orders AS target
USING bronze_normalized AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT VALUES (...);
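
The row-level effect of the MERGE INTO above can be modelled with a dictionary keyed by order_id. This minimal sketch is not Iceberg code, and merge_into is a hypothetical helper. It shows only which rows are updated or inserted. It ignores how Iceberg physically records the change: delete files plus new data files under merge-on-read, rewritten data files under copy-on-write. It rejects duplicate source keys before touching the target, so a bad batch changes nothing, much like a failed atomic commit.

```python
def merge_into(target: dict, source: list) -> dict:
    """Upsert source rows into target (order_id -> row), MERGE INTO style.

    WHEN MATCHED     -> overwrite the matched row's columns with the source values
    WHEN NOT MATCHED -> insert the source row
    Returns how many rows were updated and inserted.
    """
    keys = [row["order_id"] for row in source]
    if len(keys) != len(set(keys)):
        # validate first so a bad batch changes nothing (all-or-nothing)
        raise ValueError("source contains more than one row for an order_id")

    counts = {"updated": 0, "inserted": 0}
    for row in source:
        key = row["order_id"]
        if key in target:
            target[key] = {**target[key], **row}
            counts["updated"] += 1
        else:
            target[key] = dict(row)
            counts["inserted"] += 1
    return counts


silver_orders = {1: {"order_id": 1, "status": "PLACED"}}
print(merge_into(silver_orders, [
    {"order_id": 1, "status": "SHIPPED"},  # matched: updated
    {"order_id": 2, "status": "PLACED"},   # not matched: inserted
]))  # {'updated': 1, 'inserted': 1}
```

A common refinement, not part of the statement above, is a guarded clause such as WHEN MATCHED AND source.updated_at > target.updated_at. With that guard, a late-arriving older change cannot overwrite a newer row.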

Gold Layer: Business-Ready Aggregates

The Gold layer contains pre-aggregated, business-metric-aligned tables optimized for specific analytical use cases: executive dashboards, BI tools, ML feature stores, and AI agent queries.

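Characteristics: aggregated tables at the grain a consumer needs, for example one row per day, region, and product category in the table below.

Iceberg properties at Gold: copy-on-write (CoW), so readers never merge delete files at query time. Compaction, Z-order clustering, and Dremio Reflections add further query performance.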

-- Gold table: daily revenue by region and product category
CREATE TABLE gold.daily_revenue (
    revenue_date     DATE,
    region           STRING,
    product_category STRING,
    total_revenue    DECIMAL(18,2),
    order_count      BIGINT,
    avg_order_value  DECIMAL(10,2)
) USING iceberg
PARTITIONED BY (months(revenue_date))
TBLPROPERTIES ('write.merge.mode' = 'copy-on-write');
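
Gold tables are usually filled by an aggregation over Silver. To stay self-contained, this sketch runs that rollup with Python's built-in sqlite3 module rather than Spark. It therefore demonstrates the query logic, not Iceberg behavior. It assumes a hypothetical Silver-level input, silver_orders_enriched, in which every order already carries a product_category. The silver.orders table above has no such column, so a real pipeline would first join in product data.

```python
import sqlite3

# Same shape as a Spark SQL rollup that would feed gold.daily_revenue.
GOLD_ROLLUP = """
INSERT INTO daily_revenue
SELECT order_date           AS revenue_date,
       region,
       product_category,
       SUM(total)           AS total_revenue,
       COUNT(*)             AS order_count,
       ROUND(AVG(total), 2) AS avg_order_value
FROM silver_orders_enriched
GROUP BY order_date, region, product_category
"""


def build_daily_revenue(silver_rows):
    """Load hypothetical Silver rows into SQLite and return the Gold rollup."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE silver_orders_enriched (
                order_id INTEGER, order_date TEXT, region TEXT,
                product_category TEXT, total NUMERIC);
            CREATE TABLE daily_revenue (
                revenue_date TEXT, region TEXT, product_category TEXT,
                total_revenue NUMERIC, order_count INTEGER, avg_order_value NUMERIC);
        """)
        conn.executemany(
            "INSERT INTO silver_orders_enriched VALUES (?, ?, ?, ?, ?)", silver_rows)
        conn.execute(GOLD_ROLLUP)
        return conn.execute(
            "SELECT * FROM daily_revenue "
            "ORDER BY revenue_date, region, product_category").fetchall()
    finally:
        conn.close()


gold = build_daily_revenue([
    (1, "2026-05-14", "EMEA", "Books", 100),
    (2, "2026-05-14", "EMEA", "Books", 50),
    (3, "2026-05-14", "APAC", "Books", 25),
])
print(gold)
```

Against Iceberg in Spark, the same SELECT would more likely feed an INSERT OVERWRITE or MERGE INTO on gold.daily_revenue. That way, rerunning a day replaces its aggregates instead of appending duplicates.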

Full Pipeline with Iceberg

Source Systems (DBs, APIs, Kafka)
  │
  ▼ (Flink streaming, batch ingestion)
Bronze Layer (raw Iceberg tables, MoR, append-only)
  │
  ▼ (Spark batch CDC apply, MERGE INTO)
Silver Layer (cleaned Iceberg tables, schema-enforced)
  │
  ▼ (Spark aggregation, Dremio virtual datasets)
Gold Layer (business-ready Iceberg tables, CoW, clustered)
  │
  ▼
Dremio Intelligent Query Engine → BI tools, AI agents, self-serve analytics

Why Iceberg Enables the Medallion Pattern

| Requirement | How Iceberg fulfills it |
|---|---|
| Raw data preservation | Immutable snapshots at Bronze → replay any retained point in history |
| CDC application at Silver | MERGE INTO with merge-on-read (MoR) → fast upserts without rewriting existing data files |
| Schema evolution | Safe schema changes at each layer without breaking downstream pipelines |
| Audit trail | Snapshot history at every layer |
| Performance at Gold | Copy-on-write (CoW) + compaction + Z-order clustering + Dremio Reflections |
| Cross-layer consistency | Atomic snapshot commits → readers of each layer never see a partially written update |
| AI/agent readiness | Semantic layer over Gold tables → AI agents can interpret the data in business terms |
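
Several rows of the table (raw data preservation, audit trail, atomic commits) rest on one mechanism. Each write publishes a new immutable snapshot, and readers resolve a table to exactly one snapshot. The toy model below sketches that idea in Python. ToyTable and Snapshot are hypothetical names, and real Iceberg snapshots point to manifest files rather than holding rows. In Spark SQL (3.3 or later), the two lookups correspond to these queries:

- SELECT * FROM bronze.orders TIMESTAMP AS OF '2026-05-14 00:00:00'
- SELECT * FROM bronze.orders VERSION AS OF <snapshot id>

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at_ms: int  # commit time in epoch milliseconds
    rows: tuple           # toy stand-in for the data files this snapshot references


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit(self, snapshot_id: int, committed_at_ms: int, rows) -> None:
        """Publish a whole new snapshot in one step; readers never see half a write."""
        if self.snapshots and committed_at_ms < self.snapshots[-1].committed_at_ms:
            raise ValueError("commit times must not go backwards in this toy model")
        self.snapshots.append(Snapshot(snapshot_id, committed_at_ms, tuple(rows)))

    def current(self) -> tuple:
        return self.snapshots[-1].rows

    def as_of_timestamp(self, ts_ms: int) -> tuple:
        """Rows of the latest snapshot committed at or before ts_ms."""
        times = [s.committed_at_ms for s in self.snapshots]
        i = bisect_right(times, ts_ms)
        if i == 0:
            raise ValueError("no snapshot existed at that time")
        return self.snapshots[i - 1].rows

    def as_of_version(self, snapshot_id: int) -> tuple:
        """Rows of a specific snapshot, looked up by id."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.rows
        raise KeyError(snapshot_id)


bronze = ToyTable()
bronze.commit(101, 1_000, ["order 42 placed"])
bronze.commit(102, 2_000, ["order 42 placed", "order 42 shipped"])
print(bronze.as_of_timestamp(1_500))  # ('order 42 placed',)
print(bronze.current())               # ('order 42 placed', 'order 42 shipped')
```

Old snapshots stay readable until they are expired. A bad Silver or Gold build can therefore be diagnosed or recomputed from the exact Bronze state it originally read.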

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
