Delta Lake vs Iceberg: Architecture Deep-Dive

After running both Delta Lake and Iceberg in production, here's a detailed comparison based on real workloads processing 50TB+ daily.

The Question

Your lakehouse needs a table format. Delta Lake or Iceberg? The answer isn't straightforward.

Architecture Comparison

Delta Lake

Core Design:

Transaction log of JSON commit files (plus periodic Parquet checkpoints) in _delta_log/
ACID via optimistic concurrency control
Tightly integrated with Spark
Created at Databricks, open-sourced in 2019, now a Linux Foundation project
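The optimistic-concurrency point can be made concrete: each writer tries to create the next numbered commit file, and the storage layer must let only one writer create a given version (put-if-absent). A minimal in-memory sketch, assuming a hypothetical `InMemoryLogStore` in place of real object storage and leaving out Delta's logical conflict detection:

```python
from __future__ import annotations

import threading


class InMemoryLogStore:
    """Hypothetical stand-in for _delta_log/ on storage with atomic put-if-absent."""

    def __init__(self) -> None:
        self._commits: dict[int, str] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, version: int, content: str) -> bool:
        # Atomically create the commit file; fail if another writer already did
        with self._lock:
            if version in self._commits:
                return False
            self._commits[version] = content
            return True

    def latest_version(self) -> int:
        with self._lock:
            return max(self._commits, default=-1)


def commit(store: InMemoryLogStore, actions: str, read_version: int, max_attempts: int = 5) -> int:
    """Try to commit `actions` on top of the version this writer read."""
    version = read_version + 1
    for _ in range(max_attempts):
        if store.put_if_absent(version, actions):
            return version
        # Lost the race. A real writer would inspect the winning commits for
        # logical conflicts (e.g. files it read were removed) before retrying.
        version = store.latest_version() + 1
    raise RuntimeError("gave up after repeated commit conflicts")
```

Two writers that both read version -1 (an empty log) end up with versions 0 and 1: the loser simply retries on top of the winner's commit.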

Transaction Log:

_delta_log/
  00000000000000000000.json  # Initial commit
  00000000000000000001.json  # Add data
  00000000000000000002.json  # Update
  00000000000000000010.checkpoint.parquet  # Checkpoint every 10 commits (default interval)
  _last_checkpoint                         # Points readers at the latest checkpoint
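Readers reconstruct the current table state by replaying these files in version order; each commit holds newline-delimited JSON actions such as `add` and `remove`. A simplified sketch, assuming the commits are already loaded into a dict and ignoring checkpoints, `metaData`/`protocol` actions, and deletion vectors (a real reader starts from the checkpoint named in `_last_checkpoint` and replays only newer JSON files):

```python
from __future__ import annotations

import json


def commit_file_name(version: int) -> str:
    # Commit files are named by version, zero-padded to 20 digits
    return f"{version:020d}.json"


def live_files(commits: dict[int, str]) -> set[str]:
    """Replay add/remove actions in version order; return paths of live data files."""
    live: set[str] = set()
    for version in sorted(commits):
        for line in commits[version].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # commitInfo, metaData, protocol, txn, ... are ignored in this sketch
    return live


log = {
    0: '{"add": {"path": "part-000.parquet"}}',
    1: '{"add": {"path": "part-001.parquet"}}',
    # An UPDATE rewrites part-000 into part-002: one remove plus one add
    2: '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
}
print(commit_file_name(2))      # 00000000000000000002.json
print(sorted(live_files(log)))  # ['part-001.parquet', 'part-002.parquet']
```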

Apache Iceberg

Core Design:

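Table metadata stored as JSON; manifest lists and manifests stored as Avro
Snapshot isolation: each commit creates a new snapshot, swapped in atomically through the catalog
Engine-agnostic (Spark, Flink, Presto, Trino)
Created at Netflix, now an Apache Software Foundation project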

Metadata Structure (simplified file names):

metadata/
  v1.metadata.json          # Table metadata: schema, partition specs, snapshots
  snap-123.avro             # Manifest list for snapshot 123
  manifest-456.avro         # Manifest: data files plus per-file column stats
data/
  *.parquet                 # Data files
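Query planning walks this hierarchy top-down: the catalog points at the current metadata.json, which names the current snapshot and its manifest list; the manifest list points at manifests, and manifests list data files with column statistics. A toy sketch with plain dicts (field names are simplified stand-ins, not the real Avro schemas; `snap-123.avro` plays the manifest-list role) showing how per-file min/max stats let the planner skip files:

```python
from __future__ import annotations

# Simplified, hypothetical in-memory stand-ins for the metadata files
manifests = {
    "manifest-456.avro": [
        {"path": "data/f1.parquet", "min_user_id": 1, "max_user_id": 500},
        {"path": "data/f2.parquet", "min_user_id": 501, "max_user_id": 900},
    ],
    "manifest-789.avro": [
        {"path": "data/f3.parquet", "min_user_id": 901, "max_user_id": 1200},
    ],
}
manifest_lists = {"snap-123.avro": ["manifest-456.avro", "manifest-789.avro"]}
table_metadata = {
    "current-snapshot-id": 123,
    "snapshots": [{"snapshot-id": 123, "manifest-list": "snap-123.avro"}],
}


def plan_files(metadata: dict, user_id: int) -> list[str]:
    """Return data files that may contain `user_id`, pruning by per-file min/max stats."""
    current = next(s for s in metadata["snapshots"]
                   if s["snapshot-id"] == metadata["current-snapshot-id"])
    selected = []
    for manifest in manifest_lists[current["manifest-list"]]:
        for data_file in manifests[manifest]:
            if data_file["min_user_id"] <= user_id <= data_file["max_user_id"]:
                selected.append(data_file["path"])
    return selected


print(plan_files(table_metadata, 650))  # ['data/f2.parquet']
```

Real manifest lists also carry per-manifest partition summaries, so whole manifests can be skipped before any data-file entry is read.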

Feature Comparison

Feature	Delta Lake	Iceberg	Winner
ACID Transactions	✅ Yes	✅ Yes	Tie
Time Travel	✅ Yes	✅ Yes	Tie
Schema Evolution	✅ Yes	✅ Yes	Tie
Partition Evolution	❌ No	✅ Yes	Iceberg
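Hidden Partitioning	⚠️ Partial (generated columns)	✅ Yes	Iceberg
Z-Ordering	✅ Yes (OPTIMIZE ZORDER BY)	✅ Yes (rewrite_data_files zorder sort)	Tie
Deletion Vectors	✅ Yes	⚠️ Position delete files (v2); deletion vectors in the v3 spec	Delta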
Streaming Writes	✅ Excellent	⚠️ Good	Delta
Multi-Engine	⚠️ Limited	✅ Excellent	Iceberg
Databricks Integration	✅ Native	⚠️ Good	Delta

Performance Benchmarks

1. Write Performance

Test: Write 10M rows (5GB)

Delta Lake:

df.write.format("delta").mode("append").save("s3://bucket/delta-table")
# Time: 45 seconds

Iceberg:

df.writeTo("iceberg.db.table").append()
# Time: 52 seconds

Winner: Delta Lake (45 s vs 52 s: about 13% less time, roughly 1.16× the throughput)

2. Read Performance

Test: Full table scan (50GB)

Delta Lake:

(spark.read.format("delta").load("s3://bucket/delta-table")
    .write.format("noop").mode("overwrite").save())  # noop sink forces a full scan
# Time: 23 seconds

Iceberg:

(spark.read.format("iceberg").load("iceberg.db.table")
    .write.format("noop").mode("overwrite").save())  # noop sink forces a full scan
# Time: 21 seconds

Winner: Iceberg (9% faster reads)

3. Time Travel Query

Test: Query snapshot from 7 days ago

Delta Lake:

(spark.read.format("delta").option("versionAsOf", 150).load("s3://bucket/delta-table")
    .write.format("noop").mode("overwrite").save())
# Time: 25 seconds

Iceberg:

(spark.read.format("iceberg").option("snapshot-id", 123456).load("iceberg.db.table")
    .write.format("noop").mode("overwrite").save())
# Time: 22 seconds

Winner: Iceberg (12% faster)

4. Metadata Operations

Test: ALTER TABLE ADD COLUMN

Delta Lake:

spark.sql("ALTER TABLE delta_table ADD COLUMNS (new_col STRING)")
# Time: 0.5 seconds

Iceberg:

spark.sql("ALTER TABLE iceberg.db.table ADD COLUMN new_col STRING")
# Time: 0.2 seconds

Winner: Iceberg (0.2 s vs 0.5 s: 60% less time, i.e. 2.5× faster; both are sub-second)
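A timing gap can be stated as time saved relative to the slower run or as a throughput speedup, and the two give different percentages. A small helper computing both for this section's four measurements:

```python
from __future__ import annotations


def compare(faster_s: float, slower_s: float) -> tuple[float, float]:
    """Return (percent less wall-clock time, speedup factor) for two timings."""
    time_saved_pct = (slower_s - faster_s) / slower_s * 100
    speedup = slower_s / faster_s
    return time_saved_pct, speedup


# (faster, slower) timings in seconds, taken from the results above
benchmarks = {
    "write (Delta vs Iceberg)": (45, 52),
    "full scan (Iceberg vs Delta)": (21, 23),
    "time travel (Iceberg vs Delta)": (22, 25),
    "ADD COLUMN (Iceberg vs Delta)": (0.2, 0.5),
}
for name, (fast, slow) in benchmarks.items():
    saved, speedup = compare(fast, slow)
    print(f"{name}: {saved:.1f}% less time, {speedup:.2f}x speedup")
```

For the write test that is about 13% less time (roughly 1.16× throughput); for ADD COLUMN it is 60% less time, or 2.5× faster.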

Real-World Use Cases

Delta Lake Wins

1. Databricks-First Architecture

# OPTIMIZE ... ZORDER BY (also available in open-source Delta Lake 2.0+)
spark.sql("""
    OPTIMIZE delta_table
    ZORDER BY (date, user_id)
""")
 
# Deletion vectors (v3.0+; the table needs 'delta.enableDeletionVectors' = 'true')
spark.sql("""
    DELETE FROM delta_table 
    WHERE user_id = 'inactive'
""")
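# Matching rows are marked deleted instead of rewriting their Parquet files;
# the files are physically rewritten later (e.g. OPTIMIZE or REORG TABLE ... APPLY (PURGE))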
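Conceptually, a deletion vector is a per-file set of deleted row positions that readers subtract at scan time, so a DELETE that hits a few rows writes a small vector instead of a new copy of the file. A minimal sketch of that idea with a Python set and a hypothetical `DataFile` class (Delta itself stores compressed RoaringBitmaps referenced from the file's `add` action):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical data file: immutable rows plus a deletion vector."""
    rows: list[dict]
    deleted_positions: set[int] = field(default_factory=set)

    def delete_where(self, predicate) -> int:
        # Record positions of matching rows; `rows` itself is never rewritten
        hits = {i for i, row in enumerate(self.rows)
                if i not in self.deleted_positions and predicate(row)}
        self.deleted_positions |= hits
        return len(hits)

    def scan(self) -> list[dict]:
        # Readers skip positions recorded in the deletion vector
        return [row for i, row in enumerate(self.rows) if i not in self.deleted_positions]

    def compact(self) -> DataFile:
        # A later rewrite (OPTIMIZE/purge) materializes the deletes into a new file
        return DataFile(rows=self.scan())


f = DataFile(rows=[{"user_id": "u1"}, {"user_id": "inactive"}, {"user_id": "u3"}])
print(f.delete_where(lambda r: r["user_id"] == "inactive"))  # 1
print(f.scan())  # [{'user_id': 'u1'}, {'user_id': 'u3'}]
```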

2. Streaming Workloads

# Structured Streaming to Delta
stream = spark.readStream.format("kafka")...
 
stream.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", "s3://checkpoints/") \
    .start("s3://bucket/delta-table")
 
# Each micro-batch becomes one atomic Delta commit; together with the checkpoint,
# the sink is exactly-once
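Exactly-once delivery into Delta rests on two pieces: the query checkpoint tracks which micro-batch is in flight, and the Delta sink records the last committed batch ID for the query in the log (a `txn` action), so a batch replayed after a restart is skipped instead of appended twice. A simplified sketch of that idempotence check, with a hypothetical `IdempotentSink` class:

```python
from __future__ import annotations


class IdempotentSink:
    """Hypothetical sink tracking the last committed batch per streaming query."""

    def __init__(self) -> None:
        self.committed: dict[str, int] = {}  # query_id -> last batch_id (like Delta's txn action)
        self.rows: list[str] = []

    def add_batch(self, query_id: str, batch_id: int, batch: list[str]) -> bool:
        # A batch at or below the recorded ID was already committed: skip the replay
        if batch_id <= self.committed.get(query_id, -1):
            return False
        # In Delta, the new data files and the txn update land in one atomic commit
        self.rows.extend(batch)
        self.committed[query_id] = batch_id
        return True


sink = IdempotentSink()
sink.add_batch("q1", 0, ["e1", "e2"])
sink.add_batch("q1", 1, ["e3"])
sink.add_batch("q1", 1, ["e3"])  # restart replays batch 1: ignored
print(sink.rows)  # ['e1', 'e2', 'e3']
```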

3. Unity Catalog Integration

# Seamless with Unity Catalog
spark.sql("CREATE TABLE unity_catalog.schema.table (id BIGINT, name STRING) USING delta")  # columns are illustrative
# Automatic governance, lineage, access control

Iceberg Wins

1. Multi-Engine Environments

-- Spark
SELECT * FROM iceberg.db.table;
 
-- Presto
SELECT * FROM iceberg.db.table;
 
-- Flink
SELECT * FROM iceberg.db.table;
 
-- Trino
SELECT * FROM iceberg.db.table;
 
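-- All engines read the same table with no conversion, as long as they share
-- one catalog (e.g. Hive Metastore, AWS Glue, or an Iceberg REST catalog)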
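That only holds when every engine points at the same catalog. On the Spark side this is a handful of configuration keys; a sketch assuming a catalog named `iceberg` backed by an Iceberg REST catalog, with a placeholder URI and warehouse path, rendered as `spark-submit --conf` flags (the Iceberg Spark runtime jar must also be on the classpath, and Trino, Flink, or Presto would point their own Iceberg connectors at the same catalog):

```python
from __future__ import annotations

# Placeholder endpoint and warehouse; the key names are Iceberg's Spark catalog settings
ICEBERG_SPARK_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.iceberg.type": "rest",
    "spark.sql.catalog.iceberg.uri": "http://iceberg-catalog.example.internal:8181",
    "spark.sql.catalog.iceberg.warehouse": "s3://bucket/warehouse",
}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render a config dict as spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(ICEBERG_SPARK_CONF)))
```

The `spark.sql.extensions` entry enables Iceberg's SQL extensions, which the `CALL ... system.*` procedures and partition-field DDL rely on.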

2. Partition Evolution

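# Start with daily partitions
spark.sql("""
    CREATE TABLE iceberg.db.events (
        event_time timestamp,
        user_id string,
        event_type string
    ) USING iceberg
    PARTITIONED BY (days(event_time))
""")
 
# Later, switch to hourly WITHOUT rewriting existing data
# (ALTER ... PARTITION FIELD requires Iceberg's Spark SQL extensions)
spark.sql("""
    ALTER TABLE iceberg.db.events
    REPLACE PARTITION FIELD days(event_time) WITH hours(event_time)
""")
# Existing files keep the daily spec, new files use the hourly spec;
# queries plan against both specs transparently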
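Partition evolution works because every data file records the ID of the partition spec it was written under, and the planner evaluates a filter against each spec separately. A simplified sketch, assuming spec 0 is `days(event_time)`, spec 1 is `hours(event_time)`, partition values are ordinals counted from the Unix epoch in UTC, and a made-up file list:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical spec table: spec ID -> transform from timestamp to partition ordinal
SPECS = {
    0: lambda t: (t - EPOCH) // timedelta(days=1),   # days(event_time)
    1: lambda t: (t - EPOCH) // timedelta(hours=1),  # hours(event_time)
}


def ts(*parts: int) -> datetime:
    """Build a UTC timestamp."""
    return datetime(*parts, tzinfo=timezone.utc)


def prune(files: list[dict], start: datetime, end: datetime) -> list[str]:
    """Keep files whose partition may hold rows with start <= event_time <= end."""
    kept = []
    for f in files:
        transform = SPECS[f["spec_id"]]
        # Both transforms are monotonic, so the filter maps to one partition range per spec
        if transform(start) <= f["partition"] <= transform(end):
            kept.append(f["path"])
    return kept


files = [
    # Written before the spec change (daily)
    {"path": "daily-0314.parquet", "spec_id": 0, "partition": SPECS[0](ts(2024, 3, 14))},
    {"path": "daily-0315.parquet", "spec_id": 0, "partition": SPECS[0](ts(2024, 3, 15))},
    # Written after the spec change (hourly)
    {"path": "hourly-0316T10.parquet", "spec_id": 1, "partition": SPECS[1](ts(2024, 3, 16, 10))},
    {"path": "hourly-0316T11.parquet", "spec_id": 1, "partition": SPECS[1](ts(2024, 3, 16, 11))},
]
print(prune(files, ts(2024, 3, 15, 12), ts(2024, 3, 16, 10, 30)))
# ['daily-0315.parquet', 'hourly-0316T10.parquet']
```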

3. Hidden Partitioning

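-- Writers supply only event_time; there is no derived partition column to fill in
INSERT INTO iceberg.db.events 
VALUES (
    timestamp '2024-03-15 14:30:00',
    'user123',
    'click'
);
 
-- Iceberg derives the partition value from event_time on write, and queries
-- that filter on event_time are pruned without mentioning any partition column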
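The derived value is just the partition transform applied to the source column. A sketch of the `days` and `hours` transforms as the Iceberg spec defines them (whole days or hours since 1970-01-01 00:00 UTC), applied to the row inserted above and assuming the timestamp is interpreted as UTC:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_transform(value: datetime) -> int:
    """Iceberg days(): whole days since 1970-01-01 UTC."""
    return (value - EPOCH) // timedelta(days=1)


def hours_transform(value: datetime) -> int:
    """Iceberg hours(): whole hours since 1970-01-01 00:00 UTC."""
    return (value - EPOCH) // timedelta(hours=1)


# The row from the INSERT above; the writer never supplies a partition value
row = {
    "event_time": datetime(2024, 3, 15, 14, 30, tzinfo=timezone.utc),
    "user_id": "user123",
    "event_type": "click",
}
print(days_transform(row["event_time"]))   # 19797
print(hours_transform(row["event_time"]))  # 475142
```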

Migration Considerations

Delta → Iceberg

Why migrate:

Need multi-engine support
Want partition evolution
Require hidden partitioning

How to migrate:

# Read Delta
df = spark.read.format("delta").load("s3://bucket/delta-table")
 
# Write to Iceberg (partitioning is not carried over; add .partitionedBy(...) if needed)
df.writeTo("iceberg.db.table").create()
 
# Validate
delta_count = df.count()
iceberg_count = spark.table("iceberg.db.table").count()
assert delta_count == iceberg_count
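Matching row counts do not prove matching contents: a duplicated row plus a dropped row leaves the count unchanged. A stronger check compares an order-independent fingerprint, hashing every row and summing the hashes modulo 2^64 so row order and file layout do not matter. A pure-Python sketch of the idea (the helper names are hypothetical; porting it to Spark with a built-in hash function is an assumption not shown here, and values must be normalized to the same types on both sides because the hash is taken over each row's `repr`):

```python
from __future__ import annotations

import hashlib
from typing import Iterable

MASK = (1 << 64) - 1


def row_hash(row: tuple) -> int:
    # Stable 64-bit hash of the row's repr (Python's built-in hash() is salted per process)
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def fingerprint(rows: Iterable[tuple]) -> tuple[int, int]:
    """Return (row count, sum of row hashes mod 2**64); independent of row order."""
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_hash(row)) & MASK
    return count, total


delta_rows = [(1, "a"), (2, "b"), (3, "c")]
iceberg_rows = [(3, "c"), (1, "a"), (2, "b")]  # same data, different order
broken_rows = [(1, "a"), (1, "a"), (3, "c")]   # same count, different data
print(fingerprint(delta_rows) == fingerprint(iceberg_rows))  # True
print(fingerprint(delta_rows) == fingerprint(broken_rows))   # False
```

Summing rather than XOR-ing matters here: with XOR, a pair of identical duplicate rows would cancel out.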

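Cost: the approach above rewrites all data. Metadata-only options (Delta UniForm, or Iceberg's Delta Lake snapshot action) can avoid the rewrite but carry their own restrictions.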

Iceberg → Delta

Why migrate:

Moving to Databricks
Want OPTIMIZE ... ZORDER BY as a built-in command
Want deletion vectors
Need better streaming performance

How to migrate:

# Read Iceberg
df = spark.table("iceberg.db.table")
 
# Write to Delta
df.write.format("delta").mode("overwrite").save("s3://bucket/delta-table")

Production Patterns

Delta Lake Best Practices

# 1. Enable auto-optimize
spark.sql("""
    ALTER TABLE delta_table
    SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")
 
# 2. Regular OPTIMIZE + ZORDER
spark.sql("OPTIMIZE delta_table ZORDER BY (date, user_id)")
 
# 3. VACUUM old files (after retention); time travel to versions whose files were vacuumed fails
spark.sql("VACUUM delta_table RETAIN 168 HOURS")  # 7 days
 
# 4. Enable deletion vectors by default for newly created tables (v3.0+)
spark.conf.set("spark.databricks.delta.properties.defaults.enableDeletionVectors", "true")
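ZORDER BY sorts rows along a space-filling curve so that rows close together in every listed column land in the same files, keeping each file's min/max range narrow for all of those columns rather than just the first. A sketch of the classic Morton (Z-order) key built by bit interleaving, assuming two small non-negative integer columns (roughly speaking, Delta first maps each column's values to range IDs before interleaving, which is not modeled here):

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# 16 rows on a 4x4 grid of (x, y) values, sorted by Z-order key
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(*p))
print(z_sorted[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]

# Split into 4-row "files" and show each file's min/max per column
files = [z_sorted[i:i + 4] for i in range(0, len(z_sorted), 4)]
for f in files:
    xs, ys = [p[0] for p in f], [p[1] for p in f]
    print(f"x in [{min(xs)}, {max(xs)}], y in [{min(ys)}, {max(ys)}]")
```

Each 4-row "file" covers a 2×2 block, so a filter on either column skips half of them, whereas sorting on x alone would let a filter on y skip none.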

Iceberg Best Practices

# 1. Compact metadata regularly
spark.sql("CALL iceberg.system.rewrite_manifests('db.table')")
 
# 2. Expire old snapshots
spark.sql("""
    CALL iceberg.system.expire_snapshots(
        table => 'db.table',
        older_than => timestamp '2024-01-01 00:00:00'
    )
""")
 
# 3. Rewrite small files
spark.sql("""
    CALL iceberg.system.rewrite_data_files(
        table => 'db.table',
        strategy => 'binpack',
        options => map('target-file-size-bytes', '1073741824')
    )
""")
 
# 4. Use hidden partitioning (partition by a transform of a column, not a derived column)
spark.sql("""
    CREATE TABLE db.events (
        event_ts timestamp,
        data string
    ) PARTITIONED BY (days(event_ts))
""")

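The `binpack` strategy groups small files into rewrite groups whose combined size approaches the target, without re-sorting rows. A first-fit-decreasing sketch of that grouping with made-up file sizes (Iceberg's real planner also respects partition boundaries and min/max file-size thresholds, none of which is modeled here):

```python
from __future__ import annotations

TARGET = 1_073_741_824  # 1 GiB, matching target-file-size-bytes above


def plan_bins(file_sizes: list[int], target: int = TARGET) -> list[list[int]]:
    """First-fit decreasing: place each file into the first group with room left."""
    bins: list[list[int]] = []
    for size in sorted(file_sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= target:
                b.append(size)
                break
        else:
            # No existing group has room: start a new one
            bins.append([size])
    return bins


MiB = 1024 * 1024
small_files = [600 * MiB, 300 * MiB, 200 * MiB, 500 * MiB, 100 * MiB, 50 * MiB]
for b in plan_bins(small_files):
    print([s // MiB for s in b], sum(b) // MiB, "MiB")
```

Six small files become two rewrite groups (1000 MiB and 750 MiB), so the rewrite produces two files instead of six.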
Cost Analysis

Storage Costs (50TB table)

Delta Lake:

Data files: $1,150/month
Transaction log: $5/month
Checkpoints: $2/month
Total: $1,157/month

Iceberg:

Data files: $1,150/month
Metadata files: $8/month
Manifest files: $12/month
Total: $1,170/month

Winner: Delta Lake ($13/month cheaper, about 1%; negligible next to the data files themselves)
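The data-file figure is plain arithmetic, size times price. A sketch assuming S3 Standard at $0.023 per GB-month and 50 TB counted as 50,000 GB (both assumptions; check current pricing for your region and storage class), reusing the log and metadata figures above:

```python
from __future__ import annotations


def monthly_storage_cost(size_gb: float, price_per_gb_month: float = 0.023) -> float:
    """Storage cost in dollars per month (default price is an assumed S3 Standard rate)."""
    return size_gb * price_per_gb_month


data_files = monthly_storage_cost(50_000)
delta_total = data_files + 5 + 2     # + transaction log + checkpoints (figures above)
iceberg_total = data_files + 8 + 12  # + metadata + manifests (figures above)
print(f"data files: ${data_files:,.0f}/month")
print(f"difference: ${iceberg_total - delta_total:,.0f}/month "
      f"({(iceberg_total - delta_total) / iceberg_total:.1%} of the Iceberg total)")
```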

Compute Costs

Delta Lake:

Faster writes save $200/month
Z-ordering optimization saves $300/month
Total savings: $500/month

Iceberg:

Faster metadata ops save $100/month
Better pruning saves $200/month
Total savings: $300/month

Net: about $200/month in Delta's favor for our workload mix

Decision Framework

Choose Delta Lake if:

✅ Using Databricks
✅ Heavy streaming workloads
✅ Want Z-ordering built into OPTIMIZE
✅ Want deletion vectors
✅ Unity Catalog integration
✅ Team knows Delta well

Choose Iceberg if:

✅ Multi-engine environment (Spark + Presto + Flink)
✅ Need partition evolution
✅ Want hidden partitioning
✅ Engine-agnostic architecture
✅ Metadata-heavy operations
✅ Vendor-neutral governance required (Apache Software Foundation)

Our Choice

We use both:

Delta Lake for:

Real-time streaming pipelines (Kafka → Spark → Delta)
Databricks-native workflows
Tables needing Z-ordering

Iceberg for:

Analytics tables queried by multiple engines
Tables requiring partition evolution
Cross-platform data sharing

Future Outlook

Delta Lake Roadmap

Improved multi-engine support
Better open-source features
Partition evolution (planned)

Iceberg Roadmap

Better streaming support
More optimization features
Wider adoption

Key Takeaways

Not one-size-fits-all: both have strengths
Delta excels in Databricks + streaming
Iceberg wins for multi-engine + flexibility
Performance is comparable in most cases
Migration is possible but costly
Use both if your architecture justifies it

The "better" table format depends on your infrastructure, team, and use cases.
