Delta Lake vs Iceberg: Architecture Deep-Dive
After running both Delta Lake and Iceberg in production, here's a detailed comparison based on real workloads processing 50TB+ daily.
The Question
Your lakehouse needs a table format. Delta Lake or Iceberg? The answer isn't straightforward.
Architecture Comparison
Delta Lake
Core Design:
- Transaction log stored as JSON files in _delta_log/
- ACID via optimistic concurrency
- Tightly integrated with Spark
- Databricks-first, then open-sourced
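Optimistic concurrency means writers take no locks up front. Each writer reads the latest version, prepares its changes, and tries to create the next numbered log entry. That step succeeds only if no other writer created the entry first (put-if-absent). The losing writer re-reads the log and retries. The sketch below is a minimal in-memory model of that loop. `InMemoryLog` and `commit` are hypothetical names. Real Delta writers also check whether the winning commit conflicts with what they read before retrying, and this sketch skips that step.

```python
import threading


class InMemoryLog:
    """Hypothetical stand-in for _delta_log/: maps version number -> list of actions."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()  # models the store's atomic put-if-absent

    def latest_version(self):
        with self._lock:
            return max(self._entries, default=-1)

    def put_if_absent(self, version, actions):
        """Create entry `version` atomically; return False if it already exists."""
        with self._lock:
            if version in self._entries:
                return False
            self._entries[version] = actions
            return True


def commit(log, actions, max_attempts=100):
    """Optimistically commit `actions` as version N+1, retrying after lost races."""
    for _ in range(max_attempts):
        next_version = log.latest_version() + 1
        # A real writer would run conflict detection against the commits it
        # lost to; this sketch assumes concurrent appends never conflict.
        if log.put_if_absent(next_version, actions):
            return next_version
    raise RuntimeError("gave up after repeated commit conflicts")


log = InMemoryLog()
print(commit(log, [{"add": {"path": "part-000.parquet"}}]))  # first commit is version 0
```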
Transaction Log:
```
_delta_log/
  00000000000000000000.json                # Initial commit
  00000000000000000001.json                # Add data
  00000000000000000002.json                # Update
  ...
  00000000000000000010.checkpoint.parquet  # Checkpoint (every 10 commits by default)
```
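Readers reconstruct the table's current contents from this log, without listing data directories. The sketch below shows that replay in simplified form. It starts from the files recorded in the latest checkpoint, then applies the `add` and `remove` actions of each newer commit in version order. Each commit file holds one JSON action per line, and its name is the version zero-padded to 20 digits. The helpers are hypothetical and everything stays in memory. Other action types (`metaData`, `protocol`, `commitInfo`) are ignored.

```python
import json


def log_file_name(version, checkpoint=False):
    """File name for a log entry: the version zero-padded to 20 digits."""
    suffix = ".checkpoint.parquet" if checkpoint else ".json"
    return f"{version:020d}{suffix}"


def replay(commits, checkpoint_files=frozenset(), checkpoint_version=-1):
    """Return the set of live data files.

    commits: dict of version -> commit file text (one JSON action per line).
    checkpoint_files / checkpoint_version: live files as of the latest
    checkpoint; only commits newer than it are replayed.
    """
    live = set(checkpoint_files)
    for version in sorted(v for v in commits if v > checkpoint_version):
        for line in commits[version].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # metaData, protocol, commitInfo, ... are ignored in this sketch
    return live


commits = {
    0: '{"add": {"path": "a.parquet"}}',
    1: '{"add": {"path": "b.parquet"}}',
    2: '{"remove": {"path": "a.parquet"}}\n{"add": {"path": "c.parquet"}}',
}
print(log_file_name(10, checkpoint=True))
print(sorted(replay(commits)))
```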
Apache Iceberg
Core Design:
- Table metadata stored as JSON; manifest lists and manifests stored as Avro
- Snapshot isolation with manifest files
- Engine-agnostic (Spark, Flink, Presto, Trino)
- Created at Netflix, now an Apache Software Foundation project
Metadata Structure:
```
metadata/
  v1.metadata.json    # Table metadata: schema, partition specs, snapshot list
  snap-123.avro       # Manifest list for snapshot 123
  manifest-456.avro   # Manifest: tracks a set of data files and their stats
```
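This hierarchy lets a query planner prune before it touches any data. The manifest list records partition bounds for each manifest, and each manifest records partition values and statistics for each data file. As a result, whole manifests can be skipped without being read. The sketch below is a simplified model of that two-level pruning. It uses hypothetical classes and a single integer partition value (for example, the result of days(event_time)) in place of Iceberg's real partition summaries.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    day: int  # partition value, e.g. days(event_time)


@dataclass
class ManifestListEntry:
    """One manifest as described by the manifest list: path plus partition bounds."""
    manifest_path: str
    lower_day: int
    upper_day: int
    files: list = field(default_factory=list)  # contents of the manifest file itself


def plan_scan(manifest_list, day_lo, day_hi):
    """Return (matching data file paths, number of manifests actually opened)."""
    matches, opened = [], 0
    for entry in manifest_list:
        # Level 1: the bounds come from the manifest list, so a manifest that
        # cannot overlap the filter is skipped without reading it.
        if entry.upper_day < day_lo or entry.lower_day > day_hi:
            continue
        opened += 1
        # Level 2: per-file partition values inside the opened manifest.
        matches += [f.path for f in entry.files if day_lo <= f.day <= day_hi]
    return matches, opened


manifest_list = [
    ManifestListEntry("m1.avro", 100, 109, [DataFile(f"d{d}.parquet", d) for d in range(100, 110)]),
    ManifestListEntry("m2.avro", 110, 119, [DataFile(f"d{d}.parquet", d) for d in range(110, 120)]),
]
print(plan_scan(manifest_list, 112, 113))
```

With this filter, only one of the two manifests is opened.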
Feature Comparison
| Feature | Delta Lake | Iceberg | Winner |
|---|---|---|---|
| ACID Transactions | ✅ Yes | ✅ Yes | Tie |
| Time Travel | ✅ Yes | ✅ Yes | Tie |
| Schema Evolution | ✅ Yes | ✅ Yes | Tie |
| Partition Evolution | ❌ No | ✅ Yes | Iceberg |
| Hidden Partitioning | ❌ No | ✅ Yes | Iceberg |
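| Z-Ordering | ✅ Native (OPTIMIZE ... ZORDER BY) | ⚠️ Via rewrite_data_files (zorder sort order) | Delta |
| Deletion Vectors | ✅ Yes | ⚠️ Position delete files (v2); deletion vectors in format v3 | Delta |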
| Streaming Writes | ✅ Excellent | ⚠️ Good | Delta |
| Multi-Engine | ⚠️ Limited | ✅ Excellent | Iceberg |
| Databricks Integration | ✅ Native | ⚠️ Good | Delta |
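The Deletion Vectors row refers to how Delta can handle a DELETE. It records the deleted row positions for each data file instead of rewriting the file. Readers then filter those positions out until the file is later compacted. Iceberg's merge-on-read position delete files apply a similar idea. The sketch below is minimal and uses a Python integer as the bitmap, whereas Delta stores compressed bitmaps. All names are hypothetical.

```python
class DeletionVector:
    """Hypothetical bitmap of deleted row positions for one immutable data file."""

    def __init__(self):
        self.bits = 0  # bit i set means row i is deleted

    def mark_deleted(self, position):
        self.bits |= 1 << position

    def is_deleted(self, position):
        return bool((self.bits >> position) & 1)


def delete_where(rows, dv, predicate):
    """DELETE ... WHERE: record matching positions in the DV; `rows` is not modified."""
    for pos, row in enumerate(rows):
        if predicate(row):
            dv.mark_deleted(pos)


def read_live_rows(rows, dv):
    """Scan the data file and drop positions marked in the deletion vector."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


data_file = [("u1", "active"), ("u2", "inactive"), ("u3", "active")]
dv = DeletionVector()
delete_where(data_file, dv, lambda row: row[1] == "inactive")
print(read_live_rows(data_file, dv))
print(data_file)  # the underlying file is unchanged
```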
Performance Benchmarks
1. Write Performance
Test: Write 10M rows (5GB)
Delta Lake:
```python
df.write.format("delta").mode("append").save("s3://bucket/delta-table")
# Time: 45 seconds
```
Iceberg:
```python
df.writeTo("iceberg.db.table").append()
# Time: 52 seconds
```
Winner: Delta Lake (15% faster writes)
2. Read Performance
Test: Full table scan (50GB)
Delta Lake:
```python
spark.read.format("delta").load("s3://bucket/delta-table")
# Time: 23 seconds
```
Iceberg:
```python
spark.read.format("iceberg").load("iceberg.db.table")
# Time: 21 seconds
```
Winner: Iceberg (9% faster reads)
3. Time Travel Query
Test: Query snapshot from 7 days ago
Delta Lake:
```python
spark.read.format("delta").option("versionAsOf", 150).load("s3://bucket/delta-table")
# Time: 25 seconds
```
Iceberg:
```python
spark.read.format("iceberg").option("snapshot-id", 123456).load("iceberg.db.table")
# Time: 22 seconds
```
Winner: Iceberg (12% faster)
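In this test, the versionAsOf and snapshot-id values stand for "7 days ago". To resolve a point in time to a version or snapshot, you find the newest commit at or before that timestamp. Both formats can also do this lookup for you, through Delta's `timestampAsOf` and Iceberg's `as-of-timestamp` read options. The sketch below is a minimal version of the lookup using `bisect`, over a hypothetical in-memory history of (commit time, snapshot id) pairs sorted by time.

```python
import bisect
from datetime import datetime, timedelta, timezone


def snapshot_as_of(history, as_of):
    """Return the id of the latest snapshot committed at or before `as_of`.

    history: list of (commit_time, snapshot_id) pairs sorted by commit_time.
    """
    times = [t for t, _ in history]
    idx = bisect.bisect_right(times, as_of)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return history[idx - 1][1]


# Hypothetical history: one snapshot per day for the last 30 days.
now = datetime(2024, 3, 15, tzinfo=timezone.utc)
history = [(now - timedelta(days=d), 1000 - d) for d in range(30, -1, -1)]
print(snapshot_as_of(history, now - timedelta(days=7)))  # snapshot committed exactly 7 days ago
```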
4. Metadata Operations
Test: ALTER TABLE ADD COLUMN
Delta Lake:
```python
spark.sql("ALTER TABLE delta_table ADD COLUMNS (new_col STRING)")
# Time: 0.5 seconds
```
Iceberg:
```python
spark.sql("ALTER TABLE iceberg.db.table ADD COLUMN new_col STRING")
# Time: 0.2 seconds
```
Winner: Iceberg (60% less time, i.e. 2.5x faster metadata ops)
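The percentages can be recomputed from the raw timings. Two definitions are in use. The write figure matches the throughput gain, (slow / fast - 1). The read, time-travel, and ADD COLUMN figures match the reduction in wall-clock time, ((slow - fast) / slow). The short script below reports both for each test. The timings are the ones listed in this section.

```python
def compare(delta_s, iceberg_s):
    """Return (winner, % less wall-clock time, % higher throughput) for one test."""
    fast, slow = min(delta_s, iceberg_s), max(delta_s, iceberg_s)
    winner = "Delta Lake" if delta_s < iceberg_s else "Iceberg"
    time_saved = (slow - fast) / slow * 100
    speedup = (slow / fast - 1) * 100
    return winner, time_saved, speedup


# Timings in seconds as reported above: (Delta Lake, Iceberg).
timings = {
    "write 10M rows": (45.0, 52.0),
    "full scan 50GB": (23.0, 21.0),
    "time travel": (25.0, 22.0),
    "ADD COLUMN": (0.5, 0.2),
}

for name, (delta_s, iceberg_s) in timings.items():
    winner, saved, speedup = compare(delta_s, iceberg_s)
    print(f"{name:<15} {winner:<10} time saved {saved:5.1f}%   speedup {speedup:6.1f}%")
```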
Real-World Use Cases
Delta Lake Wins
1. Databricks-First Architecture
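Z-ordering, used by the OPTIMIZE ... ZORDER BY command in this section, sorts rows by a key built by interleaving the bits of several columns. Rows that are close in all of those columns then tend to share files. Per-file min/max statistics can therefore skip files for a filter on any one of those columns. The sketch below is a minimal version of that interleaving (a Morton code) for two small non-negative integer columns. `z_value` and `z_order` are hypothetical helpers. Delta's ZORDER BY handles arbitrary column types and differs in detail.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y: x in even positions, y in odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order(rows, key_x, key_y):
    """Sort rows by the Z-value of two integer columns."""
    return sorted(rows, key=lambda r: z_value(key_x(r), key_y(r)))


# 16 rows on a 4x4 grid of (x, y), written as 4 files of 4 rows each.
grid = [(x, y) for x in range(4) for y in range(4)]
ordered = z_order(grid, key_x=lambda r: r[0], key_y=lambda r: r[1])
files = [ordered[i:i + 4] for i in range(0, 16, 4)]
for f in files:
    xs, ys = [r[0] for r in f], [r[1] for r in f]
    print(f"x {min(xs)}-{max(xs)}, y {min(ys)}-{max(ys)}")
```

Each of the four files ends up covering a 2x2 block of (x, y) values. A filter on either column alone can therefore skip half of the files. A plain sort on x would instead give each file a single x value but the full y range.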
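```python
# Native Databricks feature (OPTIMIZE ... ZORDER BY is also in open-source Delta 2.0+)
spark.sql("""
OPTIMIZE delta_table
ZORDER BY (date, user_id)
""")
# Deletion vectors (v3.0+); they must be enabled on the table first
spark.sql("ALTER TABLE delta_table SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')")
spark.sql("""
DELETE FROM delta_table
WHERE user_id = 'inactive'
""")
# Matching rows are marked as deleted; data files are not rewritten
```

2. Streaming Workloads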
```python
# Structured Streaming to Delta
stream = spark.readStream.format("kafka")...
stream.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", "s3://checkpoints/") \
    .start("s3://bucket/delta-table")
# Excellent streaming performance
```

3. Unity Catalog Integration
```python
# Seamless with Unity Catalog (example columns; CREATE TABLE needs a schema)
spark.sql("CREATE TABLE unity_catalog.schema.table (id BIGINT, name STRING) USING delta")
# Automatic governance, lineage, access control
```

Iceberg Wins
1. Multi-Engine Environments
```sql
-- Spark
SELECT * FROM iceberg.db.table;
-- Presto
SELECT * FROM iceberg.db.table;
-- Flink
SELECT * FROM iceberg.db.table;
-- Trino
SELECT * FROM iceberg.db.table;
-- All engines see the same data, with no conversion
```

2. Partition Evolution
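Partition evolution works because every data file records the partition spec it was written under. Files written before the change keep their daily partition values, and files written after it get hourly ones. The planner then evaluates a filter against each spec separately. The sketch below is a simplified model of that planning step. It uses hypothetical spec ids and (path, spec_id, partition_value) tuples in place of Iceberg's manifests.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical spec ids: 0 = days(event_time) (original), 1 = hours(event_time) (current).
TRANSFORMS = {
    0: lambda ts: (ts - EPOCH) // timedelta(days=1),
    1: lambda ts: (ts - EPOCH) // timedelta(hours=1),
}


def plan(files, ts_lo, ts_hi):
    """Return paths of files that may hold rows with ts_lo <= event_time <= ts_hi.

    Each file is checked with the transform of the spec it was written under;
    both transforms are monotonic, so transforming the filter bounds is enough.
    """
    return [
        path
        for path, spec_id, value in files
        if TRANSFORMS[spec_id](ts_lo) <= value <= TRANSFORMS[spec_id](ts_hi)
    ]


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


files = [
    ("daily-2024-03-14.parquet", 0, TRANSFORMS[0](utc(2024, 3, 14))),
    ("daily-2024-03-15.parquet", 0, TRANSFORMS[0](utc(2024, 3, 15))),
    ("hourly-2024-04-01T09.parquet", 1, TRANSFORMS[1](utc(2024, 4, 1, 9))),
    ("hourly-2024-04-01T10.parquet", 1, TRANSFORMS[1](utc(2024, 4, 1, 10))),
]
print(plan(files, utc(2024, 3, 15, 12), utc(2024, 4, 1, 9, 30)))
```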
```python
# Start with daily partitions
spark.sql("""
CREATE TABLE iceberg.events (
  event_time timestamp,
  user_id string,
  event_type string
) USING iceberg
PARTITIONED BY (days(event_time))
""")
# Later, change to hourly WITHOUT rewriting data
# (requires the Iceberg Spark SQL extensions)
spark.sql("""
ALTER TABLE iceberg.events
REPLACE PARTITION FIELD days(event_time) WITH hours(event_time)
""")
# Old files keep the daily spec, new files use the hourly spec;
# Iceberg handles both partition specs transparently
```

3. Hidden Partitioning
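With hidden partitioning, the table's partition spec is defined as a transform of a regular column, and the writer computes each row's partition value itself. For example, Iceberg's day transform is the number of days since 1970-01-01. The sketch below is a minimal version of that write-side step. It assumes UTC timestamps and a hypothetical `write_rows` helper that only groups rows in memory.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts):
    """Day transform: whole days since 1970-01-01 (timestamps assumed UTC)."""
    return (ts - EPOCH).days


def write_rows(rows):
    """Group incoming (event_time, user_id, event_type) rows by derived partition.

    The writer computes the partition value from event_time itself; the caller
    never supplies a separate partition column.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[day_transform(row[0])].append(row)
    return dict(partitions)


rows = [
    (datetime(2024, 3, 15, 14, 30, tzinfo=timezone.utc), "user123", "click"),
    (datetime(2024, 3, 15, 23, 59, tzinfo=timezone.utc), "user456", "view"),
    (datetime(2024, 3, 16, 0, 1, tzinfo=timezone.utc), "user123", "click"),
]
for day, grouped in sorted(write_rows(rows).items()):
    print(day, len(grouped))
```

Queries likewise filter on event_time directly, and Iceberg maps the predicate onto the derived partition values.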
```sql
-- Users write without partition columns
INSERT INTO iceberg.events
VALUES (
  timestamp '2024-03-15 14:30:00',
  'user123',
  'click'
);
-- Iceberg derives the partition value from event_time automatically
-- No PARTITION clause or separate date column needed in INSERT statements
```

Migration Considerations
Delta → Iceberg
Why migrate:
- Need multi-engine support
- Want partition evolution
- Require hidden partitioning
How to migrate:
```python
# Read Delta
df = spark.read.format("delta").load("s3://bucket/delta-table")
# Write to Iceberg (copies the data; Delta version history is not carried over)
df.writeTo("iceberg.db.table").create()
# Validate row counts
delta_count = spark.read.format("delta").load("s3://bucket/delta-table").count()
iceberg_count = spark.table("iceberg.db.table").count()
assert delta_count == iceberg_count
```

Cost: Full data rewrite required (with this copy-based approach)
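A matching row count can hide duplicated, dropped, or altered rows. A stricter check compares the two tables as multisets of rows. In PySpark, that is usually done by running `exceptAll` in both directions. The sketch below applies the same idea in plain Python, with hypothetical in-memory row lists standing in for the two tables.

```python
from collections import Counter


def tables_match(source_rows, target_rows):
    """True if both tables hold the same rows with the same multiplicities (order ignored)."""
    return Counter(source_rows) == Counter(target_rows)


def row_diff(source_rows, target_rows):
    """(rows missing from target, extra rows in target), like exceptAll in both directions."""
    src, tgt = Counter(source_rows), Counter(target_rows)
    return sorted((src - tgt).elements()), sorted((tgt - src).elements())


delta_rows = [(1, "a"), (2, "b"), (2, "b")]
iceberg_rows = [(2, "b"), (1, "a"), (3, "c")]
print(len(delta_rows) == len(iceberg_rows))    # counts agree...
print(tables_match(delta_rows, iceberg_rows))  # ...but contents differ
print(row_diff(delta_rows, iceberg_rows))
```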
Iceberg → Delta
Why migrate:
- Moving to Databricks
- Need Z-ordering
- Want deletion vectors
- Better streaming performance
How to migrate:
```python
# Read Iceberg
df = spark.table("iceberg.db.table")
# Write to Delta
df.write.format("delta").mode("overwrite").save("s3://bucket/delta-table")
```

Production Patterns
Delta Lake Best Practices
```python
# 1. Enable auto-optimize
spark.sql("""
ALTER TABLE delta_table
SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'true'
)
""")
# 2. Regular OPTIMIZE + ZORDER
spark.sql("OPTIMIZE delta_table ZORDER BY (date, user_id)")
# 3. VACUUM old files (after retention)
spark.sql("VACUUM delta_table RETAIN 168 HOURS")  # 7 days, the default retention
# 4. Enable deletion vectors (v3.0+) by default for newly created tables;
#    existing tables need SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
spark.conf.set("spark.databricks.delta.properties.defaults.enableDeletionVectors", "true")
```

Iceberg Best Practices
```python
# 1. Compact manifests regularly
spark.sql("CALL iceberg.system.rewrite_manifests('db.table')")
# 2. Expire old snapshots (time travel to expired snapshots is no longer possible)
spark.sql("""
CALL iceberg.system.expire_snapshots(
  table => 'db.table',
  older_than => timestamp '2024-01-01 00:00:00'
)
""")
# 3. Rewrite small files into ~1 GiB files
spark.sql("""
CALL iceberg.system.rewrite_data_files(
  table => 'db.table',
  strategy => 'binpack',
  options => map('target-file-size-bytes', '1073741824')
)
""")
# 4. Use hidden partitioning
spark.sql("""
CREATE TABLE iceberg.db.events (
  event_ts timestamp,
  data string
) USING iceberg
PARTITIONED BY (days(event_ts))
""")
```

Cost Analysis
Storage Costs (50TB table)
Delta Lake:
- Data files: $1,150/month
- Transaction log: $5/month
- Checkpoints: $2/month
- Total: $1,157/month
Iceberg:
- Data files: $1,150/month
- Metadata files: $8/month
- Manifest files: $12/month
- Total: $1,170/month
Winner: Delta Lake (1% cheaper)
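The totals and the "1% cheaper" figure can be reproduced directly. The data-file cost is identical for both formats, so the whole difference comes from metadata. The $1,150 data-file figure is consistent with an assumed rate of $0.023 per GB-month on 50,000 GB. That rate is an assumption, not a number from the measurements above.

```python
# Monthly storage costs in USD for the 50 TB table, as listed above.
delta_costs = {"data files": 1150, "transaction log": 5, "checkpoints": 2}
iceberg_costs = {"data files": 1150, "metadata files": 8, "manifest files": 12}

delta_total = sum(delta_costs.values())
iceberg_total = sum(iceberg_costs.values())
delta_saving_pct = (iceberg_total - delta_total) / iceberg_total * 100

# Assumption (not from the benchmark): $0.023 per GB-month on 50,000 GB.
assumed_data_cost = 50_000 * 0.023

print(delta_total, iceberg_total, round(delta_saving_pct, 1), round(assumed_data_cost))
```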
Compute Costs
Delta Lake:
- Faster writes save $200/month
- Z-ordering optimization saves $300/month
- Total savings: $500/month
Iceberg:
- Faster metadata ops save $100/month
- Better pruning saves $200/month
- Total savings: $300/month
Decision Framework
Choose Delta Lake if:
✅ Using Databricks
✅ Heavy streaming workloads
✅ Need Z-ordering
✅ Want deletion vectors
✅ Unity Catalog integration
✅ Team knows Delta well
Choose Iceberg if:
✅ Multi-engine environment (Spark + Presto + Flink)
✅ Need partition evolution
✅ Want hidden partitioning
✅ Engine-agnostic architecture
✅ Metadata-heavy operations
✅ Open table format requirement
Our Choice
We use both:
Delta Lake for:
- Real-time streaming pipelines (Kafka → Spark → Delta)
- Databricks-native workflows
- Tables needing Z-ordering
Iceberg for:
- Analytics tables queried by multiple engines
- Tables requiring partition evolution
- Cross-platform data sharing
Future Outlook
Delta Lake Roadmap
- Improved multi-engine support
- Better open-source features
- Partition evolution (planned)
Iceberg Roadmap
- Better streaming support
- More optimization features
- Wider adoption
Key Takeaways
- Not one-size-fits-all - Both have strengths
- Delta excels in Databricks + streaming
- Iceberg wins for multi-engine + flexibility
- Performance is comparable in most cases
- Migration is possible but costly
- Use both if your architecture justifies it
The "better" table format depends on your infrastructure, team, and use cases.