Writing

Blog

Deep dives into data engineering patterns, architecture decisions, and lessons learned from building production data systems.

21 Articles · 10+ Topics

Data Engineering, Delta Lake, Apache Spark

Delta Lake Optimization: OPTIMIZE, ZORDER, and VACUUM in Production

Running Delta Lake without a tuning strategy is how you end up with 400,000 small files, query times that keep climbing, and storage bills you can't explain. Here's exactly how I manage OPTIMIZE, ZORDER, and VACUUM in production.

2026-05-19 · 14 min read

Data Engineering, Delta Lake, Apache Spark

Delta Lake Optimization: OPTIMIZE, ZORDER, Liquid Clustering, and VACUUM in Production

Delta Lake doesn't maintain itself. Here's the compaction, clustering, and cleanup strategy I run in production — and when to drop ZORDER for Liquid Clustering.

2026-05-19 · 18 min read

Data Engineering, Batch Processing, Streaming

Batch vs Streaming: How I Decide in Real-World Data Systems

Not every pipeline needs Kafka and Spark Streaming. Here's the decision framework I use in production to choose between batch and streaming — and why getting it wrong is expensive.

2026-05-01 · 13 min read

Kafka, Spark, Streaming

Designing a 10M Events per Day Kafka to Spark Streaming Pipeline

How I built a real-time streaming pipeline with sub-5-second latency, exactly-once guarantees, and zero data loss.

2026-04-25 · 10 min read

delta-lake, spark, databricks

Delta Lake Optimization: From Slow to Fast

Practical patterns for turning a slow, bloated Delta Lake into a fast, cost-efficient one — covering file compaction, Z-Ordering, partition tuning, caching, and query acceleration that cut query time by 91%.

2024-11-07 · 9 min read

lakehouse, delta-lake, spark

Designing a Modern Lakehouse Data Platform: End-to-End Architecture

A complete architectural walkthrough of a production lakehouse, covering ingestion, storage layers, transformation pipelines, query engines, and governance, built to handle 500TB+.

2024-10-03 · 11 min read

kafka, kinesis, aws

Kafka vs Kinesis: Architectural Trade-offs

A deep technical comparison of Apache Kafka and AWS Kinesis across throughput, ordering guarantees, consumer models, operational overhead, and cost — based on running both in production.

2024-09-11 · 10 min read

spark, scala, pyspark

Scaling Spark to 100TB: Production Patterns

Hard-won lessons from scaling Apache Spark pipelines to 100TB workloads — covering cluster sizing, shuffle tuning, memory management, and partition strategies that actually hold up at scale.

2024-08-19 · 11 min read

kafka, debugging, streaming

Debugging Kafka Consumer Lag in Production: A Real Case Study

How we diagnosed and eliminated a consumer lag of 4.2 million messages in a production Kafka pipeline, covering partition imbalance, deserialization bottlenecks, and rebalance storms.

2024-07-22 · 10 min read

delta-lake, spark, databricks

Delta Lake Optimization: OPTIMIZE, ZORDER, and VACUUM in Production

A practical guide to compacting small files, co-locating data with Z-Ordering, and reclaiming storage with VACUUM — battle-tested strategies for production Delta Lakes.

2024-06-10 · 9 min read

Data Engineering, Apache Kafka, Spark

End-to-End Data Pipeline Case Study: From Raw Events to Business Insights

A deep dive into building a production-grade data pipeline that ingests millions of raw events daily and transforms them into actionable business insights using modern data engineering tools.

2024-05-19 · 18 min read

Data Quality, Observability, Data Engineering

Data Quality & Observability in Data Pipelines: What Most Engineers Miss

Most data engineers add monitoring as an afterthought. Here's the systematic approach to data quality and observability that catches silent failures before they reach your stakeholders.

2024-05-19 · 14 min read

Snowflake vs BigQuery vs Databricks: How I Decide in Real Projects

A practical, opinionated guide to choosing the right data platform based on real-world project experience — not just feature matrices.

2024-04-29 · 12 min read

Designing Fault-Tolerant Streaming Systems: Lessons from Production

Hard-won lessons from running streaming pipelines at scale. Real failure modes, recovery patterns, and the architectural decisions that saved us at 3am.

2024-04-29 · 14 min read

Building Data APIs on Top of Your Lakehouse: Serving Layer Design

How to design a production-grade serving layer on top of Delta Lake or Iceberg. REST APIs, GraphQL, caching strategies, and the patterns that actually work at scale.

2024-04-29 · 13 min read

FinOps, Cost Optimization, Databricks

Data Platform Cost Optimization: A FinOps Approach for Data Engineers

How we reduced our data platform spend by 40% ($500K annually) through systematic cost engineering. Real strategies, tools, and lessons learned from optimizing Databricks, Snowflake, and cloud infrastructure.

2024-04-10 · 18 min read

spark, optimization, cost

Optimizing Spark Jobs: 10 Patterns That Cut Costs by 60%

Deep dive into Spark optimization techniques including broadcast joins, partition tuning, and cache strategies that dramatically reduce cluster costs.

2024-03-15 · 8 min read

kafka, streaming, reliability

Building Fault-Tolerant Kafka Pipelines

Production-ready patterns for exactly-once semantics, dead letter queues, and graceful failure handling in streaming pipelines.

2024-02-28 · 10 min read

Optimizing Spark Jobs to Reduce Costs by 60%

Practical techniques to dramatically reduce Spark compute costs through partition tuning, caching strategies, and cluster configuration.

2024-01-20 · 12 min read

Building Fault-Tolerant Kafka Pipelines

Production patterns for building resilient Kafka streaming pipelines that survive failures, handle backpressure, and maintain exactly-once semantics.

2023-12-05 · 11 min read

Delta Lake vs Iceberg: Architecture Deep-Dive

Technical comparison of Delta Lake and Apache Iceberg table formats. Real-world performance benchmarks, feature analysis, and when to use each.

2023-11-15 · 15 min read