
I build data systems
that scale to billions
Senior Data Engineer specialising in real-time pipelines, distributed systems, and lakehouse architectures.
Portfolio
Selected Work
10M Events/Day Kafka → Spark Streaming Pipeline
Production Kafka → Spark Structured Streaming pipeline processing 10M+ events/day with exactly-once delivery to Delta Lake. Features watermark-based late-event handling, idempotent MERGE upserts, and a dead-letter queue with automatic replay. Reduced end-to-end latency from 8 minutes to under 5 seconds.
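For readers curious how idempotent upserts and a dead-letter queue fit together, here is a minimal, illustrative sketch. It is not the production code: an in-memory dict stands in for the Delta Lake table, and the `event_id` and `version` fields, the `Sink` class, and the `repair` callback are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

Event = dict[str, Any]


@dataclass
class Sink:
    """In-memory stand-in for a keyed table plus a dead-letter queue."""

    table: dict[str, Event] = field(default_factory=dict)
    dead_letters: list[dict[str, Any]] = field(default_factory=list)

    def upsert(self, event: Event) -> None:
        # Keyed write: applying the same event twice leaves the table unchanged,
        # and an older version never overwrites a newer one.
        key = event["event_id"]  # raises KeyError for malformed events
        current = self.table.get(key)
        if current is None or event["version"] >= current["version"]:
            self.table[key] = event

    def process(self, batch: list[Event]) -> None:
        for event in batch:
            try:
                self.upsert(event)
            except (KeyError, TypeError) as exc:
                # Park the bad event instead of failing the whole batch.
                self.dead_letters.append({"event": event, "error": repr(exc)})

    def replay(self, repair: Callable[[Event], Event]) -> None:
        # Drain the queue, repair each event, and send it through the normal path.
        pending, self.dead_letters = self.dead_letters, []
        self.process([repair(item["event"]) for item in pending])
```

Because writes are keyed and version-checked, redelivering a batch after a failure does not change the table, which is the property that makes retries and replays safe.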
Enterprise Lakehouse — Databricks Medallion Architecture
Unified 50+ isolated AWS Glue jobs into a Databricks Delta Lake medallion architecture (Bronze/Silver/Gold). Unity Catalog for governance, dbt for schema contracts, Photon-powered Gold layer. Achieved 60% pipeline runtime reduction and eliminated schema conflicts across 8 engineering teams.
100TB Warehouse Migration — Redshift & Oracle → Snowflake + BigQuery
Led the migration of 100+ TB from on-premises Oracle and legacy AWS Redshift to Snowflake and BigQuery using a dual-write validation strategy. Remodeled the physical layer with micro-partition clustering and incremental ELT using dbt. Achieved a 70% query performance improvement (p95: 42s → 11s) and a 40% cost reduction with a zero-downtime cutover.
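As a rough illustration of what dual-write validation checks, the sketch below compares the same logical table as read from the legacy and target systems. It is a simplified assumption of the approach, not the migration tooling itself: rows are plain dicts, and `fingerprint`, `compare`, and the `id` key column are hypothetical names.

```python
import hashlib
from typing import Any, Iterable

Row = dict[str, Any]


def fingerprint(rows: Iterable[Row]) -> tuple[int, int]:
    """Return (row count, order-independent checksum) for a set of rows."""
    count = 0
    checksum = 0
    for row in rows:
        canonical = repr(sorted(row.items())).encode("utf-8")
        row_hash = int.from_bytes(hashlib.sha256(canonical).digest()[:8], "big")
        # Summing modulo 2**64 makes the result independent of row order.
        checksum = (checksum + row_hash) % 2**64
        count += 1
    return count, checksum


def compare(legacy: list[Row], target: list[Row], key: str) -> dict[str, list[Any]]:
    """List the keys that differ between the legacy and target copies."""
    old = {row[key]: row for row in legacy}
    new = {row[key]: row for row in target}
    return {
        "missing_in_target": sorted(old.keys() - new.keys()),
        "unexpected_in_target": sorted(new.keys() - old.keys()),
        "mismatched": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

A cheap fingerprint comparison can run continuously during the dual-write window, with the key-level `compare` reserved for tables whose fingerprints disagree.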
How I work
Engineering Principles
Reliability First
Systems designed for a 99.9% SLA. Every pipeline ships with observability, alerting, and a documented recovery path.
Scalable by Design
Architecture that grows with your data — from gigabytes to petabytes without re-engineering the foundation.
Deep Observability
You can't fix what you can't see. Rich metrics, distributed tracing, and cost visibility across every layer.
Let's talk
Let's build something scalable
Need production-grade data systems that actually scale?
Let's discuss your architecture.