Pipelines that scale. Systems that don't fail.
6+ years owning production data infrastructure that processes 1B+ events/day — Kafka streaming, Databricks lakehouses, and 100TB+ warehouse migrations on AWS · GCP · Databricks. Correctness and fault tolerance are non-negotiable.

Vasudev Rao
Data Engineer · 6+ Years
1B+
Events/Day
< 5s
Stream Latency
99.9%
Pipeline SLA
What I Deliver
Stack, Skills & Experience
Processing 50TB+ of batch and streaming data daily across production pipelines on AWS and GCP.
Core Specialisms
Production Data Stack
Languages & Processing
Streaming & Messaging
Lakehouse & Storage
Warehousing
Orchestration & Quality
Cloud & Infrastructure
Experience
Kafka → PySpark real-time credit decisioning. Cut 48h batch scoring to <2min streaming. 200+ risk features computed in-flight. 95%+ model accuracy at 100K+ applications/day.
Unified 50+ AWS Glue jobs into Databricks medallion lakehouse. 60% pipeline runtime reduction. Unity Catalog governance across 8 teams. Zero schema conflicts.
Redshift + Oracle → Snowflake + BigQuery. Dual-write validation, zero downtime. p95 query time 42s → 11s. 40% cost reduction.
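The in-flight feature computation behind the credit-decisioning pipeline can be sketched in plain Python (the production version runs as a PySpark Structured Streaming job; the class and feature names here are illustrative, not the actual code):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # 1-hour feature window (illustrative choice)

class RollingFeatures:
    """Compute per-applicant features as events arrive, mimicking the
    stateful windowed aggregation a streaming job performs at scale."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        # applicant_id -> deque of (event_ts, amount), oldest first
        self.events = defaultdict(deque)

    def update(self, applicant_id, ts, amount):
        q = self.events[applicant_id]
        q.append((ts, amount))
        # Evict events that fell outside the window.
        while q and ts - q[0][0] > self.window:
            q.popleft()
        total = sum(a for _, a in q)
        return {
            "txn_count_1h": len(q),
            "txn_sum_1h": total,
            "txn_avg_1h": total / len(q),
        }

rf = RollingFeatures()
rf.update("A1", ts=0, amount=100.0)
feats = rf.update("A1", ts=1800, amount=50.0)
# feats -> {"txn_count_1h": 2, "txn_sum_1h": 150.0, "txn_avg_1h": 75.0}
```

In Spark the same shape becomes a watermarked window aggregation, so the eviction step is handled by the engine rather than by hand.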
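The medallion layering behind the lakehouse consolidation (bronze = raw as ingested, silver = cleaned and typed, gold = business aggregates) can be sketched with plain Python functions; in production these are Delta Lake tables and Spark jobs, and the field names below are made up for illustration:

```python
def to_silver(bronze_rows):
    """Silver layer: clean and type raw rows; drop records that fail validation
    (in production they would be quarantined, not discarded)."""
    silver = []
    for r in bronze_rows:
        try:
            silver.append({
                "user_id": str(r["user_id"]),
                "amount": float(r["amount"]),
                "country": str(r.get("country", "unknown")).lower(),
            })
        except (KeyError, TypeError, ValueError):
            continue
    return silver

def to_gold(silver_rows):
    """Gold layer: aggregate clean rows into a per-country revenue table."""
    gold = {}
    for r in silver_rows:
        gold[r["country"]] = gold.get(r["country"], 0.0) + r["amount"]
    return gold

bronze = [
    {"user_id": 1, "amount": "10.5", "country": "US"},
    {"user_id": 2, "amount": "bad"},             # fails typing, dropped
    {"user_id": 3, "amount": 4.5, "country": "us"},
]
gold = to_gold(to_silver(bronze))
# gold -> {"us": 15.0}
```

The point of the pattern is that each layer has one contract: bronze is append-only and lossless, silver enforces schema, gold serves consumers, which is what eliminates schema conflicts across teams.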
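The dual-write validation used during the warehouse migration boils down to proving both sides hold the same rows before cutover. A minimal sketch, assuming rows arrive as dicts from both warehouses (the helper names are hypothetical):

```python
import hashlib

def table_fingerprint(rows):
    """Order-insensitive fingerprint: hash each row, XOR the digests
    so row order between warehouses doesn't matter."""
    fp = 0
    for row in rows:
        h = hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        fp ^= int(h, 16)
    return fp

def validate_dual_write(legacy_rows, target_rows):
    """Compare both sides of a dual write; return (ok, report)."""
    report = {
        "legacy_count": len(legacy_rows),
        "target_count": len(target_rows),
        "fingerprints_match":
            table_fingerprint(legacy_rows) == table_fingerprint(target_rows),
    }
    ok = (report["legacy_count"] == report["target_count"]
          and report["fingerprints_match"])
    return ok, report

legacy = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
target = [{"id": 2, "amt": 20}, {"id": 1, "amt": 10}]  # same rows, new order
ok, report = validate_dual_write(legacy, target)
# ok -> True
```

At warehouse scale the same idea runs as per-partition count and checksum queries on each side, so a mismatch pinpoints the partition to reconcile rather than failing the whole table.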
System Design
Architecture Patterns
Patterns I design, build, and operate in production, built for correctness, scale, and operational simplicity.
Writing
Blog & Articles
How LLMs Will Transform Data Engineering: The AI-Powered Future
How to Build a Lakehouse Using Delta Lake & Databricks
Snowflake vs BigQuery vs Databricks — Architecture Breakdown
Data Engineering System Design for Real-Time Applications
Subscribe to newsletter
Weekly insights on data engineering, Spark, dbt, streaming architectures, and cloud data platforms.
Open to Senior Data Engineering Roles
Let's build data systems that don't break at 3am.
Targeting senior and staff-level data engineering roles — fintech, ML infrastructure, and platform teams. Also available for consulting on streaming architecture, Spark performance, and lakehouse design.