About Me
Senior Data Engineer
6+ years designing and operating distributed data systems at scale — batch pipelines, streaming platforms, and lakehouse architecture on AWS and GCP.
Core focus: Spark performance tuning, Delta Lake table design, and Kafka-based streaming with exactly-once semantics. Data correctness, pipeline observability, and failure recovery are treated as system requirements, not operational afterthoughts.
Targeting senior and staff-level data engineering roles in fintech, ML infrastructure, and large-scale data platform teams. Stack: Apache Spark, PySpark, Kafka, Databricks, Delta Lake, Snowflake, BigQuery, AWS, GCP, dbt, Airflow, Terraform.
Engineering Philosophy
Correctness First
Exactly-once delivery, idempotent writes, and late-event handling are system requirements, not afterthoughts. Pipelines should stay correct under failure.
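As a sketch of the idea in plain Python (not the production Spark/Delta code; `Event`, `IdempotentSink`, and the watermark lag are illustrative): late events older than the watermark are dropped, and writes are keyed so that re-delivered events land exactly once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_id: str    # unique key that makes writes idempotent
    event_time: int  # event timestamp (epoch seconds)

class IdempotentSink:
    """Toy sink: re-delivered events with the same event_id are written once."""
    def __init__(self):
        self.rows = {}

    def upsert(self, event):
        # MERGE-style upsert on the key, never a blind append
        self.rows[event.event_id] = event

def process(events, watermark_lag=600):
    """Drop events older than the watermark; write the rest idempotently."""
    sink = IdempotentSink()
    max_seen = 0
    for e in events:
        max_seen = max(max_seen, e.event_time)
        watermark = max_seen - watermark_lag  # how far back we still accept data
        if e.event_time >= watermark:
            sink.upsert(e)
    return sink
```

The same two moves, a key-based upsert plus a watermark, are what Spark Structured Streaming's `withWatermark` and Delta Lake's `MERGE INTO` provide at scale.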
Performance at Scale
Spark pipelines at 1B+ events/day, sub-10ms feature serving, and 70% query-performance gains through physical data modelling and smart partitioning.
Operational Simplicity
Systems built for observability — structured logging, data quality checks, automated alerting, and runbook-driven incident response from day one.
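A minimal sketch of one such data-quality check with structured logging (plain Python; the check name and threshold are illustrative, not a specific framework's API): the JSON log line is what automated alerting keys on.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("dq")

def check_null_rate(rows, column, max_null_rate=0.01):
    """Fail fast if the null rate of `column` exceeds the threshold."""
    total = len(rows)
    nulls = sum(1 for r in rows if r.get(column) is None)
    rate = nulls / total if total else 1.0  # an empty batch is itself a failure
    log.info(json.dumps({
        # structured log line: machine-parseable, easy to alert on
        "check": "null_rate",
        "column": column,
        "null_rate": rate,
        "threshold": max_null_rate,
    }))
    return rate <= max_null_rate
```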
Lakehouse Architecture
Medallion patterns on Delta Lake and Apache Iceberg with Unity Catalog governance. Schema evolution, time-travel, and zero-copy cloning as standard.
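The medallion layering reduces to three contracts, sketched here in plain Python rather than Spark (the order schema and field names are illustrative): bronze is raw as-ingested data, silver is typed, validated, and de-duplicated, gold is the business-level aggregate.

```python
# bronze: raw records as ingested (may contain duplicates and bad rows)
bronze = [
    {"order_id": "1", "amount": "10.50", "country": "US"},
    {"order_id": "1", "amount": "10.50", "country": "US"},  # duplicate delivery
    {"order_id": "2", "amount": "bad",   "country": "US"},  # unparseable amount
    {"order_id": "3", "amount": "4.25",  "country": "DE"},
]

def to_silver(rows):
    """Silver: typed, validated, de-duplicated on the business key."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine these, not drop silently
        if r["order_id"] in seen:
            continue  # idempotent on the business key
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"], "amount": amount,
                    "country": r["country"]})
    return out

def to_gold(rows):
    """Gold: business-level aggregate (revenue per country)."""
    agg = {}
    for r in rows:
        agg[r["country"]] = agg.get(r["country"], 0.0) + r["amount"]
    return agg
```

On Delta Lake or Iceberg each layer is a governed table, so schema evolution and time-travel apply at every hop rather than only at the edges.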
Technical Proficiency
Data Engineering Core
Warehousing & Lakehouse
Cloud & Infrastructure
Experience
Senior Data Engineer
— Fintech Platform (100K+ apps/day)
2023 – Present
Built a Kafka → PySpark real-time credit decisioning pipeline, reducing decision latency from 48 hours to under 2 minutes while maintaining 95%+ model accuracy at 100K+ applications/day
Built an ML Feature Store on Databricks serving 1,000+ features with point-in-time correctness for offline training and sub-10ms retrieval for online inference across 4 ML teams
Designed a cost governance framework with auto-remediation across 20+ Databricks workspaces, reducing combined AWS + GCP data platform spend by 40% within 90 days
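Point-in-time correctness, the property the feature store above guarantees, can be sketched in a few lines of plain Python (the function and data shapes are illustrative, not the Databricks API): for each training row's timestamp, serve the latest feature value known at or before that moment, so training never leaks future data.

```python
import bisect

def point_in_time_lookup(feature_history, as_of):
    """Return the latest feature value with timestamp <= as_of.

    feature_history: list of (timestamp, value) pairs sorted by timestamp.
    Prevents label leakage: a training row never sees feature values
    computed after its own event time.
    """
    timestamps = [t for t, _ in feature_history]
    i = bisect.bisect_right(timestamps, as_of)  # first entry strictly after as_of
    if i == 0:
        return None  # no feature value was known yet at as_of
    return feature_history[i - 1][1]
```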
Data Platform Engineer
— Enterprise Data Platform
2020 – 2023
Built Kafka → Spark Structured Streaming pipelines processing 1B+ events/day with exactly-once delivery guarantees to Delta Lake, reducing end-to-end latency from minutes to under 5 seconds
Replaced 50+ siloed ingestion jobs with a unified Databricks medallion lakehouse. Cut pipeline execution time by 60% and eliminated cross-team schema inconsistencies with Unity Catalog
Led migration of 100TB from Redshift + Oracle to Snowflake + BigQuery using a dual-write validation strategy: zero downtime and a 70% query performance improvement
Data Engineer
— Analytics Consultancy
2018 – 2020
Designed Airflow DAGs for multi-source ELT workflows across 50+ upstream sources into BigQuery and Snowflake
Reduced BigQuery costs 60% via date partitioning, clustering, and materialized view optimisation
Built cloud-native ingestion pipelines from REST APIs, CDC streams, and file sources into GCS and BigQuery
Certifications & Recognition
