About Me
Senior Data Engineer
6+ years designing and operating distributed data systems at scale — batch pipelines, streaming platforms, and lakehouse architecture on AWS and GCP.
Core focus: Spark performance tuning, Delta Lake table design, and Kafka-based streaming with exactly-once semantics. Data correctness, pipeline observability, and failure recovery are treated as system requirements, not operational afterthoughts.
Targeting senior and staff-level data engineering roles in fintech, ML infrastructure, and large-scale data platform teams. Stack: Apache Spark, PySpark, Kafka, Databricks, Delta Lake, Snowflake, BigQuery, AWS, GCP, dbt, Airflow, Terraform.
Engineering Philosophy
Correctness First
Exactly-once delivery, idempotent writes, and late-event handling are system requirements, not afterthoughts. Pipelines should stay correct under failure.
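As a sketch of the idea in plain Python (not the production Spark/Delta code; `Event`, `IdempotentSink`, and the watermark lag are illustrative): late events older than the watermark are dropped, and writes are keyed so that re-delivered events land exactly once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_id: str    # unique key that makes writes idempotent
    event_time: int  # event timestamp (epoch seconds)

class IdempotentSink:
    """Toy sink: re-delivered events with the same event_id are written once."""
    def __init__(self):
        self.rows = {}

    def upsert(self, event):
        # MERGE-style upsert on the key, never a blind append
        self.rows[event.event_id] = event

def process(events, watermark_lag=600):
    """Drop events older than the watermark; write the rest idempotently."""
    sink = IdempotentSink()
    max_seen = 0
    for e in events:
        max_seen = max(max_seen, e.event_time)
        watermark = max_seen - watermark_lag  # how far back we still accept data
        if e.event_time >= watermark:
            sink.upsert(e)
    return sink
```

The same two moves, a key-based upsert plus a watermark, are what Spark Structured Streaming's `withWatermark` and Delta Lake's `MERGE INTO` provide at scale.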
Performance at Scale
Spark pipelines at 1B+ events/day, sub-10ms feature serving, and 70% query-performance gains through physical data modelling and smart partitioning.
Operational Simplicity
Systems built for observability — structured logging, data quality checks, automated alerting, and runbook-driven incident response from day one.
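A minimal sketch of one such data-quality check with structured logging (plain Python; the check name and threshold are illustrative, not a specific framework's API): the JSON log line is what automated alerting keys on.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("dq")

def check_null_rate(rows, column, max_null_rate=0.01):
    """Fail fast if the null rate of `column` exceeds the threshold."""
    total = len(rows)
    nulls = sum(1 for r in rows if r.get(column) is None)
    rate = nulls / total if total else 1.0  # an empty batch is itself a failure
    log.info(json.dumps({
        # structured log line: machine-parseable, easy to alert on
        "check": "null_rate",
        "column": column,
        "null_rate": rate,
        "threshold": max_null_rate,
    }))
    return rate <= max_null_rate
```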
Lakehouse Architecture
Medallion patterns on Delta Lake and Apache Iceberg with Unity Catalog governance. Schema evolution, time-travel, and zero-copy cloning as standard.
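The medallion layering reduces to three contracts, sketched here in plain Python rather than Spark (the order schema and field names are illustrative): bronze is raw as-ingested data, silver is typed, validated, and de-duplicated, gold is the business-level aggregate.

```python
# bronze: raw records as ingested (may contain duplicates and bad rows)
bronze = [
    {"order_id": "1", "amount": "10.50", "country": "US"},
    {"order_id": "1", "amount": "10.50", "country": "US"},  # duplicate delivery
    {"order_id": "2", "amount": "bad",   "country": "US"},  # unparseable amount
    {"order_id": "3", "amount": "4.25",  "country": "DE"},
]

def to_silver(rows):
    """Silver: typed, validated, de-duplicated on the business key."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine these, not drop silently
        if r["order_id"] in seen:
            continue  # idempotent on the business key
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"], "amount": amount,
                    "country": r["country"]})
    return out

def to_gold(rows):
    """Gold: business-level aggregate (revenue per country)."""
    agg = {}
    for r in rows:
        agg[r["country"]] = agg.get(r["country"], 0.0) + r["amount"]
    return agg
```

On Delta Lake or Iceberg each layer is a governed table, so schema evolution and time-travel apply at every hop rather than only at the edges.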
Technical Proficiency
Data Engineering Core
Warehousing & Lakehouse
Cloud & Infrastructure
Experience
Senior Data Engineer
— Fintech Platform (100K+ apps/day)
2023 – Present
Built a Kafka → PySpark real-time credit decisioning pipeline, reducing decision latency from 48 hours to under 2 minutes while maintaining 95%+ model accuracy at 100K+ applications/day
Built an ML Feature Store on Databricks serving 1,000+ features with point-in-time correctness for offline training and sub-10ms retrieval for online inference across 4 ML teams
Designed a cost governance framework with auto-remediation across 20+ Databricks workspaces, reducing combined AWS + GCP data platform spend by 40% within 90 days
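Point-in-time correctness, the property the feature store above guarantees, can be sketched in a few lines of plain Python (the function and data shapes are illustrative, not the Databricks API): for each training row's timestamp, serve the latest feature value known at or before that moment, so training never leaks future data.

```python
import bisect

def point_in_time_lookup(feature_history, as_of):
    """Return the latest feature value with timestamp <= as_of.

    feature_history: list of (timestamp, value) pairs sorted by timestamp.
    Prevents label leakage: a training row never sees feature values
    computed after its own event time.
    """
    timestamps = [t for t, _ in feature_history]
    i = bisect.bisect_right(timestamps, as_of)  # first entry strictly after as_of
    if i == 0:
        return None  # no feature value was known yet at as_of
    return feature_history[i - 1][1]
```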
Data Platform Engineer
— Enterprise Data Platform
2020 – 2023
Built Kafka → Spark Structured Streaming pipelines processing 1B+ events/day with exactly-once delivery guarantees to Delta Lake, reducing end-to-end latency from minutes to under 5 seconds
Replaced 50+ siloed ingestion jobs with a unified Databricks medallion lakehouse. Cut pipeline execution time by 60% and eliminated cross-team schema inconsistencies with Unity Catalog
Led migration of 100TB from Redshift + Oracle to Snowflake + BigQuery using a dual-write validation strategy: zero downtime and a 70% query performance improvement
Data Engineer
— Analytics Consultancy
2018 – 2020
Designed Airflow DAGs for multi-source ELT workflows across 50+ upstream sources into BigQuery and Snowflake
Reduced BigQuery costs 60% via date partitioning, clustering, and materialized view optimisation
Built cloud-native ingestion pipelines from REST APIs, CDC streams, and file sources into GCS and BigQuery
Certifications & Recognition
