100TB Warehouse Migration — Redshift & Oracle → Snowflake + BigQuery
Led migration of 100+ TB from on-premise Oracle and legacy AWS Redshift to Snowflake and BigQuery using dual-write validation strategy. Re-modeled physical layer with micro-partition clustering and incremental ELT using dbt. Achieved 70% query performance improvement (p95: 42s → 11s) and 40% cost reduction with zero-downtime cutover.
View on GitHubProblem
Legacy Oracle and Redshift infrastructure struggling with growing data volumes. T+1 batch windows causing business delays. P95 query times at 42 seconds. No proper incremental ETL patterns. Rising infrastructure costs.
Solution
Implemented dual-write validation strategy allowing zero-downtime cutover. Re-architected physical data model with micro-partition clustering for Snowflake and column ordering for BigQuery. Converted all pipelines to incremental ELT using dbt. Established automated data validation framework.
Architecture
Source Systems (Oracle + Redshift) → Dual-write validation → Snowflake (micro-partitions) + BigQuery (columnar) → dbt incremental ELT → BI layer
Key Challenges
- ▸Ensuring data consistency during dual-write phase across 100+ tables
- ▸Optimizing Snowflake clustering keys and BigQuery partitioning for query patterns
- ▸Converting complex stored procedures to dbt models while maintaining logic
- ▸Achieving zero-downtime cutover with rollback plan for critical business operations