60% pipeline runtime reduction, 50+ sources unified, 0 schema conflicts, full data lineage with Unity Catalog

Enterprise Lakehouse — Databricks Medallion Architecture

Unified 50+ isolated AWS Glue jobs into a Databricks Delta Lake medallion architecture (Bronze/Silver/Gold). Unity Catalog handles governance, dbt enforces schema contracts, and Photon powers the Gold layer. The migration cut pipeline runtime by 60% and eliminated schema conflicts across 8 engineering teams.


Problem

The starting point was a fragmented multi-warehouse topology: 50+ isolated AWS Glue jobs spread across 8 teams. Without a shared catalog, schema conflicts were constant. Every run refreshed full tables, which stretched batch windows past 6 hours. With no data quality gates in place, bad data reached production.

Solution

Implemented a Databricks medallion architecture with three layers. The Bronze layer ingests raw data, including CDC feeds. The Silver layer enforces dbt schema contracts and Great Expectations quality gates. The Gold layer serves BI workloads on the Photon engine. Unity Catalog provides centralized governance and lineage tracking across all three.
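
To make the Silver-layer gating idea concrete, here is a minimal sketch in plain Python of what a schema contract plus a quality gate does conceptually. It is not the project's dbt or Great Expectations configuration: the `ORDERS_CONTRACT` table, its columns, and the `amount_is_non_negative` rule are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical contract for an "orders" Silver table: column name -> required type.
ORDERS_CONTRACT = {"order_id": int, "customer_id": int, "amount": float}


@dataclass
class GateResult:
    passed: list = field(default_factory=list)       # rows safe to publish
    quarantined: list = field(default_factory=list)  # (row, reasons) pairs held back


def contract_violations(row, contract):
    """Return the reasons a row breaks the contract (empty list if it conforms)."""
    reasons = []
    for column, expected_type in contract.items():
        if column not in row:
            reasons.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            reasons.append(f"wrong type for {column}: {type(row[column]).__name__}")
    for column in row:
        if column not in contract:
            reasons.append(f"unexpected column: {column}")
    return reasons


def amount_is_non_negative(row):
    """Example quality expectation (assumed rule): amounts must not be negative."""
    amount = row.get("amount")
    return isinstance(amount, float) and amount >= 0


def run_gate(rows, contract, expectations):
    """Split rows into those that pass every check and those that are quarantined."""
    result = GateResult()
    for row in rows:
        reasons = contract_violations(row, contract)
        reasons += [f"failed expectation: {e.__name__}" for e in expectations if not e(row)]
        if reasons:
            result.quarantined.append((row, reasons))
        else:
            result.passed.append(row)
    return result
```

Quarantining failed rows with their reasons, rather than dropping them, is one possible design: it keeps bad data out of downstream layers while leaving a record that the owning team can inspect.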

Architecture

Raw Sources (Kafka, S3, APIs, CDC) → Bronze (Delta raw) → Silver (dbt + DQ checks) → Gold (Photon KPIs) → Unity Catalog lineage
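
The flow above can be sketched end to end in plain Python. This is a simplified model of the layer responsibilities, not the project's PySpark or Delta Lake code: the event shape (`seq`, `op`, `order_id`, `region`, `amount`) and the revenue KPI are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical CDC events as they might land in Bronze: append-only, never modified.
bronze = [
    {"seq": 1, "op": "insert", "order_id": 1, "region": "EU", "amount": 100.0},
    {"seq": 2, "op": "insert", "order_id": 2, "region": "US", "amount": 40.0},
    {"seq": 3, "op": "update", "order_id": 1, "region": "EU", "amount": 120.0},
    {"seq": 4, "op": "insert", "order_id": 3, "region": "US", "amount": 60.0},
    {"seq": 5, "op": "delete", "order_id": 2, "region": "US", "amount": 40.0},
]


def build_silver(events):
    """Replay CDC events in sequence order to get the current row per key."""
    current = {}
    for event in sorted(events, key=lambda e: e["seq"]):
        key = event["order_id"]
        if event["op"] == "delete":
            current.pop(key, None)
        else:  # insert and update are both treated as an upsert
            current[key] = {k: v for k, v in event.items() if k not in ("seq", "op")}
    return current


def build_gold(silver):
    """Aggregate the current rows into a per-region revenue KPI."""
    revenue = defaultdict(float)
    for row in silver.values():
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


silver = build_silver(bronze)
gold = build_gold(silver)
print(gold)  # {'EU': 120.0, 'US': 60.0}
```

Because Bronze keeps every change event, Silver only has to process new events instead of refreshing whole tables, which is the property that CDC-based ingestion relies on to shorten batch windows.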

Key Challenges

  • Migrating 50+ legacy Glue jobs to unified architecture without business disruption
  • Establishing dbt contract testing patterns that scale across 8 teams
  • Implementing automated data quality gates using Great Expectations at scale
  • Optimizing Delta Lake file sizes and partition strategies for Photon engine
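
As an illustration of the file-size challenge in the last item, the sketch below flags partitions that hold more files than a target file size would require. The 256 MiB target, the partition names, and the `compaction_plan` helper are all assumptions for the example, not values or code from the project.

```python
import math

MIB = 1024 * 1024
TARGET_FILE_BYTES = 256 * MIB  # assumed target size; tune per workload


def compaction_plan(partitions, target=TARGET_FILE_BYTES):
    """Find partitions with more files than the target size calls for.

    partitions: mapping of partition value -> list of file sizes in bytes.
    Returns a mapping of partition value -> current and ideal file counts.
    """
    plan = {}
    for name, sizes in partitions.items():
        total = sum(sizes)
        ideal = max(1, math.ceil(total / target))
        if len(sizes) > ideal:
            plan[name] = {"current_files": len(sizes), "ideal_files": ideal}
    return plan


# Hypothetical layout: one partition fragmented into many small files, one healthy.
partitions = {
    "2024-01-01": [8 * MIB] * 64,
    "2024-01-02": [256 * MIB, 200 * MIB],
}
plan = compaction_plan(partitions)
print(plan)  # {'2024-01-01': {'current_files': 64, 'ideal_files': 2}}
```

On Delta Lake the compaction itself is typically done with the `OPTIMIZE` command; a check like this only helps decide which partitions are worth compacting.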

Tech Stack

Databricks, Delta Lake, PySpark, Unity Catalog, dbt, Apache Spark, Great Expectations, AWS S3