Cost Engineering Framework — 40% Platform Spend Reduction
Automated framework for Spark cluster rightsizing, S3 → Glacier storage tiering, and cross-workspace cost anomaly detection using Isolation Forest ML. Built centralized cost analytics aggregating AWS Cost Explorer, Databricks, and Snowflake usage. Achieved 40% platform spend reduction in 90 days.
Problem
Cloud data costs were growing 40% year over year with no visibility into spend drivers. Idle Databricks clusters ran 24/7, there was no automated cost anomaly detection, and teams had no cost accountability or chargeback.
Solution
Developed a cost engineering framework with four components: (1) automated Spark cluster rightsizing recommendations, (2) S3 lifecycle policies with intelligent tiering, (3) Isolation Forest anomaly detection on cost data, and (4) team-level chargeback reports with Slack alerts.
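To illustrate how Isolation Forest scoring can surface a cost spike, here is a minimal pure-Python sketch. It is not the project's implementation: the function names, the tree count, and the sample data are all assumptions made for illustration, and a production setup would more likely use a library such as scikit-learn's `IsolationForest`. The idea is that unusual points are separated by random splits after very few steps, so a short average path length maps to a score close to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # remaining points are identical on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score per point in (0, 1]; higher means more anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    n = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(rng.sample(points, n), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(n)
    scores = []
    for p in points:
        mean_path = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


# Hypothetical daily spend (USD) for one workspace, with a spike on day 20.
daily_costs = [(100.0 + 2.0 * (i % 7),) for i in range(30)]
daily_costs[20] = (400.0,)

scores = anomaly_scores(daily_costs)
top_day = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous day:", top_day)
```

Each point is a tuple so that extra features (for example, cluster hours or storage volume) can be added alongside raw cost. Where to set the alerting threshold on the score is a tuning decision that trades missed anomalies against false positives.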
Architecture
Cost APIs (AWS/Databricks/Snowflake/GCP) → ETL Pipeline (Python) → Cost Database (PostgreSQL) → ML Anomaly Detection → Alerts + Dashboards
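The ETL stage has to map each platform's billing records onto one schema before they reach the cost database. The sketch below shows one way that normalization step might look. The input row shapes, field names, and unit prices are hypothetical placeholders, not the actual API response formats or contract rates; the point is only that usage-based units such as Databricks DBUs and Snowflake credits are converted to a common dollar figure.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical unit prices; real rates depend on contract, SKU, and region.
DBU_PRICE_USD = 0.40
SNOWFLAKE_CREDIT_PRICE_USD = 3.00


@dataclass(frozen=True)
class CostRecord:
    """Common schema for one day of spend on one platform for one team."""
    day: date
    platform: str
    team: str
    cost_usd: float


def normalize(platform: str, row: dict) -> CostRecord:
    """Convert a flattened, platform-specific billing row into a CostRecord."""
    day = date.fromisoformat(row["day"])
    team = row.get("team", "unallocated")
    if platform == "aws":
        cost = float(row["unblended_cost"])  # already in USD
    elif platform == "databricks":
        cost = float(row["dbus"]) * DBU_PRICE_USD
    elif platform == "snowflake":
        cost = float(row["credits"]) * SNOWFLAKE_CREDIT_PRICE_USD
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return CostRecord(day, platform, team, round(cost, 2))


# Hypothetical rows as they might look after extraction from each source.
raw_rows = [
    ("aws", {"day": "2024-01-01", "team": "ml", "unblended_cost": "125.50"}),
    ("databricks", {"day": "2024-01-01", "team": "ml", "dbus": 100}),
    ("snowflake", {"day": "2024-01-01", "credits": 2.5}),
]
records = [normalize(platform, row) for platform, row in raw_rows]
for record in records:
    print(record)
```

Rows without a team tag fall into an "unallocated" bucket here rather than being dropped, so untagged spend stays visible in later reports.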
Key Challenges
- Normalizing cost data across multiple cloud platforms and billing models
- Building ML models to detect cost anomalies without excessive false positives
- Implementing fair cost allocation for shared resources across teams
- Creating actionable recommendations that don't disrupt data SLAs
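As an illustration of the shared-resource allocation challenge, one common policy is to split a shared bill in proportion to each team's measured usage. The source does not say which policy the framework used, so the sketch below is an assumption; the function name, the usage metric, and the even-split fallback are all hypothetical.

```python
def allocate_shared_cost(total_cost: float, usage_by_team: dict) -> dict:
    """Split a shared cost across teams in proportion to their usage.

    Falls back to an even split when no usage was recorded, so the
    full cost is always assigned to someone.
    """
    if not usage_by_team:
        raise ValueError("at least one team is required")
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        even_share = total_cost / len(usage_by_team)
        return {team: even_share for team in usage_by_team}
    return {
        team: total_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Hypothetical example: a shared cluster bill split by cluster-hours used.
shares = allocate_shared_cost(1000.0, {"analytics": 30, "ml": 10})
print(shares)
```

Proportional allocation is simple to explain in a chargeback report, but it can shift costs between teams from month to month as usage mixes change, which is part of what makes "fair" allocation a judgment call.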