Real-Time Credit Decisioning — 48h Batch → < 2min Streaming
Replaced overnight batch credit scoring with a Kafka-driven real-time pipeline. PySpark micro-batch feature engineering computes 200+ credit risk signals as application events arrive and feeds a REST model serving layer. Decisioning latency fell from 48 hours to under 2 minutes, with model accuracy held at 95%+ at 100K+ applications/day.
Problem
Overnight batch credit scoring took 48 hours to return a decision, and the delay was costing the business its competitive advantage. Hard-coded scoring rules couldn't adapt, and there was no real-time fraud detection capability.
Solution
Built a Kafka-driven streaming pipeline with PySpark for real-time feature engineering of 200+ credit risk signals. Integrated it with MLflow model serving via a REST API. Implemented an A/B testing framework and model performance monitoring with automated rollback.
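The A/B testing and automated rollback idea can be sketched in plain Python. This is a minimal illustration, not the project's code: the names `assign_variant` and `RollbackMonitor`, the 10% challenger share, the window sizes, and the choice of accuracy as the rollback trigger are all assumptions. Only the 95% accuracy floor comes from the figures above.

```python
import hashlib
from collections import deque


def assign_variant(application_id: str, challenger_share: float = 0.1) -> str:
    """Deterministically route an application to 'champion' or 'challenger'.

    Hashing the ID keeps the assignment stable across retries and replays.
    """
    digest = hashlib.sha256(application_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


class RollbackMonitor:
    """Tracks recent challenger outcomes and flags a rollback on low accuracy."""

    def __init__(self, min_accuracy: float = 0.95, window: int = 500,
                 min_samples: int = 100) -> None:
        self.min_accuracy = min_accuracy
        self.min_samples = min_samples
        self.outcomes = deque(maxlen=window)  # sliding window of True/False

    def record(self, prediction_correct: bool) -> None:
        self.outcomes.append(bool(prediction_correct))

    def accuracy(self) -> float:
        # With no data yet, report 1.0 so an empty window never triggers rollback.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def should_roll_back(self) -> bool:
        # Require a minimum sample count so a few early misses don't trip it.
        enough_data = len(self.outcomes) >= self.min_samples
        return enough_data and self.accuracy() < self.min_accuracy
```

In a setup like this, the serving layer would call `assign_variant` per application and, once labelled outcomes arrive, feed them to the monitor; when `should_roll_back()` returns true, traffic would be sent back to the champion model.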
Architecture
Application Events (Kafka) → PySpark micro-batch feature engineering → Redis feature cache → MLflow model serving (REST) → Decision output
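The flow above can be traced for a single application with a small plain-Python sketch. It stands in for the real components under stated assumptions: a dict replaces the Redis cache, `score_fn` replaces the HTTP call to the model server, and the two features, the event field names, and the 0.5 approval threshold are hypothetical. The request body uses the `dataframe_split` JSON format that MLflow 2.x scoring servers accept.

```python
import json
from typing import Callable, Dict, MutableMapping


def engineer_features(event: dict) -> Dict[str, float]:
    """Stand-in for the PySpark feature step; two illustrative signals only."""
    income = float(event["annual_income"])
    debt = float(event["total_debt"])
    requested = float(event["requested_amount"])
    return {
        "debt_to_income": debt / income if income > 0 else 0.0,
        "requested_to_income": requested / income if income > 0 else 0.0,
    }


def build_scoring_payload(features: Dict[str, float]) -> str:
    """Serialize one feature row in MLflow's 'dataframe_split' JSON format."""
    columns = sorted(features)  # fixed column order for the model
    row = [features[name] for name in columns]
    return json.dumps({"dataframe_split": {"columns": columns, "data": [row]}})


def decide(event: dict,
           cache: MutableMapping[str, Dict[str, float]],
           score_fn: Callable[[str], float],
           approve_below: float = 0.5) -> dict:
    """Run one event through features -> cache -> scoring -> decision."""
    features = engineer_features(event)
    cache[event["application_id"]] = features  # dict used in place of Redis
    risk = score_fn(build_scoring_payload(features))
    return {
        "application_id": event["application_id"],
        "risk_score": risk,
        "decision": "approve" if risk < approve_below else "refer",
    }
```

A real `score_fn` would POST the payload to the model server's `/invocations` endpoint with `Content-Type: application/json` and read the prediction from the response; passing it in as a callable keeps the flow testable without a running server.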
Key Challenges
- Sub-2-minute latency requirement across entire pipeline including external API calls
- Real-time computation of 200+ features with proper error handling and fallbacks
- Model versioning and safe deployment with A/B testing and automatic rollback
- Maintaining 95%+ accuracy while transitioning from batch to streaming scoring
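One way to combine the feature fallbacks with the latency budget from the challenges above is to give every feature a default value and stop computing once a deadline has passed. The sketch below is an assumption about how that could look, not the project's implementation; `compute_with_fallbacks`, the feature names, and the defaults are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

FeatureFn = Callable[[dict], float]


def compute_with_fallbacks(
    event: dict,
    feature_fns: Dict[str, Tuple[FeatureFn, float]],
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, float], List[str]]:
    """Compute each feature, using its default on error or after the deadline.

    Returns the feature values and the names of features that fell back,
    so downstream monitoring can track how often defaults are used.
    """
    start = clock()
    values: Dict[str, float] = {}
    fell_back: List[str] = []
    for name, (fn, default) in feature_fns.items():
        # The budget is checked between features; a slow call already in
        # progress is not interrupted.
        if clock() - start > deadline_s:
            values[name] = default
            fell_back.append(name)
            continue
        try:
            values[name] = float(fn(event))
        except (KeyError, TypeError, ValueError, ZeroDivisionError):
            values[name] = default
            fell_back.append(name)
    return values, fell_back
```

Returning the list of fallen-back features alongside the values makes it possible to alert when defaults start replacing real signals, which matters for holding accuracy during the move from batch to streaming.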