Tutorials
Hands-on technical tutorials on Python, SQL, PySpark, Databricks, System Design, DSA, and Data Engineering, with AI coming soon. Built from real production experience.
21
Tutorials
7
Topics
21h
Content
โ
PDF Docs
Python for Data Engineers
Core Python concepts every data engineer needs: generators, decorators, context managers, and writing clean, production-ready scripts.
Advanced Python: Concurrency & Async
Threading, multiprocessing, and asyncio patterns for building fast data pipelines. Covers concurrent API calls and parallel file processing.
Python Data Classes & Pydantic
Build type-safe data pipelines using dataclasses and Pydantic models for schema validation, serialization, and config management.
SQL Fundamentals for Data Engineers
Master SELECT, JOINs, GROUP BY, window functions, and CTEs. The SQL patterns you'll use every single day in production.
Advanced SQL: Window Functions Deep Dive
ROW_NUMBER, RANK, LAG, LEAD, NTILE and running totals. Real-world examples on financial and event data with performance tips.
Query Optimization & Execution Plans
How to read EXPLAIN plans, understand indexes, partition pruning, and rewrite slow queries for 10x performance gains.
PySpark Getting Started
SparkSession, DataFrames, transformations vs actions, lazy evaluation, and your first PySpark pipeline from scratch.
Spark Performance Tuning
Partitioning strategies, broadcast joins, shuffle optimization, caching, and Spark UI deep dive for production-grade pipelines.
Spark Structured Streaming
Build real-time pipelines with Spark Structured Streaming: watermarks, triggers, output modes, and exactly-once guarantees.
Databricks Platform Overview
Clusters, notebooks, jobs, Unity Catalog, and DBFS. Everything you need to be productive on the Databricks Lakehouse Platform.
Delta Lake: ACID Transactions & Time Travel
How Delta Lake works under the hood: transaction logs, MERGE operations, schema evolution, and time travel queries.
Medallion Architecture on Databricks
Design and implement Bronze, Silver, Gold layers using Delta Live Tables, Auto Loader, and Unity Catalog for enterprise lakehouses.
Data Pipeline System Design
How to design scalable ETL/ELT pipelines: batch vs streaming trade-offs, idempotency, backfill strategies, and SLA design.
Designing a Data Lakehouse from Scratch
End-to-end system design for a production lakehouse: ingestion, storage format, compute, governance, and serving layer.
Real-Time Streaming Architecture
Design a low-latency event streaming system using Kafka, Spark Streaming, and Delta Lake with exactly-once delivery guarantees.
DSA for Data Engineering Interviews
The data structure and algorithm patterns that actually come up in data engineering interviews: arrays, hashmaps, graphs, and sorting.
Graph Algorithms for Data Lineage
BFS, DFS, and topological sort, and how they power DAG scheduling in Airflow, data lineage tracking, and dependency resolution.
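As a taste of the idea, here is a minimal sketch of topological sorting (Kahn's algorithm) applied to task dependencies. It is an illustration only, not Airflow's actual scheduler code; the function name `topological_order` and the task names in `pipeline` are hypothetical.

```python
from collections import deque


def topological_order(dependencies):
    """Return tasks in an order where each task comes after its upstream tasks.

    `dependencies` maps a task name to the set of tasks it depends on.
    Raises ValueError if the graph has a cycle (i.e. it is not a DAG).
    """
    # Collect every task, including ones that only appear as a dependency.
    tasks = set(dependencies)
    for upstream in dependencies.values():
        tasks.update(upstream)

    # indegree = number of unmet upstream dependencies per task.
    indegree = {task: 0 for task in tasks}
    downstream = {task: [] for task in tasks}
    for task, upstream in dependencies.items():
        for dep in upstream:
            indegree[task] += 1
            downstream[dep].append(task)

    # Start from tasks with no dependencies; sorting keeps the result deterministic.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any task left unprocessed sits on a cycle.
    if len(order) != len(tasks):
        raise ValueError("cycle detected: not a DAG")
    return order


# Hypothetical pipeline: each task lists the tasks it depends on.
pipeline = {
    "load": {"transform"},
    "transform": {"extract"},
    "report": {"load"},
}
order = topological_order(pipeline)
print(order)
```

The same traversal can be read in reverse for lineage questions: following `downstream` edges from a source shows everything that depends on it.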
Data Modeling: Dimensional & Data Vault
Star schema, snowflake schema, and Data Vault 2.0: when to use each, trade-offs, and hands-on dbt implementation.
Apache Kafka for Data Engineers
Topics, partitions, consumer groups, offsets, and schema registry. Build a production-grade Kafka pipeline step by step.
Data Quality with Great Expectations
Write data contracts, define expectations, set up checkpoints, and integrate quality gates into your Airflow DAGs.
Apache Airflow: Production DAGs
DAG design patterns, dynamic task generation, XComs, sensors, SLA alerts, and CI/CD for Airflow in production.
AI & ML Tutorials: Coming Soon
LLMs, RAG pipelines, vector databases, feature stores, and AI for data engineers.
Want more?
See These Skills in Production
Explore real-world projects where I apply every one of these technologies at scale.