Learning Hub

Tutorials

Hands-on technical tutorials on Python, SQL, PySpark, Databricks, System Design, DSA, and Data Engineering, with AI coming soon. Built from real production experience.

21 Tutorials | 7 Topics | 21h Content | PDF Docs: n/a

Showing 21 of 21 tutorials

Python

Python for Data Engineers

Core Python concepts every data engineer needs: generators, decorators, context managers, and writing clean, production-ready scripts.

Beginner 45 min
Tags: Python, Fundamentals

Python

Advanced Python: Concurrency & Async

Threading, multiprocessing, and asyncio patterns for building fast data pipelines. Covers concurrent API calls and parallel file processing.

Advanced 60 min
Tags: Python, Async, Performance

Python

Python Data Classes & Pydantic

Build type-safe data pipelines using dataclasses and Pydantic models for schema validation, serialization, and config management.

Intermediate 30 min
Tags: Python, Pydantic, Type Safety

SQL

SQL Fundamentals for Data Engineers

Master SELECT, JOINs, GROUP BY, window functions, and CTEs. The SQL patterns you'll use every single day in production.

Beginner 50 min
Tags: SQL, Fundamentals

SQL

Advanced SQL: Window Functions Deep Dive

ROW_NUMBER, RANK, LAG, LEAD, NTILE, and running totals. Real-world examples on financial and event data, with performance tips.

Intermediate 45 min
Tags: SQL, Window Functions, Analytics

SQL

Query Optimization & Execution Plans

How to read EXPLAIN plans, understand indexes, partition pruning, and rewrite slow queries for 10x performance gains.

Advanced 60 min
Tags: SQL, Performance, Optimization

PySpark

PySpark Getting Started

SparkSession, DataFrames, transformations vs actions, lazy evaluation, and your first PySpark pipeline from scratch.

Beginner 55 min
Tags: PySpark, Spark, Getting Started

PySpark

Spark Performance Tuning

Partitioning strategies, broadcast joins, shuffle optimization, caching, and Spark UI deep dive for production-grade pipelines.

Advanced 75 min
Tags: PySpark, Performance, Tuning

PySpark

Spark Structured Streaming

Build real-time pipelines with Spark Structured Streaming: watermarks, triggers, output modes, and exactly-once guarantees.

Advanced 70 min
Tags: PySpark, Streaming, Kafka

Databricks

Databricks Platform Overview

Clusters, notebooks, jobs, Unity Catalog, and DBFS. Everything you need to be productive on the Databricks Lakehouse Platform.

Beginner 40 min
Tags: Databricks, Delta Lake, Lakehouse

Databricks

Delta Lake: ACID Transactions & Time Travel

How Delta Lake works under the hood: transaction logs, MERGE operations, schema evolution, and time travel queries.

Intermediate 55 min
Tags: Databricks, Delta Lake, ACID

Databricks

Medallion Architecture on Databricks

Design and implement Bronze, Silver, Gold layers using Delta Live Tables, Auto Loader, and Unity Catalog for enterprise lakehouses.

Advanced 90 min
Tags: Databricks, Medallion, Architecture

System Design

Data Pipeline System Design

How to design scalable ETL/ELT pipelines: batch vs streaming trade-offs, idempotency, backfill strategies, and SLA design.

Intermediate 60 min
Tags: System Design, Architecture, Pipelines

System Design

Designing a Data Lakehouse from Scratch

End-to-end system design for a production lakehouse: ingestion, storage format, compute, governance, and serving layer.

Advanced 80 min
Tags: System Design, Lakehouse, Architecture

System Design
Coming soon

Real-Time Streaming Architecture

Design a low-latency event streaming system using Kafka, Spark Streaming, and Delta Lake with exactly-once delivery guarantees.

Advanced 75 min
Tags: System Design, Kafka, Streaming

DSA

DSA for Data Engineering Interviews

The data structures and algorithms patterns that actually come up in data engineering interviews: arrays, hashmaps, graphs, and sorting.

Intermediate 65 min
Tags: DSA, Interviews, Algorithms

DSA

Graph Algorithms for Data Lineage

BFS, DFS, and topological sort, and how they power DAG scheduling in Airflow, data lineage tracking, and dependency resolution.

Advanced 55 min
Tags: DSA, Graphs, Airflow

Data Engineering

Data Modeling: Dimensional & Data Vault

Star schema, snowflake schema, and Data Vault 2.0: when to use each, trade-offs, and hands-on dbt implementation.

Intermediate 60 min
Tags: Data Modeling, dbt, Warehouse

Data Engineering

Apache Kafka for Data Engineers

Topics, partitions, consumer groups, offsets, and schema registry. Build a production-grade Kafka pipeline step by step.

Intermediate 70 min
Tags: Kafka, Streaming, Real-time

Data Engineering

Data Quality with Great Expectations

Write data contracts, define expectations, set up checkpoints, and integrate quality gates into your Airflow DAGs.

Intermediate 50 min
Tags: Data Quality, Great Expectations, Airflow

Data Engineering

Apache Airflow: Production DAGs

DAG design patterns, dynamic task generation, XComs, sensors, SLA alerts, and CI/CD for Airflow in production.

Advanced 80 min
Tags: Airflow, Orchestration, CI/CD

AI & ML Tutorials: Coming Soon

LLMs, RAG pipelines, vector databases, feature stores, and AI for data engineers.

In Progress

Want more?

See These Skills in Production

Explore real-world projects where I apply every one of these technologies at scale.

View Projects