Kafka vs Kinesis: Architectural Trade-offs

Every data engineering team hits this decision eventually. Both Kafka and Kinesis move event streams reliably at scale — but they make fundamentally different trade-offs around operational complexity, throughput ceilings, consumer flexibility, and cost. After running Kafka clusters processing 2 million events/second and Kinesis pipelines ingesting 400GB/hour on separate products, here is what actually matters when choosing between them.

The Decision Frame

There is no universally correct answer. The right choice depends on three questions:

Who operates it? Kafka is infrastructure you own. Kinesis is infrastructure AWS owns.
What are your throughput and latency requirements? The gap between them is real and measurable.
How many consumers do you have, and how independently do they need to move?

Everything else — pricing, ordering guarantees, replay windows — flows from those three constraints.

Architecture Overview

Apache Kafka

Kafka is a distributed commit log. Producers write to topics partitioned across a broker cluster. Consumers read from partitions independently, tracking their own offsets. The cluster state is managed by ZooKeeper (pre-3.0) or KRaft (3.0+).

Producers → Topics (N partitions) → Broker Cluster (M brokers)
                                           ↓
                              Consumer Groups (independent offsets)
                              ├── Group A: real-time analytics
                              ├── Group B: data lake ingestion
                              └── Group C: ML feature pipeline

# Kafka producer — explicit partition control
from confluent_kafka import Producer
 
producer = Producer({
    'bootstrap.servers': 'kafka-broker-1:9092,kafka-broker-2:9092',
    'acks': 'all',                      # Wait for all ISR replicas
    'retries': 5,
    'linger.ms': 10,                    # Batch for 10ms before sending
    'batch.size': 65536,                # 64KB batch size
    'compression.type': 'lz4',
})
 
def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
 
producer.produce(
    topic='user-events',
    key=user_id.encode(),               # Key determines partition
    value=event_payload,
    callback=delivery_callback
)
producer.flush()

AWS Kinesis Data Streams

Kinesis is a managed shard-based stream. Producers write records to a stream divided into shards. Each shard handles 1MB/s write and 2MB/s read. AWS manages the infrastructure entirely — no brokers to provision, no ZooKeeper to babysit.

Producers → Stream (N shards) → Kinesis Service (AWS-managed)
                                        ↓
                           Consumers (shared 2MB/s read per shard)
                           ├── Lambda trigger
                           ├── KCL application
                           └── Kinesis Firehose → S3

# Kinesis producer — partition key hashes to shard
import boto3
import json
 
kinesis = boto3.client('kinesis', region_name='us-east-1')
 
response = kinesis.put_record(
    StreamName='user-events',
    Data=json.dumps(event_payload).encode(),
    PartitionKey=user_id,               # Hashed to determine shard
)
 
print(f"Shard: {response['ShardId']}, Seq: {response['SequenceNumber']}")
 
# Batch writes — up to 500 records, 5MB total
records = [
    {'Data': json.dumps(e).encode(), 'PartitionKey': e['user_id']}
    for e in event_batch
]
kinesis.put_records(StreamName='user-events', Records=records)

Throughput: The Real Numbers

Kafka Throughput

Kafka throughput scales horizontally with partition count and broker count. There is no hard ceiling imposed by the service — only your hardware and network.

# Kafka throughput estimation
def kafka_throughput(num_brokers, disk_throughput_mb_per_broker,
                     replication_factor, num_partitions):
    # Write throughput limited by replication
    write_throughput = (num_brokers * disk_throughput_mb_per_broker) / replication_factor
    # Read throughput not limited by replication
    read_throughput = num_brokers * disk_throughput_mb_per_broker
    parallelism = num_partitions
 
    print(f"Write throughput: ~{write_throughput:.0f} MB/s")
    print(f"Read throughput:  ~{read_throughput:.0f} MB/s")
    print(f"Max parallelism:  {parallelism} concurrent consumers")
 
# 10-broker cluster, 500MB/s disk each, RF=3, 120 partitions
kafka_throughput(10, 500, 3, 120)
# Write throughput: ~1666 MB/s
# Read throughput:  ~5000 MB/s
# Max parallelism:  120 concurrent consumers

Kinesis Throughput

Kinesis throughput is strictly shard-bounded. Each shard: 1MB/s or 1,000 records/s write, 2MB/s or 5 reads/s read. These are hard limits enforced by the service.

# Kinesis throughput estimation
def kinesis_throughput(num_shards, consumers_sharing_read):
    write_mb_per_sec = num_shards * 1        # 1MB/s per shard
    write_records_per_sec = num_shards * 1000
    read_mb_per_sec = (num_shards * 2) / consumers_sharing_read  # Shared 2MB/s
    read_calls_per_sec = (num_shards * 5) / consumers_sharing_read
 
    print(f"Write:      {write_mb_per_sec} MB/s, {write_records_per_sec:,} rec/s")
    print(f"Read/consumer: {read_mb_per_sec:.1f} MB/s, {read_calls_per_sec:.1f} calls/s")
    print(f"Note: Enhanced fan-out gives each consumer dedicated 2MB/s per shard")
 
# 100 shards, 3 consumers sharing standard reads
kinesis_throughput(100, 3)
# Write:         100 MB/s, 100,000 rec/s
# Read/consumer: 66.7 MB/s, 1.7 calls/s
# Note: Enhanced fan-out gives each consumer dedicated 2MB/s per shard

Enhanced fan-out solves the shared read limit — each registered consumer gets a dedicated 2MB/s per shard push connection. It costs more but eliminates read throttling with multiple consumers.

# Register enhanced fan-out consumer
kinesis.register_stream_consumer(
    StreamARN='arn:aws:kinesis:us-east-1:123456789:stream/user-events',
    ConsumerName='real-time-analytics'
)
# Each registered consumer gets 2MB/s per shard — independent of others

Impact: At 2 million events/second, Kafka handled it with a 20-broker cluster and 240 partitions. Kinesis would require 2,000 shards — a $52,000/month shard cost before data transfer.

Ordering Guarantees

Kafka: Partition-Level Ordering

Kafka guarantees strict ordering within a partition. Messages with the same key always go to the same partition, so per-key ordering is guaranteed across the entire topic.

# Kafka: all events for user_123 land on the same partition in order
producer.produce(
    topic='user-events',
    key='user_123',         # Same key → same partition → ordered
    value=event_a
)
producer.produce(
    topic='user-events',
    key='user_123',
    value=event_b           # Always after event_a for this key
)

# Global ordering across partitions: not guaranteed
# If you need global order, use a single partition — but lose parallelism
producer.produce(
    topic='ordered-events',
    partition=0,            # Force all writes to one partition
    value=event
)
# Throughput ceiling: ~100MB/s; parallelism: 1 consumer

Kinesis: Shard-Level Ordering

Kinesis guarantees ordering within a shard. The same partition key always maps to the same shard, giving per-key ordering — identical to Kafka's model.

# Kinesis: same partition key → same shard → ordered
kinesis.put_record(
    StreamName='user-events',
    Data=event_a,
    PartitionKey='user_123'     # Hashed to shard; ordered within shard
)
kinesis.put_record(
    StreamName='user-events',
    Data=event_b,
    PartitionKey='user_123'     # After event_a on the same shard
)

The ordering model is equivalent. Both systems give you per-key ordering within a partition/shard. Global ordering requires a single partition/shard in both systems, and destroys parallelism in both.

Consumer Model: The Biggest Practical Difference

This is where the systems diverge most meaningfully in day-to-day operations.

Kafka: Fully Independent Consumer Groups

Each consumer group maintains its own offset independently. Adding a new consumer group costs nothing — it reads the topic from any point without affecting other consumers.

# Three completely independent consumers of the same topic
# Each maintains its own offset, moves at its own pace
 
# Consumer Group A: real-time, at the tip of the log
consumer_a = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'realtime-analytics',
    'auto.offset.reset': 'latest'
})
 
# Consumer Group B: batch, running 6 hours behind
consumer_b = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'data-lake-ingestion',
    'auto.offset.reset': 'earliest'
})
 
# Consumer Group C: reprocessing historical data from 30 days ago
consumer_c = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'ml-backfill',
})
consumer_c.seek(partition, offset_from_30_days_ago)
 
# All three consume without awareness of each other

Kinesis: Shared Shard Read Throughput (Standard) or Enhanced Fan-Out

Standard Kinesis consumers share the 2MB/s read limit per shard. Adding consumers degrades everyone's read throughput until you enable enhanced fan-out.

# Standard consumer — shares 2MB/s with all others on this shard
import boto3
 
kinesis = boto3.client('kinesis')
 
shard_iterator = kinesis.get_shard_iterator(
    StreamName='user-events',
    ShardId='shardId-000000000000',
    ShardIteratorType='LATEST'
)['ShardIterator']
 
while True:
    response = kinesis.get_records(ShardIterator=shard_iterator, Limit=1000)
    records = response['Records']
    shard_iterator = response['NextShardIterator']
    # This consumer is consuming from the shared 2MB/s budget
    # A third consumer on this shard means each gets ~0.67MB/s

# Enhanced fan-out — dedicated 2MB/s per shard, push-based
import asyncio
from aiokinesisreader import KinesisReader   # Example async KCL wrapper
 
async def consume_with_fan_out():
    reader = KinesisReader(
        stream_name='user-events',
        consumer_arn='arn:aws:kinesis:...:consumer/realtime-analytics',
        # Each registered consumer gets its own 2MB/s — no sharing
    )
    async for record in reader:
        process(record)

Rule: If you have more than 2 consumers on a Kinesis stream, enable enhanced fan-out. Standard reads at scale are a latency and throttling trap.

Operational Overhead: The Honest Comparison

Kafka Self-Managed

# What you own with self-managed Kafka
# Broker provisioning and scaling
kafka-topics.sh --create --topic user-events \
  --partitions 120 --replication-factor 3 \
  --bootstrap-server kafka:9092
 
# Monitoring: broker JVM, disk usage, ISR shrinks, under-replicated partitions
# Alerting on: consumer lag, leader elections, network saturation
# Upgrades: rolling restarts across broker cluster
# Capacity planning: disk growth, partition reassignment as brokers are added
 
# Reassign partitions when adding brokers
kafka-reassign-partitions.sh \
  --bootstrap-server kafka:9092 \
  --reassignment-json-file reassignment.json \
  --execute

Running self-managed Kafka seriously requires: a dedicated ops rotation, expertise in JVM tuning, ZooKeeper or KRaft operational knowledge, and a monitoring stack covering 30+ broker metrics. Budget 0.5–1 FTE of ops time per Kafka cluster.

Kinesis Managed

# What you own with Kinesis
# Shard management — scaling up and down
kinesis.update_shard_count(
    StreamName='user-events',
    TargetShardCount=200,           # Scale up
    ScalingType='UNIFORM_SCALING'
)
# Note: shard splits/merges take ~30 seconds; there is no auto-scaling by default
 
# Monitor: PutRecords.ThrottledRecords, GetRecords.IteratorAgeMilliseconds,
#          ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded
 
import boto3
cloudwatch = boto3.client('cloudwatch')
 
cloudwatch.put_metric_alarm(
    AlarmName='kinesis-write-throttle',
    MetricName='WriteProvisionedThroughputExceeded',
    Namespace='AWS/Kinesis',
    Statistic='Sum',
    Period=60,
    EvaluationPeriods=3,
    Threshold=100,
    ComparisonOperator='GreaterThanThreshold',
    Dimensions=[{'Name': 'StreamName', 'Value': 'user-events'}]
)

Kinesis operational burden is significantly lower — no brokers, no ZooKeeper, no rolling restarts. The ops surface is shard count management, IAM permissions, and CloudWatch alarms. Budget 0.1 FTE ops time.

Impact: Migrating a 3-stream Kinesis setup to self-managed Kafka added one full-time platform engineer to our team. The throughput gains justified it at our scale. At lower scale, they would not have.

Retention and Replay

Kafka

# Kafka: configure retention per topic
# Time-based (default 7 days, configurable to years)
kafka-configs.sh --bootstrap-server kafka:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.ms=2592000000  # 30 days
 
# Size-based retention
kafka-configs.sh --bootstrap-server kafka:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.bytes=1099511627776  # 1TB per partition
 
# Infinite retention with Tiered Storage (Kafka 3.6+)
# Hot data on broker disks, cold data on S3 — transparent to consumers

Kinesis

# Kinesis: 24-hour default retention, up to 365 days
kinesis.increase_stream_retention_period(
    StreamName='user-events',
    RetentionPeriodHours=168    # 7 days — costs extra beyond 24 hours
)
 
# Extended retention pricing:
# 24 hours: included in shard-hour cost
# 7 days:   +$0.020 per shard-hour
# 365 days: +$0.023 per shard-hour (Long-term retention)
 
# Replay from a specific timestamp
import datetime
 
timestamp = datetime.datetime(2024, 9, 1, 0, 0, 0)
shard_iterator = kinesis.get_shard_iterator(
    StreamName='user-events',
    ShardId='shardId-000000000000',
    ShardIteratorType='AT_TIMESTAMP',
    Timestamp=timestamp
)['ShardIterator']

Kafka wins on retention flexibility. You can retain data indefinitely (hardware permitting) with no incremental cost beyond storage. Kinesis charges per shard-hour for every hour beyond 24, which becomes significant at scale.

Cost Comparison at Scale

Running 100GB/day of event data for 30 days with 3 consumers:

Kafka (self-managed on AWS, 6-broker cluster):
  EC2 (6x r5.4xlarge):        $4,320/month
  EBS storage (30TB):           $750/month
  Data transfer:                $200/month
  Total:                      ~$5,270/month
  Ops cost (0.5 FTE):        ~$8,000/month
  ─────────────────────────────────────────
  Total with ops:            ~$13,270/month

Kinesis (100 shards, enhanced fan-out, 7-day retention):
  Shard-hours (100 × 720h):   $1,800/month
  PUT payload units:            $360/month
  Enhanced fan-out (3):       $1,800/month
  Extended retention (7d):      $360/month
  Total:                      ~$4,320/month
  Ops cost (0.1 FTE):        ~$1,600/month
  ─────────────────────────────────────────
  Total with ops:             ~$5,920/month

At 100GB/day with 3 consumers, Kinesis is meaningfully cheaper. The crossover happens around 500GB/day to 1TB/day depending on consumer count and retention requirements — at that point Kafka's flat infrastructure cost becomes competitive with Kinesis's per-shard pricing.

Decision Framework

def choose_streaming_platform(
    throughput_mb_per_sec,
    num_independent_consumers,
    has_dedicated_platform_team,
    already_on_aws,
    needs_custom_consumer_logic,
    retention_days_required
):
    score = {"kafka": 0, "kinesis": 0}
 
    if throughput_mb_per_sec > 500:
        score["kafka"] += 3         # Kinesis shard cost becomes prohibitive
    else:
        score["kinesis"] += 2
 
    if num_independent_consumers > 5:
        score["kafka"] += 2         # Independent offsets, no fan-out cost
    elif num_independent_consumers <= 2:
        score["kinesis"] += 1
 
    if not has_dedicated_platform_team:
        score["kinesis"] += 3       # Operational overhead is the biggest risk
    else:
        score["kafka"] += 1
 
    if already_on_aws and not needs_custom_consumer_logic:
        score["kinesis"] += 2       # Native Lambda, Firehose, Glue integration
 
    if retention_days_required > 7:
        score["kafka"] += 2         # Kinesis long-term retention gets expensive
 
    winner = max(score, key=score.get)
    print(f"Kafka: {score['kafka']}  Kinesis: {score['kinesis']}")
    print(f"Recommendation: {winner.upper()}")
    return winner

Real-World Impact

After running both systems across two separate products over 18 months:

Kafka peak throughput: 2.1M events/sec on 20-broker cluster
Kinesis peak throughput: 95,000 events/sec on 100 shards
Kafka ops incidents/month: 3–4 (broker restarts, partition rebalances, lag spikes)
Kinesis ops incidents/month: 0.5 (shard throttling, IAM misconfigs)
Kafka cost at 1TB/day: ~$13K/month (including ops)
Kinesis cost at 100GB/day: ~$6K/month (including ops)
Time to first message (new stream): Kafka 2–3 days, Kinesis 15 minutes

Key Takeaways

Kinesis wins on operational simplicity — if you don't have a platform team, this gap is decisive
Kafka wins on throughput ceiling and cost at high volume — above ~500MB/s Kinesis shard costs escalate fast
Both systems give equivalent per-key ordering guarantees — this is not a differentiator
Kinesis enhanced fan-out is non-negotiable with more than 2 consumers — budget for it upfront
Kafka's consumer group model is more flexible — independent consumers moving at different speeds is natural; in Kinesis it requires more design
The real cost of Kafka is the ops FTE, not the infrastructure — factor this in honestly

Neither platform is wrong. Kinesis is the right default for AWS-native teams without streaming expertise. Kafka is the right choice when throughput, consumer flexibility, or retention requirements outgrow what Kinesis can deliver economically.

Next Steps:

Read about debugging Kafka consumer lag in production
Explore scaling Spark to 100TB: production patterns
Check out the real-time streaming pipeline project