Back to Blog
kafkakinesisawsstreamingarchitecturedata-engineering

Kafka vs Kinesis: Architectural Trade-offs

2024-09-1110 min read

Kafka vs Kinesis: Architectural Trade-offs

Every data engineering team hits this decision eventually. Both Kafka and Kinesis move event streams reliably at scale — but they make fundamentally different trade-offs around operational complexity, throughput ceilings, consumer flexibility, and cost. After running Kafka clusters processing 2 million events/second and Kinesis pipelines ingesting 400GB/hour on separate products, here is what actually matters when choosing between them.

The Decision Frame

There is no universally correct answer. The right choice depends on three questions:

  • Who operates it? Kafka is infrastructure you own. Kinesis is infrastructure AWS owns.
  • What are your throughput and latency requirements? The gap between them is real and measurable.
  • How many consumers do you have, and how independently do they need to move?

Everything else — pricing, ordering guarantees, replay windows — flows from those three constraints.


Architecture Overview

Apache Kafka

Kafka is a distributed commit log. Producers write to topics partitioned across a broker cluster. Consumers read from partitions independently, tracking their own offsets. The cluster state is managed by ZooKeeper (pre-3.0) or KRaft (3.0+).

Producers → Topics (N partitions) → Broker Cluster (M brokers)
                                           ↓
                              Consumer Groups (independent offsets)
                              ├── Group A: real-time analytics
                              ├── Group B: data lake ingestion
                              └── Group C: ML feature pipeline
# Kafka producer — explicit partition control
from confluent_kafka import Producer
 
producer = Producer({
    'bootstrap.servers': 'kafka-broker-1:9092,kafka-broker-2:9092',
    'acks': 'all',                      # Wait for all ISR replicas
    'retries': 5,
    'linger.ms': 10,                    # Batch for 10ms before sending
    'batch.size': 65536,                # 64KB batch size
    'compression.type': 'lz4',
})
 
def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
 
producer.produce(
    topic='user-events',
    key=user_id.encode(),               # Key determines partition
    value=event_payload,
    callback=delivery_callback
)
producer.flush()

AWS Kinesis Data Streams

Kinesis is a managed shard-based stream. Producers write records to a stream divided into shards. Each shard handles 1MB/s write and 2MB/s read. AWS manages the infrastructure entirely — no brokers to provision, no ZooKeeper to babysit.

Producers → Stream (N shards) → Kinesis Service (AWS-managed)
                                        ↓
                           Consumers (shared 2MB/s read per shard)
                           ├── Lambda trigger
                           ├── KCL application
                           └── Kinesis Firehose → S3
# Kinesis producer — partition key hashes to shard
import boto3
import json
 
kinesis = boto3.client('kinesis', region_name='us-east-1')
 
response = kinesis.put_record(
    StreamName='user-events',
    Data=json.dumps(event_payload).encode(),
    PartitionKey=user_id,               # Hashed to determine shard
)
 
print(f"Shard: {response['ShardId']}, Seq: {response['SequenceNumber']}")
 
# Batch writes — up to 500 records, 5MB total
records = [
    {'Data': json.dumps(e).encode(), 'PartitionKey': e['user_id']}
    for e in event_batch
]
kinesis.put_records(StreamName='user-events', Records=records)

Throughput: The Real Numbers

Kafka Throughput

Kafka throughput scales horizontally with partition count and broker count. There is no hard ceiling imposed by the service — only your hardware and network.

# Kafka throughput estimation
def kafka_throughput(num_brokers, disk_throughput_mb_per_broker,
                     replication_factor, num_partitions):
    # Write throughput limited by replication
    write_throughput = (num_brokers * disk_throughput_mb_per_broker) / replication_factor
    # Read throughput not limited by replication
    read_throughput = num_brokers * disk_throughput_mb_per_broker
    parallelism = num_partitions
 
    print(f"Write throughput: ~{write_throughput:.0f} MB/s")
    print(f"Read throughput:  ~{read_throughput:.0f} MB/s")
    print(f"Max parallelism:  {parallelism} concurrent consumers")
 
# 10-broker cluster, 500MB/s disk each, RF=3, 120 partitions
kafka_throughput(10, 500, 3, 120)
# Write throughput: ~1666 MB/s
# Read throughput:  ~5000 MB/s
# Max parallelism:  120 concurrent consumers

Kinesis Throughput

Kinesis throughput is strictly shard-bounded. Each shard: 1MB/s or 1,000 records/s write, 2MB/s or 5 reads/s read. These are hard limits enforced by the service.

# Kinesis throughput estimation
def kinesis_throughput(num_shards, consumers_sharing_read):
    write_mb_per_sec = num_shards * 1        # 1MB/s per shard
    write_records_per_sec = num_shards * 1000
    read_mb_per_sec = (num_shards * 2) / consumers_sharing_read  # Shared 2MB/s
    read_calls_per_sec = (num_shards * 5) / consumers_sharing_read
 
    print(f"Write:      {write_mb_per_sec} MB/s, {write_records_per_sec:,} rec/s")
    print(f"Read/consumer: {read_mb_per_sec:.1f} MB/s, {read_calls_per_sec:.1f} calls/s")
    print(f"Note: Enhanced fan-out gives each consumer dedicated 2MB/s per shard")
 
# 100 shards, 3 consumers sharing standard reads
kinesis_throughput(100, 3)
# Write:         100 MB/s, 100,000 rec/s
# Read/consumer: 66.7 MB/s, 1.7 calls/s
# Note: Enhanced fan-out gives each consumer dedicated 2MB/s per shard

Enhanced fan-out solves the shared read limit — each registered consumer gets a dedicated 2MB/s per shard push connection. It costs more but eliminates read throttling with multiple consumers.

# Register enhanced fan-out consumer
kinesis.register_stream_consumer(
    StreamARN='arn:aws:kinesis:us-east-1:123456789:stream/user-events',
    ConsumerName='real-time-analytics'
)
# Each registered consumer gets 2MB/s per shard — independent of others

Impact: At 2 million events/second, Kafka handled it with a 20-broker cluster and 240 partitions. Kinesis would require 2,000 shards — a $52,000/month shard cost before data transfer.


Ordering Guarantees

Kafka: Partition-Level Ordering

Kafka guarantees strict ordering within a partition. Messages with the same key always go to the same partition, so per-key ordering is guaranteed across the entire topic.

# Kafka: all events for user_123 land on the same partition in order
producer.produce(
    topic='user-events',
    key='user_123',         # Same key → same partition → ordered
    value=event_a
)
producer.produce(
    topic='user-events',
    key='user_123',
    value=event_b           # Always after event_a for this key
)
# Global ordering across partitions: not guaranteed
# If you need global order, use a single partition — but lose parallelism
producer.produce(
    topic='ordered-events',
    partition=0,            # Force all writes to one partition
    value=event
)
# Throughput ceiling: ~100MB/s; parallelism: 1 consumer

Kinesis: Shard-Level Ordering

Kinesis guarantees ordering within a shard. The same partition key always maps to the same shard, giving per-key ordering — identical to Kafka's model.

# Kinesis: same partition key → same shard → ordered
kinesis.put_record(
    StreamName='user-events',
    Data=event_a,
    PartitionKey='user_123'     # Hashed to shard; ordered within shard
)
kinesis.put_record(
    StreamName='user-events',
    Data=event_b,
    PartitionKey='user_123'     # After event_a on the same shard
)

The ordering model is equivalent. Both systems give you per-key ordering within a partition/shard. Global ordering requires a single partition/shard in both systems, and destroys parallelism in both.


Consumer Model: The Biggest Practical Difference

This is where the systems diverge most meaningfully in day-to-day operations.

Kafka: Fully Independent Consumer Groups

Each consumer group maintains its own offset independently. Adding a new consumer group costs nothing — it reads the topic from any point without affecting other consumers.

# Three completely independent consumers of the same topic
# Each maintains its own offset, moves at its own pace
 
# Consumer Group A: real-time, at the tip of the log
consumer_a = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'realtime-analytics',
    'auto.offset.reset': 'latest'
})
 
# Consumer Group B: batch, running 6 hours behind
consumer_b = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'data-lake-ingestion',
    'auto.offset.reset': 'earliest'
})
 
# Consumer Group C: reprocessing historical data from 30 days ago
consumer_c = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'ml-backfill',
})
consumer_c.seek(partition, offset_from_30_days_ago)
 
# All three consume without awareness of each other

Kinesis: Shared Shard Read Throughput (Standard) or Enhanced Fan-Out

Standard Kinesis consumers share the 2MB/s read limit per shard. Adding consumers degrades everyone's read throughput until you enable enhanced fan-out.

# Standard consumer — shares 2MB/s with all others on this shard
import boto3
 
kinesis = boto3.client('kinesis')
 
shard_iterator = kinesis.get_shard_iterator(
    StreamName='user-events',
    ShardId='shardId-000000000000',
    ShardIteratorType='LATEST'
)['ShardIterator']
 
while True:
    response = kinesis.get_records(ShardIterator=shard_iterator, Limit=1000)
    records = response['Records']
    shard_iterator = response['NextShardIterator']
    # This consumer is consuming from the shared 2MB/s budget
    # A third consumer on this shard means each gets ~0.67MB/s
# Enhanced fan-out — dedicated 2MB/s per shard, push-based
import asyncio
from aiokinesisreader import KinesisReader   # Example async KCL wrapper
 
async def consume_with_fan_out():
    reader = KinesisReader(
        stream_name='user-events',
        consumer_arn='arn:aws:kinesis:...:consumer/realtime-analytics',
        # Each registered consumer gets its own 2MB/s — no sharing
    )
    async for record in reader:
        process(record)

Rule: If you have more than 2 consumers on a Kinesis stream, enable enhanced fan-out. Standard reads at scale are a latency and throttling trap.


Operational Overhead: The Honest Comparison

Kafka Self-Managed

# What you own with self-managed Kafka
# Broker provisioning and scaling
kafka-topics.sh --create --topic user-events \
  --partitions 120 --replication-factor 3 \
  --bootstrap-server kafka:9092
 
# Monitoring: broker JVM, disk usage, ISR shrinks, under-replicated partitions
# Alerting on: consumer lag, leader elections, network saturation
# Upgrades: rolling restarts across broker cluster
# Capacity planning: disk growth, partition reassignment as brokers are added
 
# Reassign partitions when adding brokers
kafka-reassign-partitions.sh \
  --bootstrap-server kafka:9092 \
  --reassignment-json-file reassignment.json \
  --execute

Running self-managed Kafka seriously requires: a dedicated ops rotation, expertise in JVM tuning, ZooKeeper or KRaft operational knowledge, and a monitoring stack covering 30+ broker metrics. Budget 0.5–1 FTE of ops time per Kafka cluster.

Kinesis Managed

# What you own with Kinesis
# Shard management — scaling up and down
kinesis.update_shard_count(
    StreamName='user-events',
    TargetShardCount=200,           # Scale up
    ScalingType='UNIFORM_SCALING'
)
# Note: shard splits/merges take ~30 seconds; there is no auto-scaling by default
 
# Monitor: PutRecords.ThrottledRecords, GetRecords.IteratorAgeMilliseconds,
#          ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded
 
import boto3
cloudwatch = boto3.client('cloudwatch')
 
cloudwatch.put_metric_alarm(
    AlarmName='kinesis-write-throttle',
    MetricName='WriteProvisionedThroughputExceeded',
    Namespace='AWS/Kinesis',
    Statistic='Sum',
    Period=60,
    EvaluationPeriods=3,
    Threshold=100,
    ComparisonOperator='GreaterThanThreshold',
    Dimensions=[{'Name': 'StreamName', 'Value': 'user-events'}]
)

Kinesis operational burden is significantly lower — no brokers, no ZooKeeper, no rolling restarts. The ops surface is shard count management, IAM permissions, and CloudWatch alarms. Budget 0.1 FTE ops time.

Impact: Migrating a 3-stream Kinesis setup to self-managed Kafka added one full-time platform engineer to our team. The throughput gains justified it at our scale. At lower scale, they would not have.


Retention and Replay

Kafka

# Kafka: configure retention per topic
# Time-based (default 7 days, configurable to years)
kafka-configs.sh --bootstrap-server kafka:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.ms=2592000000  # 30 days
 
# Size-based retention
kafka-configs.sh --bootstrap-server kafka:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.bytes=1099511627776  # 1TB per partition
 
# Infinite retention with Tiered Storage (Kafka 3.6+)
# Hot data on broker disks, cold data on S3 — transparent to consumers

Kinesis

# Kinesis: 24-hour default retention, up to 365 days
kinesis.increase_stream_retention_period(
    StreamName='user-events',
    RetentionPeriodHours=168    # 7 days — costs extra beyond 24 hours
)
 
# Extended retention pricing:
# 24 hours: included in shard-hour cost
# 7 days:   +$0.020 per shard-hour
# 365 days: +$0.023 per shard-hour (Long-term retention)
 
# Replay from a specific timestamp
import datetime
 
timestamp = datetime.datetime(2024, 9, 1, 0, 0, 0)
shard_iterator = kinesis.get_shard_iterator(
    StreamName='user-events',
    ShardId='shardId-000000000000',
    ShardIteratorType='AT_TIMESTAMP',
    Timestamp=timestamp
)['ShardIterator']

Kafka wins on retention flexibility. You can retain data indefinitely (hardware permitting) with no incremental cost beyond storage. Kinesis charges per shard-hour for every hour beyond 24, which becomes significant at scale.


Cost Comparison at Scale

Running 100GB/day of event data for 30 days with 3 consumers:

Kafka (self-managed on AWS, 6-broker cluster):
  EC2 (6x r5.4xlarge):        $4,320/month
  EBS storage (30TB):           $750/month
  Data transfer:                $200/month
  Total:                      ~$5,270/month
  Ops cost (0.5 FTE):        ~$8,000/month
  ─────────────────────────────────────────
  Total with ops:            ~$13,270/month

Kinesis (100 shards, enhanced fan-out, 7-day retention):
  Shard-hours (100 × 720h):   $1,800/month
  PUT payload units:            $360/month
  Enhanced fan-out (3):       $1,800/month
  Extended retention (7d):      $360/month
  Total:                      ~$4,320/month
  Ops cost (0.1 FTE):        ~$1,600/month
  ─────────────────────────────────────────
  Total with ops:             ~$5,920/month

At 100GB/day with 3 consumers, Kinesis is meaningfully cheaper. The crossover happens around 500GB/day to 1TB/day depending on consumer count and retention requirements — at that point Kafka's flat infrastructure cost becomes competitive with Kinesis's per-shard pricing.


Decision Framework

def choose_streaming_platform(
    throughput_mb_per_sec,
    num_independent_consumers,
    has_dedicated_platform_team,
    already_on_aws,
    needs_custom_consumer_logic,
    retention_days_required
):
    score = {"kafka": 0, "kinesis": 0}
 
    if throughput_mb_per_sec > 500:
        score["kafka"] += 3         # Kinesis shard cost becomes prohibitive
    else:
        score["kinesis"] += 2
 
    if num_independent_consumers > 5:
        score["kafka"] += 2         # Independent offsets, no fan-out cost
    elif num_independent_consumers <= 2:
        score["kinesis"] += 1
 
    if not has_dedicated_platform_team:
        score["kinesis"] += 3       # Operational overhead is the biggest risk
    else:
        score["kafka"] += 1
 
    if already_on_aws and not needs_custom_consumer_logic:
        score["kinesis"] += 2       # Native Lambda, Firehose, Glue integration
 
    if retention_days_required > 7:
        score["kafka"] += 2         # Kinesis long-term retention gets expensive
 
    winner = max(score, key=score.get)
    print(f"Kafka: {score['kafka']}  Kinesis: {score['kinesis']}")
    print(f"Recommendation: {winner.upper()}")
    return winner

Real-World Impact

After running both systems across two separate products over 18 months:

  • Kafka peak throughput: 2.1M events/sec on 20-broker cluster
  • Kinesis peak throughput: 95,000 events/sec on 100 shards
  • Kafka ops incidents/month: 3–4 (broker restarts, partition rebalances, lag spikes)
  • Kinesis ops incidents/month: 0.5 (shard throttling, IAM misconfigs)
  • Kafka cost at 1TB/day: ~$13K/month (including ops)
  • Kinesis cost at 100GB/day: ~$6K/month (including ops)
  • Time to first message (new stream): Kafka 2–3 days, Kinesis 15 minutes

Key Takeaways

  1. Kinesis wins on operational simplicity — if you don't have a platform team, this gap is decisive
  2. Kafka wins on throughput ceiling and cost at high volume — above ~500MB/s Kinesis shard costs escalate fast
  3. Both systems give equivalent per-key ordering guarantees — this is not a differentiator
  4. Kinesis enhanced fan-out is non-negotiable with more than 2 consumers — budget for it upfront
  5. Kafka's consumer group model is more flexible — independent consumers moving at different speeds is natural; in Kinesis it requires more design
  6. The real cost of Kafka is the ops FTE, not the infrastructure — factor this in honestly

Neither platform is wrong. Kinesis is the right default for AWS-native teams without streaming expertise. Kafka is the right choice when throughput, consumer flexibility, or retention requirements outgrow what Kinesis can deliver economically.


Next Steps: