Kafka Best Practices
Below are some notes on Kafka best practices that I found useful while working with Kafka in production.
1. Choosing Topics/Partitions
- The topic partition is the unit of parallelism in Kafka.
- More partitions = more consumer parallelism -> more throughput.
Considerations
- More partitions can increase end-to-end latency (the time from when the producer writes a message to when the consumer reads it).
- Messages are exposed to consumers only after being committed (replicated to all in-sync replicas).
- Replicating 1000 partitions can take around 20 ms, which may be too high for real-time applications.
Kafka Producer buffering mechanism:
- Messages are buffered per partition and sent to the broker after enough messages have accumulated or enough time has passed.
- More partitions = messages accumulate across more partitions on the producer side.
- Each partition has its own buffer, and messages are stored in batches within these buffers.
- More partitions = the producer needs to maintain more buffers.
Kafka Consumer:
- Fetches a batch of messages per partition. The more partitions a consumer subscribes to, the more memory it needs.
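Since the partition count caps consumer parallelism and can only be increased (never decreased) after creation, it is worth creating topics explicitly rather than relying on auto-creation. Here is a minimal sketch using the Java AdminClient; the bootstrap address, topic name, and partition/replication counts are illustrative assumptions, not recommendations.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; replace with your cluster's bootstrap servers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions caps consumer parallelism at 12 consumers per group;
            // pick the count based on target throughput, not "as many as possible".
            // Replication factor 3 assumes a cluster with at least 3 brokers.
            NewTopic topic = new NewTopic("orders", 12, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```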
2. Factors Affecting Performance
- RAM: main memory, specifically the file-system buffer cache.
- Multiple dedicated disks.
- Partitions per topic: more partitions allow increased parallelism.
- Ethernet bandwidth.
3. Kafka Broker configs
- log.retention.hours: Controls how long old messages in a topic are kept before deletion.
- message.max.bytes: Maximum message size the broker will accept. Ensure replica.fetch.max.bytes is set to be >= this value.
- delete.topic.enable: Allows users to delete a topic from Kafka. Disabled by default; functional from Kafka 0.9 onwards.
- unclean.leader.election.enable: true by default, prioritizing availability over durability.
- If set to true, a replica that is not in the ISR can become the leader when the original leader broker goes down, risking data loss.
- Set to false for higher durability, to prevent unclean leader elections.
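It can be useful to check what a running broker actually has for these settings. A minimal sketch using the Java AdminClient's describeConfigs; the bootstrap address and broker id "0" are assumptions for illustration.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class DescribeBrokerConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // Broker configs are described per broker id; "0" is just an example.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);

            for (String name : new String[] {
                    "log.retention.hours", "message.max.bytes",
                    "delete.topic.enable", "unclean.leader.election.enable"}) {
                System.out.println(name + " = " + config.get(name).value());
            }
        }
    }
}
```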
4. Kafka producer
Critical configs
- batch.size (size-based batching)
- linger.ms (time-based batching)
- compression.type
- max.in.flight.requests.per.connection (affects ordering)
- acks (affects durability)
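A rough sketch of how these configs are set on the Java producer; the values and the "orders" topic are illustrative starting points, not tuned recommendations.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Size-based and time-based batching: a batch is sent when either limit is hit.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // 64 KB, illustrative
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compression runs in the user thread
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5); // pipelining
        props.put(ProducerConfig.ACKS_CONFIG, "all");             // durability level

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key", "value"));
        }
    }
}
```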
Performance notes
Partition Targeting:
- A producer thread targeting a single partition performs faster than one sending messages to multiple partitions.
Flush Optimization:
- The new Producer API’s flush() call can improve performance if used correctly.
- Optimal performance is achieved with around 4 MB of data between flush() calls (based on a 1 KB event size).
- Rule of thumb for batch size:
batch.size = (total bytes sent between flush() calls) / (partition count)
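A worked instance of that rule of thumb, plus a minimal periodic-flush loop. The 4 MB target, the 8-partition assumption, and the "orders" topic name are all illustrative, not measurements from this post.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlushBatchingExample {
    // Worked example of the rule of thumb:
    // ~4 MB between flush() calls spread over 8 partitions
    // => batch.size = 4 * 1024 * 1024 / 8 = 524288 bytes (512 KB) per partition.
    static final int BYTES_BETWEEN_FLUSHES = 4 * 1024 * 1024; // assumed target
    static final int EVENT_SIZE = 1024;                       // ~1 KB events
    static final int EVENTS_PER_FLUSH = BYTES_BETWEEN_FLUSHES / EVENT_SIZE; // 4096 events

    // Sends `totalEvents` records to a hypothetical "orders" topic,
    // flushing after roughly 4 MB has been handed to the producer.
    static void produce(KafkaProducer<String, byte[]> producer, int totalEvents) {
        byte[] payload = new byte[EVENT_SIZE];
        for (int i = 1; i <= totalEvents; i++) {
            producer.send(new ProducerRecord<>("orders", payload));
            if (i % EVENTS_PER_FLUSH == 0) {
                producer.flush(); // blocks until all buffered records complete
            }
        }
        producer.flush();
    }
}
```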
Scaling Producers:
- If producer throughput is maximized but there is spare CPU and network capacity, add more producer processes.
Event Size Sensitivity:
- Larger event sizes (e.g., 1 KB) generally yield better throughput than smaller events (e.g., 100 bytes).
linger.ms Impact:
- No definitive rule; its impact varies with the use case. For small events (100 bytes or less), it generally has minimal effect.
Lifecycle of a request from Producer to Broker
Batch Polling: Polls a batch from the batch queue, one batch per partition.
Batch Grouping: Groups batches by their leader broker, then sends the grouped batches to the brokers.
Pipelining: If max.in.flight.requests.per.connection > 1, requests are pipelined.
Batch Readiness: A batch is ready to send when:
- batch.size is reached.
- linger.ms is reached.
- Another batch to the same broker is ready.
- flush() or close() is called.
Big Batching:
- Results in a better compression ratio and higher throughput.
- Leads to higher latency.
compression.type
- Compression happens in the user thread, so adding more user threads helps throughput if compression is slow.
acks
- Defines the durability level for the producer.
max.in.flight.requests.per.connection
- max.in.flight.requests.per.connection > 1 enables pipelining.
- Gives better throughput.
- May cause out-of-order delivery when a retry occurs (see the sketch below for ways to preserve ordering).
- Excessive pipelining drops throughput.
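Two common ways to keep ordering when retries happen, sketched as producer config helpers. These are standard client options rather than anything prescribed above; the idempotent-producer behavior described in the comments applies to recent client versions.

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class OrderingVsThroughput {
    // Option A: no pipelining. Retries cannot reorder messages, at the cost of throughput.
    static Properties strictOrdering(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        return props;
    }

    // Option B: idempotent producer. Keeps pipelining (up to 5 in-flight requests)
    // while preserving per-partition ordering across retries, in recent client versions.
    static Properties idempotentOrdering(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required for idempotence
        return props;
    }
}
```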
5. Kafka consumer
- Rule of thumb:
- Number of consumer threads = partition count.
- Microbenchmarking showed that consumer performance was not as sensitive to event size or batch size as producer performance; both 1 KB and 100-byte events showed similar throughput.
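A minimal sketch of the one-consumer-thread-per-partition rule using the Java consumer API in a consumer group; the thread count, topic, group id, and bootstrap address are assumptions for illustration.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupExample {
    static final int PARTITION_COUNT = 12; // match the topic's partition count

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(PARTITION_COUNT);
        for (int i = 0; i < PARTITION_COUNT; i++) {
            pool.submit(ConsumerGroupExample::runConsumer); // one consumer per thread
        }
    }

    static void runConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers");          // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.partition() + ":" + record.offset()); // application-specific processing goes here
                }
            }
        }
    }
}
```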
