Steps

Set num.standby.replicas in the StreamsConfig to 1 or 2; Kafka Streams then maintains a shadow copy of each state store on that many additional instances, kept current by continuously consuming the store's changelog topic
Ensure changelog topics use log compaction (the default for Kafka Streams key-value stores; windowed stores default to compact,delete); confirm with kafka-topics.sh --describe that cleanup.policy includes compact on the state store changelog topic, which is named <application.id>-<store name>-changelog
Tune RocksDB memory and flush behavior via a custom RocksDBConfigSetter: reduce write_buffer_size and max_write_buffer_number if memory is constrained, since memtable memory per store is bounded by roughly their product; increase the block cache size for read-heavy aggregations
Set rocksdb.config.setter=com.example.MyRocksDBConfig in the Streams properties and implement the RocksDBConfigSetter interface to apply per-store tuning
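Simulate a task migration: kill one Streams instance and confirm the standby on another instance transitions to active within seconds of the rebalance (check the task assignment log); after a hard kill, detecting the failure can itself take up to the consumer session.timeout.ms
Monitor restore progress via the kafka_streams_state_store_restore_remaining_records JMX metric (the exact name depends on the Kafka Streams version and the JMX exporter mapping); for standby lag, KafkaStreams#allLocalStorePartitionLags() reports the offset lag of each local store partition; alert if it grows beyond an acceptable threshold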
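
The configuration steps above can be sketched as a plain Properties object. This is a minimal sketch, not a definitive setup: it uses string keys so it compiles without the Kafka jars, and the application id, bootstrap address, state directory and replication factor are illustrative assumptions that do not come from the steps. Only num.standby.replicas and rocksdb.config.setter=com.example.MyRocksDBConfig are taken from the steps themselves.

```java
import java.util.Properties;

public class StandbyConfigSketch {

    // Builds Streams properties with plain string keys; Kafka Streams parses
    // string values for numeric configs when the StreamsConfig is created.
    public static Properties buildProps(int standbyReplicas) {
        Properties props = new Properties();
        props.put("application.id", "failover-demo");      // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        // Number of shadow copies kept per state store (step 1).
        props.put("num.standby.replicas", Integer.toString(standbyReplicas));
        // Class implementing RocksDBConfigSetter (name taken from the steps).
        props.put("rocksdb.config.setter", "com.example.MyRocksDBConfig");
        // Assumed values: local state directory and changelog replication factor.
        props.put("state.dir", "/var/lib/kafka-streams");
        props.put("replication.factor", "3");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting configuration for inspection.
        buildProps(1).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties can be passed to the KafkaStreams constructor together with the topology; the class named by rocksdb.config.setter still has to exist on the classpath and implement RocksDBConfigSetter.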

Known gotchas

Standby replicas consume changelog topic bandwidth in proportion to the state store write rate: each standby reads every changelog record once, so in write-heavy topologies num.standby.replicas=2 generates changelog read traffic of about twice the changelog write rate, double that of a single standby
RocksDB stores state in the local directory specified by state.dir; on Kubernetes, back this with a persistent volume, because ephemeral storage forces a full state restore from the changelog on every pod restart, negating the standby benefit
Changelog topics are created with the Streams application's replication.factor config; before Kafka Streams 3.0 this defaulted to 1 regardless of the broker default, while from 3.0 the default of -1 defers to the broker's default.replication.factor (brokers 2.4 or newer); set it explicitly to avoid under-replicated state
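
The bandwidth gotcha can be made concrete with a rough estimate. The sketch below assumes each standby reads every changelog record exactly once, and it ignores restore traffic and broker-side replication; the class name, method name and the 50 MB/s write rate are hypothetical, chosen only for illustration.

```java
public class StandbyReadEstimate {

    // Extra changelog read load caused by standbys: one full read of the
    // changelog write stream per standby replica (simplifying assumption).
    public static long standbyReadBytesPerSec(long changelogWriteBytesPerSec,
                                              int numStandbyReplicas) {
        if (changelogWriteBytesPerSec < 0 || numStandbyReplicas < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return changelogWriteBytesPerSec * numStandbyReplicas;
    }

    public static void main(String[] args) {
        long writeRate = 50_000_000L; // assumed 50 MB/s of changelog writes
        for (int standbys = 0; standbys <= 2; standbys++) {
            System.out.println("num.standby.replicas=" + standbys
                    + " -> extra reads B/s: "
                    + standbyReadBytesPerSec(writeRate, standbys));
        }
    }
}
```

Under these assumptions, going from one standby to two doubles the standby read traffic, which is worth checking against broker network headroom before raising the setting.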

Tune Kafka Streams standby replicas and RocksDB changelog compaction for fast task failover
