Set num.standby.replicas in the StreamsConfig to 1 or 2; Kafka Streams then maintains that many shadow copies of each state store on other instances and keeps them current by continuously consuming the store's changelog topic
Ensure changelog topics use log compaction (the Kafka Streams default); confirm with kafka-topics.sh --describe that cleanup.policy=compact is set on the state store changelog topic (changelogs for windowed stores show compact,delete)
Tune RocksDB via a custom RocksDBConfigSetter: if memory is constrained, reduce write_buffer_size and keep max_write_buffer_number low, since memtable memory per store is bounded by roughly their product; increase block_cache_size for read-heavy aggregations
Set rocksdb.config.setter=com.example.MyRocksDBConfig in the Streams properties and implement the RocksDBConfigSetter interface to apply per-store tuning
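Simulate a task migration: kill one Streams instance and confirm in the task assignment log that the standby on another instance is promoted to active once the rebalance completes, without a full changelog restore; after a hard kill, failure detection alone can take up to session.timeout.ms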
Monitor restore lag via the kafka_streams_state_store_restore_remaining_records JMX metric; alert if standby lag grows beyond an acceptable threshold
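The configuration steps above can be sketched in Java as follows. This is a minimal sketch rather than a definitive setup: it uses plain string keys so it compiles without Kafka on the classpath, and the application id "orders-app", the store name "order-totals", the broker address, and the class and method names are all hypothetical. The memtable figure is only the rough upper bound implied by write_buffer_size multiplied by max_write_buffer_number, not a full RocksDB memory model.

```java
import java.util.Properties;

class StandbyTuningSketch {

    // Streams properties for the standby and RocksDB steps, using plain string keys.
    static Properties buildProps(String applicationId, int standbyReplicas) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("num.standby.replicas", Integer.toString(standbyReplicas));
        props.put("rocksdb.config.setter", "com.example.MyRocksDBConfig");
        return props;
    }

    // Kafka Streams names store changelogs <application.id>-<store name>-changelog;
    // this is the topic to pass to kafka-topics.sh --describe.
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    // Rough upper bound on memtable memory for a single store, in bytes.
    static long memtableBytes(long writeBufferSize, int maxWriteBufferNumber) {
        return writeBufferSize * maxWriteBufferNumber;
    }

    public static void main(String[] args) {
        Properties props = buildProps("orders-app", 1);
        System.out.println(props.getProperty("num.standby.replicas"));
        System.out.println(changelogTopic("orders-app", "order-totals"));
        // 16 MiB write buffer with 3 buffers: 50331648 bytes (48 MiB) per store
        System.out.println(memtableBytes(16L * 1024 * 1024, 3));
    }
}
```

Multiplying the per-store bound by the number of stores on an instance, standbys included, gives a first estimate of how much memory the memtables alone can claim.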
Known gotchas
Standby replicas consume changelog topic bandwidth in proportion to the state store write rate, because every standby reads every changelog record; in write-heavy topologies, num.standby.replicas=2 doubles standby changelog read traffic across the cluster compared with a single standby
RocksDB stores state in the local directory specified by state.dir; on Kubernetes, back it with a persistent volume. With ephemeral storage, every pod restart triggers a full state restore from the changelog, which negates much of the standby benefit
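Changelog topics are created with the replication factor from the Streams application's replication.factor config. Before Kafka Streams 3.0 this defaulted to 1 regardless of the broker default; since 3.0 it defaults to -1, which defers to the broker's default.replication.factor. Set it explicitly to avoid under-replicated state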
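A rough way to size the first gotcha and pin down the other two is sketched below. It assumes steady state, where each standby reads every changelog record once; the mount path /var/lib/kafka-streams, the 20 MB/s write rate, and the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Properties;

class GotchaSketch {

    // Extra changelog read traffic caused by standbys: each standby re-reads
    // everything the active tasks write to the changelog.
    static long standbyReadBytesPerSec(long changelogWriteBytesPerSec, int standbyReplicas) {
        return changelogWriteBytesPerSec * standbyReplicas;
    }

    // Explicit state directory and changelog replication factor, as plain string keys.
    static Properties durabilityProps(String stateDir, int replicationFactor) {
        Properties props = new Properties();
        props.put("state.dir", stateDir); // should point at a persistent volume mount
        props.put("replication.factor", Integer.toString(replicationFactor));
        return props;
    }

    public static void main(String[] args) {
        // 20 MB/s of changelog writes with 2 standbys: 40000000 bytes/s of standby reads
        System.out.println(standbyReadBytesPerSec(20_000_000L, 2));
        Properties props = durabilityProps("/var/lib/kafka-streams", 3);
        System.out.println(props.getProperty("replication.factor"));
    }
}
```

Running the estimate for 1 and 2 standbys before raising num.standby.replicas shows whether the brokers have headroom for the added read load.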