Set the state backend: env.setStateBackend(new EmbeddedRocksDBStateBackend(true)). The boolean argument true enables incremental checkpointing; the EmbeddedRocksDBStateBackend class requires Flink 1.13+
Configure checkpoint storage: env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/flink-checkpoints") for durable remote storage
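Set the checkpoint interval and minimum pause (both in milliseconds): env.enableCheckpointing(60000) triggers a checkpoint every 60 s, and env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30000) guarantees at least 30 s between the end of one checkpoint and the start of the next, so slow checkpoints do not run back to back
Enable unaligned checkpoints for high-backpressure scenarios: env.getCheckpointConfig().enableUnalignedCheckpoints(). This requires Flink 1.11+ and changes barrier semantics: barriers overtake buffered in-flight records, and those records are persisted as part of the checkpoint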
Monitor checkpoint size and duration in the Flink dashboard; incremental checkpoints should be significantly smaller than full checkpoints after the first successful baseline
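Keep state.backend.rocksdb.memory.managed=true (the default) so that RocksDB allocates its memory from the TaskManager's managed memory budget, which is off-heap native memory rather than JVM heap; this bounds RocksDB's footprint and helps avoid the container being killed for exceeding its memory limit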
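The programmatic steps above can be assembled into one job setup. The following is a minimal sketch, not a definitive configuration: it assumes a Flink 1.13+ 1.x release with the flink-statebackend-rocksdb dependency on the classpath and an S3 filesystem plugin installed for the s3:// scheme. The class name IncrementalCheckpointSetup, the job name, and the placeholder pipeline are hypothetical; the bucket path is the one used in the steps.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; gathers the configuration calls from the steps above.
public class IncrementalCheckpointSetup {

    static void configure(StreamExecutionEnvironment env) {
        // RocksDB state backend; 'true' turns on incremental checkpointing.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Trigger a checkpoint every 60 s (value in milliseconds).
        env.enableCheckpointing(60_000L);

        CheckpointConfig checkpointConfig = env.getCheckpointConfig();

        // Durable remote storage for checkpoint data.
        checkpointConfig.setCheckpointStorage("s3://my-bucket/flink-checkpoints");

        // At least 30 s between the end of one checkpoint and the start of the next.
        checkpointConfig.setMinPauseBetweenCheckpoints(30_000L);

        // Optional: only worthwhile when backpressure delays aligned barriers.
        checkpointConfig.enableUnalignedCheckpoints();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        configure(env);

        // Placeholder pipeline (assumption); replace with the real job graph.
        env.fromElements(1, 2, 3).print();

        env.execute("incremental-checkpoint-demo");
    }
}
```

Keeping the calls in a single configure method makes it easy to drop the unaligned-checkpoint line for jobs whose sinks have not yet been verified.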
Known gotchas
Incremental checkpoints reference SST files that earlier checkpoints uploaded, so a checkpoint is not self-contained; if any referenced file is missing or corrupted, recovery fails. Never delete checkpoint directories by hand, and retain more than one completed checkpoint via state.checkpoints.num-retained so there is a fallback
Unaligned checkpoints store in-flight records in the checkpoint, which increases checkpoint size and changes when barriers reach the sinks; verify that each sink is idempotent or transactional before enabling them in production
RocksDB incremental checkpoints upload only the SST files created since the last checkpoint; a compaction can cause a temporarily large checkpoint because the newly written SST files that replace the merged ones must be uploaded in full. This is normal but can alarm on-call engineers
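The compaction effect is easier to see with numbers. The sketch below is a toy model in plain Java, not Flink's actual upload logic: it assumes each checkpoint is described by the set of live SST files and their sizes in MB, and that a file is uploaded only if no earlier checkpoint already uploaded it. The file names and sizes are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of incremental checkpoint uploads (hypothetical, for illustration only).
public class IncrementalUploadModel {

    /** Sums the sizes of the snapshot's files that are not yet in remote storage. */
    static long uploadSizeMb(Set<String> alreadyUploaded, Map<String, Long> snapshot) {
        long total = 0;
        for (Map.Entry<String, Long> file : snapshot.entrySet()) {
            if (!alreadyUploaded.contains(file.getKey())) {
                total += file.getValue();
            }
        }
        return total;
    }

    /** Returns the upload size in MB of each checkpoint, in order. */
    static List<Long> simulate(List<Map<String, Long>> snapshots) {
        Set<String> uploaded = new HashSet<>();
        List<Long> sizes = new ArrayList<>();
        for (Map<String, Long> snapshot : snapshots) {
            sizes.add(uploadSizeMb(uploaded, snapshot));
            uploaded.addAll(snapshot.keySet());
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> snapshots = List.of(
            // Checkpoint 1: baseline, every file is new.
            Map.of("001.sst", 64L, "002.sst", 64L, "003.sst", 64L),
            // Checkpoint 2: one small new file since the baseline.
            Map.of("001.sst", 64L, "002.sst", 64L, "003.sst", 64L, "004.sst", 16L),
            // Checkpoint 3: compaction merged 001-003 into a new file, 005.
            Map.of("004.sst", 16L, "005.sst", 190L));

        System.out.println(simulate(snapshots)); // prints [192, 16, 190]
    }
}
```

In this model the third checkpoint uploads almost as much as the baseline even though little new data arrived, which is the pattern to expect on the dashboard after a compaction.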