Steps

Enable checkpointing in the StreamExecutionEnvironment: set a checkpoint interval appropriate for your latency/durability tradeoff, set CheckpointingMode.EXACTLY_ONCE, and configure a state backend (RocksDB for state that may outgrow memory, the heap-based HashMapStateBackend for small state).
Point checkpoint storage to a durable remote store (HDFS, S3, GCS) by configuring the checkpoint directory; checkpoints kept in JobManager memory or on a node's local disk are lost when that process or node fails.
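Set a minimum pause between checkpoints and a checkpoint timeout to prevent checkpoint storms; if a checkpoint takes longer than the timeout, Flink aborts it, counts it as a failed checkpoint, and triggers the next one on the regular schedule. The job itself fails only once consecutive failures exceed the configured tolerable checkpoint failure number.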
Use a sink that extends TwoPhaseCommitSinkFunction (or implements the newer Sink API with a Committer) to carry exactly-once guarantees through to transactional targets such as Kafka, JDBC, or Iceberg.
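Keep max concurrent checkpoints at 1 during normal operation to reduce state backend contention; raise it only if checkpoints routinely take longer than the checkpoint interval and you still need new ones to start on schedule. A minimum pause between checkpoints already rules out overlap, so do not combine it with a concurrency above 1.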
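Enable unaligned checkpoints if barrier alignment takes a long time under backpressure, but verify that your sink's pre-commit phase can tolerate the resulting ordering semantics. Unaligned checkpoints persist in-flight buffers, so expect larger checkpoints, and they require exactly-once mode with a single concurrent checkpoint.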
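
The two-phase-commit step above is easier to reason about with a small model. The following is a minimal plain-Java sketch, not Flink code: the class `TwoPhaseCommitSketch` and its in-memory "transactions" are hypothetical, and the method names only mirror the lifecycle hooks a Flink sink reacts to (record arrival, checkpoint barrier, checkpoint-complete notification, recovery).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Plain-Java model of a two-phase-commit sink (hypothetical, no Flink dependency). */
class TwoPhaseCommitSketch {
    // Records visible to readers of the target system.
    private final List<String> committed = new ArrayList<>();
    // Pre-committed transactions, keyed by the checkpoint that closed them.
    private final TreeMap<Long, List<String>> preCommitted = new TreeMap<>();
    // Transaction receiving records since the last checkpoint barrier.
    private List<String> openTxn = new ArrayList<>();

    /** Per record: write into the currently open transaction. */
    void invoke(String record) {
        openTxn.add(record);
    }

    /** Checkpoint barrier: pre-commit the open transaction and start a new one. */
    void snapshotState(long checkpointId) {
        preCommitted.put(checkpointId, openTxn);
        openTxn = new ArrayList<>();
    }

    /** Checkpoint completed: commit every transaction up to and including it. */
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> ready = preCommitted.headMap(checkpointId, true);
        for (List<String> txn : ready.values()) {
            committed.addAll(txn);
        }
        ready.clear(); // the view is backed by preCommitted, so this removes them there
    }

    /** Recovery: finish commits covered by the restored checkpoint, abort the rest. */
    void restoreFrom(long restoredCheckpointId) {
        notifyCheckpointComplete(restoredCheckpointId);
        preCommitted.clear();
        openTxn = new ArrayList<>();
    }

    List<String> committed() {
        return Collections.unmodifiableList(committed);
    }

    public static void main(String[] args) {
        TwoPhaseCommitSketch sink = new TwoPhaseCommitSketch();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState(1);
        sink.notifyCheckpointComplete(1); // a, b become visible
        sink.invoke("c");
        sink.snapshotState(2);            // c is only pre-committed
        sink.invoke("d");
        sink.restoreFrom(1);              // failure: c and d are aborted
        sink.invoke("c");                 // the source replays c and d
        sink.invoke("d");
        sink.snapshotState(3);
        sink.notifyCheckpointComplete(3);
        System.out.println(sink.committed()); // prints [a, b, c, d]
    }
}
```

In this model nothing becomes visible until the checkpoint that covers it completes, which is why output latency for transactional sinks is tied to the checkpoint interval.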

Known gotchas

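With a two-phase-commit sink, a job failure (including one caused by too many failed checkpoints) rolls the pipeline back to the last completed checkpoint and replays every record processed since then. Transactions that were still open are aborted on recovery, but consumers that read uncommitted data, and targets without real transactions, still see the replayed records twice; in those cases the sink backend must handle idempotent re-delivery.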
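RocksDB incremental checkpoints reduce checkpoint size, but each checkpoint references shared files written by earlier checkpoints. Flink tracks and cleans up these files itself; manually deleting older checkpoint data or files in the shared directory can invalidate the restore path of later checkpoints.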
Savepoints are not automatic; you must trigger them via the CLI or the REST API before upgrades. Regular checkpoints alone do not provide a stable restore point for application-level changes.
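
As a rough illustration of the REST route mentioned above, the sketch below builds (but does not send) a savepoint trigger request with the JDK's `java.net.http` classes. The class name, base URL, job ID, and target directory are hypothetical placeholders; the `/jobs/:jobid/savepoints` path and the `target-directory` / `cancel-job` body fields follow Flink's REST API, but check them against the documentation for your Flink version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds a savepoint trigger request for Flink's REST API (illustrative helper). */
class SavepointRequestSketch {

    /** JSON body; assumes targetDir contains no characters that need JSON escaping. */
    static String triggerBody(String targetDir) {
        return "{\"target-directory\":\"" + targetDir + "\",\"cancel-job\":false}";
    }

    /**
     * @param restBase  base URL of the JobManager REST endpoint
     * @param jobId     ID of the running job
     * @param targetDir durable directory the savepoint should be written to
     */
    static HttpRequest buildTrigger(String restBase, String jobId, String targetDir) {
        return HttpRequest.newBuilder()
                .uri(URI.create(restBase + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(triggerBody(targetDir)))
                .build();
    }

    public static void main(String[] args) {
        // All three arguments are made-up example values.
        HttpRequest request = buildTrigger(
                "http://localhost:8081",
                "a1b2c3d4e5f60718293a4b5c6d7e8f90",
                "s3://example-bucket/savepoints");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return a JSON document containing a `request-id`. Because the operation is asynchronous, that ID is then polled at `/jobs/:jobid/savepoints/:triggerid` until the savepoint is reported as completed.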

nightlies.apache.org/flink · 6 steps · unrated

Configure Flink state backend with RocksDB and incremental checkpointing for large stateful jobs

dataeng-general · 5 steps · unrated

Enable Flink buffer debloating to reduce checkpoint alignment time under backpressure

data-engineering · 5 steps · unrated


Configure Flink checkpointing and exactly-once sinks for durable stateful streaming pipelines
