Trigger a savepoint on a running job with the Flink CLI (flink savepoint <job-id> s3://bucket/savepoints/) or with the REST API (POST /jobs/{jobId}/savepoints)
Assign explicit uid() strings to all stateful operators in the job graph, and do so before the savepoint is taken; Flink uses these UIDs to match state in the savepoint to operators in the new job topology
Modify the job (schema changes, operator additions, topology refactoring) while keeping the same operator UIDs for operators whose state must be preserved
Restart the updated job from the savepoint: flink run --fromSavepoint s3://bucket/savepoints/savepoint-abc123 my-job.jar
For operators removed in the new topology, pass the --allowNonRestoredState flag (short form -n) to flink run so that their orphaned state is skipped instead of causing the restore to fail
Validate the restored job by checking operator metrics and output correctness before canceling the old job or deleting the savepoint
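The trigger and restore steps above can be scripted. The following is a minimal Python sketch, not a definitive client: it assumes the JobManager REST endpoint is reachable at http://localhost:8081, and the JSON field names (target-directory, cancel-job, request-id, status.id, operation.location, failure-cause) are as documented for recent Flink releases and should be checked against the version you run. The helper names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the JobManager REST endpoint is reachable at this address.
FLINK_REST = "http://localhost:8081"


def build_trigger_request(base_url, job_id, target_dir):
    """Build the POST /jobs/{jobId}/savepoints request without sending it."""
    body = json.dumps({"target-directory": target_dir, "cancel-job": False})
    return urllib.request.Request(
        url=f"{base_url}/jobs/{job_id}/savepoints",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def savepoint_location(status_payload):
    """Return the savepoint path once the async operation completes, else None."""
    if status_payload.get("status", {}).get("id") != "COMPLETED":
        return None
    operation = status_payload.get("operation", {})
    if "location" not in operation:
        # A completed operation without a location is treated as a failure.
        raise RuntimeError(f"savepoint failed: {operation.get('failure-cause')}")
    return operation["location"]


def restore_command(savepoint_path, jar, allow_non_restored=False):
    """Assemble the `flink run` argument list for restoring from a savepoint."""
    cmd = ["flink", "run", "--fromSavepoint", savepoint_path]
    if allow_non_restored:
        # Only needed when the new topology dropped stateful operators.
        cmd.append("--allowNonRestoredState")
    cmd.append(jar)
    return cmd


def trigger_and_wait(job_id, target_dir, poll_seconds=2.0):
    """Trigger a savepoint and poll until Flink reports its location."""
    request = build_trigger_request(FLINK_REST, job_id, target_dir)
    with urllib.request.urlopen(request) as resp:
        trigger_id = json.load(resp)["request-id"]
    status_url = f"{FLINK_REST}/jobs/{job_id}/savepoints/{trigger_id}"
    while True:
        with urllib.request.urlopen(status_url) as resp:
            location = savepoint_location(json.load(resp))
        if location:
            return location
        time.sleep(poll_seconds)
```

A caller would pass the path returned by trigger_and_wait to restore_command and run the resulting argument list, for example with subprocess.run. Keeping request construction and response parsing separate from the network calls makes both easy to check without a running cluster.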
Known gotchas
Changing the serialization format of a state type (e.g., upgrading a POJO) without registering a state migration or using a compatible serializer causes a deserialization failure on restore
Savepoints are not checkpoints: Flink does not manage or expire them automatically, so you must delete old savepoints yourself to reclaim storage
Operators without explicit uid() assignments get auto-generated UIDs based on their position in the topology; any structural change (reordering, insertion) shifts those UIDs and breaks savepoint restore
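The UID gotcha can be illustrated with a toy model in Python. This is not Flink's actual ID-generation algorithm, which hashes more properties of the job graph than position alone; the functions and operator names are hypothetical and only show why a position-dependent ID shifts when an operator is inserted, while an explicit string does not.

```python
import hashlib


def auto_ids(operators):
    """Toy stand-in for auto-generated IDs: hash each operator's position and name."""
    return {
        name: hashlib.sha1(f"{index}:{name}".encode()).hexdigest()[:8]
        for index, name in enumerate(operators)
    }


def explicit_ids(operators):
    """With explicit uid() strings, the ID depends only on the string itself."""
    return {name: name for name in operators}


def restorable(old_ids, new_ids):
    """Operators whose state can be matched: the same name maps to the same ID."""
    return sorted(n for n in old_ids if new_ids.get(n) == old_ids[n])


old = ["source", "dedupe", "sink"]
new = ["source", "filter", "dedupe", "sink"]  # one operator inserted

print(restorable(auto_ids(old), auto_ids(new)))          # ['source']
print(restorable(explicit_ids(old), explicit_ids(new)))  # ['dedupe', 'sink', 'source']
```

In the toy model, only the operator ahead of the insertion point keeps its position-based ID, so the state of everything after it would be orphaned; with explicit IDs all three original operators still match.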