Set checkpointLocation in the writeStream options to a reliable, durable path (HDFS, S3, ADLS) before starting the stream
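Use a sink that supports idempotent writes or transactional commits (e.g., Delta Lake) to achieve end-to-end exactly-once semantics; note that Spark's built-in Kafka sink provides only at-least-once delivery, so downstream consumers of a Kafka topic must deduplicate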
Stop and restart the streaming query without changing the checkpointLocation; verify in the logs that the query resumes from the last committed offset
Simulate a failure by killing the query mid-batch and restarting; confirm that no duplicate records appear in the output and no records are skipped
Before deploying a change to the query's transformations (e.g., adding a column), validate that the new query can still restart from the existing checkpoint; incompatible changes require a fresh checkpoint and potentially a replay of source data
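The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not a definitive setup: the helper names, the "delta" sink format (which needs the Delta Lake package), and the idea that each record carries a gap-free integer `seq` value are all hypothetical choices made for illustration. `start_checkpointed_query` covers setting and reusing the checkpoint path; `find_duplicates_and_gaps` is a plain-Python check for the kill-and-restart test.

```python
from collections import Counter


def start_checkpointed_query(stream_df, checkpoint_path, output_path):
    """Start a streaming write whose progress is tracked under checkpoint_path.

    stream_df is assumed to be a streaming DataFrame (e.g. from spark.readStream).
    The "delta" format is an assumption; any sink format could be substituted.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        # Reuse the SAME path on every restart so the query resumes
        # from the last committed offsets instead of starting over.
        .option("checkpointLocation", checkpoint_path)
        .start(output_path)
    )


def find_duplicates_and_gaps(seq_ids, first, last):
    """Return (duplicates, missing) for the seq values read back from the sink.

    Assumes the source emitted every integer in [first, last] exactly once.
    """
    counts = Counter(seq_ids)
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    missing = [s for s in range(first, last + 1) if s not in counts]
    return duplicates, missing
```

After killing the query mid-batch and restarting it with the same checkpoint path, one could read the `seq` values back from the output and pass them to `find_duplicates_and_gaps`; two empty lists would indicate that nothing was duplicated or skipped.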
Known gotchas
Changing the query's schema or certain operations (e.g., adding a stateful operation) after a checkpoint has been written can make the checkpoint incompatible; the stream must then be restarted from scratch with a new checkpoint location, which risks data loss or duplication during the transition
The checkpoint alone does not guarantee exactly-once: end-to-end exactly-once requires a sink with idempotent writes or transactional commits; with a non-idempotent sink (e.g., custom code that blindly appends each batch to a file) a replayed batch is written twice, so the guarantee degrades to at-least-once even with a valid checkpoint
Checkpoints on object stores (S3, GCS) can hit eventual-consistency problems on older deployments (Amazon S3, for example, has offered strong read-after-write consistency only since December 2020); use a store with strong read-after-write consistency, or keep checkpoints on HDFS/DFS in latency-sensitive pipelines
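To illustrate the sink gotcha, here is a small plain-Python simulation of what happens when a batch is replayed after a failure. It is only a sketch: the `AppendSink` and `IdempotentSink` classes and the `run_with_replay` helper are hypothetical stand-ins, not Spark APIs. The idea mirrors what Spark's `foreachBatch` makes possible, where the batch ID passed to the callback can be used to deduplicate output.

```python
class AppendSink:
    """Non-idempotent sink: every write is appended, even a replayed one."""

    def __init__(self):
        self.rows = []

    def write(self, batch_id, rows):
        self.rows.extend(rows)


class IdempotentSink:
    """Idempotent sink: output is keyed by batch_id, so a replay overwrites."""

    def __init__(self):
        self.batches = {}

    def write(self, batch_id, rows):
        self.batches[batch_id] = list(rows)

    @property
    def rows(self):
        # Flatten the stored batches in batch_id order.
        return [r for b in sorted(self.batches) for r in self.batches[b]]


def run_with_replay(sink):
    """Write batch 0, then write batch 1 twice to mimic a restart replay."""
    sink.write(0, ["a", "b"])
    sink.write(1, ["c"])  # written, but the query dies before the commit is logged
    sink.write(1, ["c"])  # after restart, the same batch is replayed
    return sink.rows


print(run_with_replay(AppendSink()))      # ['a', 'b', 'c', 'c']
print(run_with_replay(IdempotentSink()))  # ['a', 'b', 'c']
```

The checkpoint is what causes batch 1 to be replayed with the same ID; whether that replay shows up as a duplicate depends entirely on how the sink handles a batch it has already seen.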