Steps

Set checkpointLocation in the writeStream options to a reliable, durable path (HDFS, S3, ADLS) before starting the stream
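Pair a replayable source (e.g., Kafka, files) with a sink that supports idempotent or transactional writes (e.g., Delta Lake, the built-in file sink, or foreachBatch with writes keyed by batch ID) to achieve end-to-end exactly-once semantics; note that Spark's built-in Kafka sink provides only at-least-once delivery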
Stop and restart the streaming query without changing the checkpointLocation; verify in the logs that the query resumes from the last committed offset
Simulate a failure by killing the query mid-batch and restarting; confirm that no duplicate records appear in the output and no records are skipped
Before changing the query's transformations (e.g., adding a column), validate that the change is compatible with the existing checkpoint; incompatible changes require a fresh checkpoint and may require replaying data
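The first two steps can be sketched in PySpark as follows. This is a minimal illustration, not a reference implementation: it assumes the built-in rate test source and the Parquet file sink, and the helper names sink_options and start_query, as well as any paths passed to them, are hypothetical.

```python
from typing import Dict


def sink_options(checkpoint_dir: str, output_dir: str) -> Dict[str, str]:
    """Build the writeStream options; the checkpoint path must be set up front."""
    if not checkpoint_dir:
        raise ValueError("checkpoint_dir must be a non-empty durable path")
    return {"checkpointLocation": checkpoint_dir, "path": output_dir}


def start_query(spark, checkpoint_dir: str, output_dir: str):
    """Start (or resume) a streaming query against the given checkpoint.

    `spark` is an existing SparkSession. Calling this again with the same
    checkpoint_dir resumes from the offsets recorded there; a different
    checkpoint_dir starts a new, independent run.
    """
    # Assumption: the "rate" source stands in for a real replayable source.
    events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    return (
        events.writeStream
        .format("parquet")      # file sink; swap in "delta" if Delta Lake is installed
        .outputMode("append")
        .options(**sink_options(checkpoint_dir, output_dir))
        .start()
    )


# Hypothetical usage (requires pyspark and a running SparkSession):
# query = start_query(spark, "s3a://my-bucket/checkpoints/events",
#                     "s3a://my-bucket/output/events")
# query.stop()
# query = start_query(spark, "s3a://my-bucket/checkpoints/events",
#                     "s3a://my-bucket/output/events")  # resumes, same checkpoint
```

After a restart, query.lastProgress reports the source offsets of the most recent batch, which is one way to confirm that the query resumed rather than started over.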

Known gotchas

Changing the query schema or certain operations (e.g., adding a stateful operation) after a checkpoint is written makes the checkpoint incompatible; the stream must be restarted from scratch with a new checkpoint location, risking data loss or duplication during the transition
Exactly-once is only achievable end-to-end if the sink supports idempotent writes or transactional commits; a sink with neither (e.g., the built-in Kafka sink, or a foreach writer with side effects) degrades exactly-once to at-least-once even with a valid checkpoint, because a batch replayed after a failure is written again
Checkpoints on object stores depend on the store's consistency model: S3 was only eventually consistent before December 2020, and some S3-compatible stores still are; use a store with strong read-after-write consistency, or keep checkpoints on HDFS/DFS, particularly in latency-sensitive pipelines
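The idempotent-sink gotcha can be illustrated with a small sketch of a foreachBatch writer that ignores replayed batches. It assumes batch IDs increase monotonically and that a replay reuses the ID of the failed batch. IdempotentSink is a hypothetical in-memory stand-in for a real target table, where the rows and the committed batch ID would have to be stored in the same transaction.

```python
from typing import List, Tuple


class IdempotentSink:
    """Hypothetical in-memory target that remembers the last committed batch."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, str]] = []
        self.last_committed_batch: int = -1

    def write_batch(self, rows: List[Tuple[int, str]], batch_id: int) -> bool:
        """Append rows unless this batch was already committed.

        Returns True if the batch was written, False if it was skipped.
        """
        if batch_id <= self.last_committed_batch:
            return False  # replay after a failure: already written, skip
        # In a real sink these two updates must be one atomic transaction.
        self.rows.extend(rows)
        self.last_committed_batch = batch_id
        return True


def make_foreach_batch(sink: IdempotentSink):
    """Adapt the sink to the (batch_df, batch_id) signature used by foreachBatch."""
    def write(batch_df, batch_id: int) -> None:
        sink.write_batch([tuple(r) for r in batch_df.collect()], batch_id)
    return write


# Simulate two batches, then a replay of batch 1 after a mid-batch failure.
sink = IdempotentSink()
sink.write_batch([(1, "a"), (2, "b")], batch_id=0)
sink.write_batch([(3, "c")], batch_id=1)
replayed = sink.write_batch([(3, "c")], batch_id=1)
```

In a real query the adapter would be passed as df.writeStream.foreachBatch(make_foreach_batch(sink)); collecting each batch to the driver is only reasonable for small batches and is used here to keep the sketch short.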

data-engineering · 5 steps · unrated

Configure Spark Structured Streaming checkpoint recovery and exactly-once processing guarantees
