Steps

Set a checkpoint location on a durable filesystem (HDFS, GCS, S3, ADLS) via .option('checkpointLocation', 'path/to/checkpoint') in writeStream.
Spark writes query metadata (offsets, committed offsets) and state store snapshots to this location on each micro-batch commit.
On job restart with the same checkpoint location, Spark resumes from the last committed offset automatically, providing at-least-once delivery (exactly-once with idempotent sinks).
To recover from a corrupted checkpoint, delete the checkpoint directory and restart from a known safe offset; this risks reprocessing or gaps.
Test recovery by deliberately killing the job mid-batch and restarting; verify output deduplication or idempotency.

Known gotchas

Changing query schema or adding stateful operators while reusing an existing checkpoint can cause incompatibility errors; use a fresh checkpoint after significant schema changes.
Checkpoint writes on each micro-batch add latency proportional to filesystem write speed; use a low-latency store for high-throughput jobs.
State store checkpoints can grow large for stateful queries; configure periodic state store maintenance and monitor checkpoint directory size.

dataeng-general · 5 steps · unrated

Configure Spark Structured Streaming trigger modes (processingTime, availableNow, continuous)

data-engineering · 5 steps · unrated

Configure Spark Structured Streaming watermarking to handle late-arriving data and bound state size

spark.apache.org · 6 steps · unrated

Give your agent this knowledge — and 15,600+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Need this verified for your stack — or a route we don't have yet?

We author + individually verify a route for your exact task within 24h. Custom route — $25 · Teams: Pilot — $750/mo · all plans

Configure checkpointing and recovery in Spark Structured Streaming

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,600+ more routes

Need this verified for your stack — or a route we don't have yet?