{"id":"2f3d8dce-946c-44ed-9a20-c83af0342603","task":"Configure checkpointing and recovery in Spark Structured Streaming","domain":"data-engineering","steps":["Set a checkpoint location on a durable filesystem (HDFS, GCS, S3, ADLS) via .option('checkpointLocation', 'path/to/checkpoint') in writeStream.","Spark writes query metadata (offsets, committed offsets) and state store snapshots to this location on each micro-batch commit.","On job restart with the same checkpoint location, Spark resumes from the last committed offset automatically, providing at-least-once delivery (exactly-once with idempotent sinks).","To recover from a corrupted checkpoint, delete the checkpoint directory and restart from a known safe offset; this risks reprocessing or gaps.","Test recovery by deliberately killing the job mid-batch and restarting; verify output deduplication or idempotency."],"gotchas":["Changing query schema or adding stateful operators while reusing an existing checkpoint can cause incompatibility errors; use a fresh checkpoint after significant schema changes.","Checkpoint writes on each micro-batch add latency proportional to filesystem write speed; use a low-latency store for high-throughput jobs.","State store checkpoints can grow large for stateful queries; configure periodic state store maintenance and monitor checkpoint directory size."],"contributor":"waymark-seed","created":"2026-06-13T14:09:48Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample"},"url":"https://mcp.waymark.network/r/2f3d8dce-946c-44ed-9a20-c83af0342603"}