Steps

Define a function that processes each micro-batch as a static DataFrame. In Scala the signature is (batchDF: DataFrame, batchId: Long) => Unit; in Python it is a function that takes (batch_df, batch_id) and returns nothing.
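Register the function on the streaming DataFrame with df.writeStream.foreachBatch(myFunc).start(), and set the checkpointLocation option so that query progress survives a restart.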
Inside the function, use batchId to make writes idempotent (e.g., skip the batch, or overwrite its output, if that batchId was already processed). This is what upgrades the default at-least-once guarantee to exactly-once.
You can write to multiple sinks in one function call, apply arbitrary DataFrame transformations, or call external APIs.
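Cache batchDF with batchDF.persist() if the function materializes it more than once (for example, when writing to several sinks), and call batchDF.unpersist() when done. Without caching, each action recomputes the micro-batch.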
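The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: make_batch_writer, start_query, the committed ledger, and the sinks list are hypothetical names introduced here, and the in-memory set stands in for a durable store (such as a database table) that a real job would need so that the ledger survives restarts. Only persist, unpersist, writeStream.foreachBatch, option, and start are PySpark calls.

```python
from typing import Callable, Iterable, Set


def make_batch_writer(committed: Set[int], sinks: Iterable[Callable]) -> Callable:
    """Build a (batch_df, batch_id) function suitable for foreachBatch.

    committed: hypothetical ledger of batch IDs that were fully written.
               An in-memory set is used only for illustration.
    sinks:     callables taking (batch_df, batch_id); each writes to one target.
    """
    sinks = list(sinks)

    def write_batch(batch_df, batch_id):
        # Idempotency: a retried batch arrives with the same batch_id, so skip it.
        if batch_id in committed:
            return
        # The DataFrame is consumed once per sink, so cache it to avoid recomputation.
        batch_df.persist()
        try:
            for sink in sinks:
                sink(batch_df, batch_id)
        finally:
            batch_df.unpersist()
        # Record the batch only after every sink succeeded.
        committed.add(batch_id)

    return write_batch


def start_query(stream_df, write_batch, checkpoint_dir):
    """Register the function on a streaming DataFrame (needs a live Spark session)."""
    return (
        stream_df.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Note that if one sink fails after another has already succeeded, the batch is not recorded and the retry re-runs every sink, so each sink should itself tolerate being given the same batch_id twice.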

Known gotchas

By default, foreachBatch provides only at-least-once write guarantees: after a failure or restart, the function can be re-run for a batchId it has already seen. Use batchId to deduplicate or overwrite output to get exactly-once results.
The batchDF is a bounded (non-streaming) DataFrame; write it with the batch API (batchDF.write) and avoid streaming-only operations such as writeStream inside the function.
Long-running foreachBatch functions block the next micro-batch trigger; keep processing fast or increase the trigger interval.
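One way to handle these gotchas is to write each micro-batch to a location derived from its batchId and overwrite it on retry. The sketch below assumes a Parquet file sink; batch_output_path, overwrite_batch, start_with_trigger, the /tmp/foreach_batch_demo default, and the batch_id=N directory layout are hypothetical choices made for illustration. The PySpark calls used are isStreaming, write.mode("overwrite").parquet(...), and trigger(processingTime=...).

```python
def batch_output_path(base_path: str, batch_id: int) -> str:
    """Derive a per-batch directory so a retry targets the same location."""
    return f"{base_path.rstrip('/')}/batch_id={batch_id}"


def overwrite_batch(batch_df, batch_id, base_path="/tmp/foreach_batch_demo"):
    """Idempotent sink: re-running the same batch_id replaces its earlier output."""
    # Inside foreachBatch the DataFrame is bounded, so the batch writer applies.
    if batch_df.isStreaming:
        raise ValueError("expected a static micro-batch DataFrame")
    batch_df.write.mode("overwrite").parquet(batch_output_path(base_path, batch_id))


def start_with_trigger(stream_df, checkpoint_dir, interval="1 minute"):
    """Give a slow function more room by widening the trigger interval.

    Requires a live Spark session; the interval value here is an assumption.
    """
    return (
        stream_df.writeStream
        .foreachBatch(overwrite_batch)
        .option("checkpointLocation", checkpoint_dir)
        .trigger(processingTime=interval)
        .start()
    )
```

Because the output path depends only on batch_id, a retried batch writes to the same directory instead of appending duplicate rows.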

data-engineering · 5 steps · unrated

Implement stream-stream join with watermark in Spark Structured Streaming

data-engineering · 5 steps · unrated

Choose and apply Spark Structured Streaming output modes (append, update, complete)

data-engineering · 5 steps · unrated

Use foreachBatch sink in Spark Structured Streaming
