Define a function with signature (batchDF: DataFrame, batchId: Long) => Unit (Scala/Python equivalent) that processes each micro-batch as a static DataFrame.
Register the function with writeStream.foreachBatch(myFunc).start().
Inside the function, use batchId to implement idempotent writes (e.g., skip or overwrite if batchId already processed) for exactly-once semantics.
You can write to multiple sinks in one function call, apply arbitrary DataFrame transformations, or call external APIs.
Cache the batchDF if you materialize it more than once inside the function to avoid recomputation.
Known gotchas
foreachBatch processes each micro-batch exactly once from Spark's perspective, but the function may be retried on failure; implement idempotency using batchId.
The batchDF is a bounded DataFrame; avoid calling streaming-only operations inside the function.
Long-running foreachBatch functions block the next micro-batch trigger; keep processing fast or increase the trigger interval.
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp