Dataflow streaming jobs (including those running on Streaming Engine) provide exactly-once processing by default for supported sources (Pub/Sub, Kafka with the Dataflow Kafka connector); verify exactly-once support for your specific source/sink combination in current Dataflow docs.
For sinks, use idempotent writes or transactional sinks; Dataflow may retry bundles on worker failure, so non-idempotent sinks can produce duplicates even with exactly-once runner semantics.
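A minimal sketch of why idempotent writes absorb bundle retries. It assumes each record carries a deterministic ID (the hypothetical `record_id` field, e.g. derived from the source message ID) and that the sink supports keyed upserts. `IdempotentSink` and `AppendOnlySink` are illustrative in-memory stand-ins, not Dataflow or Beam APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape: record_id must be deterministic so that a
    # retried bundle reproduces exactly the same keys.
    record_id: str
    value: int


class IdempotentSink:
    """Illustrative keyed-upsert sink, standing in for a real table or store."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def write_bundle(self, bundle: list[Record]) -> None:
        # Upsert by key: writing the same record twice still leaves one row.
        for rec in bundle:
            self.rows[rec.record_id] = rec.value


class AppendOnlySink:
    """Non-idempotent counterpart: every write appends a new row."""

    def __init__(self) -> None:
        self.rows: list[Record] = []

    def write_bundle(self, bundle: list[Record]) -> None:
        self.rows.extend(bundle)


bundle = [Record("msg-1", 10), Record("msg-2", 20)]

idem, append = IdempotentSink(), AppendOnlySink()
for sink in (idem, append):
    sink.write_bundle(bundle)  # first attempt
    sink.write_bundle(bundle)  # simulated retry after a worker failure

print(len(idem.rows))    # 2: the retry overwrote the same keys
print(len(append.rows))  # 4: the retry produced duplicates
```

In a real sink the same idea usually means an upsert or MERGE keyed on a stable record ID in the target store.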
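To stop a streaming job and allow all in-flight data to finish processing, issue a Drain: gcloud dataflow jobs drain JOB_ID --region=REGION. Dataflow stops reading new data from sources, finishes processing the data already buffered, and then shuts the job down cleanly. Drain applies only to streaming jobs; batch jobs can only be cancelled.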
To stop a job immediately (discarding in-flight data), issue a Cancel: gcloud dataflow jobs cancel JOB_ID --region=REGION. Use it only when losing in-flight data is acceptable.
After a drain, verify that the job reaches the DRAINED state (currentState: JOB_STATE_DRAINED) before treating it as complete; check it with gcloud dataflow jobs describe JOB_ID --region=REGION.
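A minimal polling sketch for the DRAINED check, assuming the gcloud CLI is installed and authenticated. `gcloud_job_state` and `wait_for_state` are hypothetical helper names; the state is read with `gcloud dataflow jobs describe` and a `--format=value(currentState)` projection. The state fetcher, sleep, and clock are injectable so the loop can be exercised without a real job.

```python
import subprocess
import time
from typing import Callable

# Terminal values of the Dataflow job's currentState field.
TERMINAL_STATES = {
    "JOB_STATE_DRAINED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_UPDATED",
}


def gcloud_job_state(job_id: str, region: str) -> str:
    # Prints only the currentState field of the job description.
    result = subprocess.run(
        ["gcloud", "dataflow", "jobs", "describe", job_id,
         f"--region={region}", "--format=value(currentState)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_state(
    get_state: Callable[[], str],
    expected: str = "JOB_STATE_DRAINED",
    timeout_s: float = 3600.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reaches a terminal state or the timeout elapses.

    Returns the expected state; raises RuntimeError if the job ended in a
    different terminal state, or TimeoutError if it never reached one in time.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            if state != expected:
                raise RuntimeError(f"job ended in {state}, expected {expected}")
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout_s}s")
        sleep(poll_s)
```

For a real job: `wait_for_state(lambda: gcloud_job_state("JOB_ID", "REGION"))`. Treating any other terminal state as an error keeps a failed or cancelled job from being mistaken for a clean drain.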
Known gotchas
Drain can take a long time for jobs with large in-flight state or slow sinks; monitor drain progress and decide in advance how long you are willing to wait before escalating.
Cancel is irreversible and may leave partial writes in sinks; for production streaming jobs, prefer drain unless you genuinely need an immediate stop.
Exactly-once guarantees apply within the runner; end-to-end exactly-once also requires idempotent or transactional sinks.
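The drain-versus-cancel advice can be written down as an explicit stop policy. This sketch is hypothetical: `stop_streaming_job` and its `drain`, `cancel`, and `get_state` hooks are illustrative names (in practice they might wrap `gcloud dataflow jobs drain`, `gcloud dataflow jobs cancel`, and `gcloud dataflow jobs describe`), and the drain timeout is a value you choose, not a Dataflow setting. It always drains first and cancels only after the drain overruns and the caller has explicitly opted in.

```python
import time
from typing import Callable


def stop_streaming_job(
    drain: Callable[[], None],
    cancel: Callable[[], None],
    get_state: Callable[[], str],
    drain_timeout_s: float,
    allow_cancel_fallback: bool = False,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Drain first; cancel only if the drain overruns AND cancel was allowed.

    Returns "drained" or "cancel_requested". Raises TimeoutError when the
    drain overruns and cancel fallback is disabled, so a person decides.
    """
    drain()
    deadline = clock() + drain_timeout_s
    while clock() < deadline:
        state = get_state()
        if state == "JOB_STATE_DRAINED":
            return "drained"
        if state in ("JOB_STATE_FAILED", "JOB_STATE_CANCELLED"):
            raise RuntimeError(f"job ended in {state} while draining")
        sleep(poll_s)
    if not allow_cancel_fallback:
        raise TimeoutError(f"drain exceeded {drain_timeout_s}s; not cancelling")
    # Cancel discards in-flight data and may leave partial writes in sinks.
    # The caller should still confirm the job reaches JOB_STATE_CANCELLED.
    cancel()
    return "cancel_requested"
```

Defaulting `allow_cancel_fallback` to False means an overrunning drain surfaces as an error for someone to decide on, rather than silently turning into data loss.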