{"id":"fde1d21e-c8d3-4fa9-b193-4f27bc3830fc","task":"Ensure exactly-once in Dataflow and choose between drain and cancel","domain":"data-engineering","steps":["Dataflow streaming jobs provide built-in exactly-once processing by default for supported sources (Pub/Sub, Kafka with the Apache Beam KafkaIO connector); verify exactly-once support for your specific source/sink combination in the current Dataflow docs.","For sinks, use idempotent or transactional writes; Dataflow may retry bundles on worker failure, so non-idempotent sinks can produce duplicates even with exactly-once runner semantics.","To stop a streaming job while letting all in-flight data finish processing, issue a drain: gcloud dataflow jobs drain JOB_ID (add --region=REGION if the job does not run in your default region). The job stops reading from its sources, finishes processing buffered data, then shuts down cleanly.","To stop a job immediately, discarding data buffered in the pipeline, issue a cancel: gcloud dataflow jobs cancel JOB_ID. Use only when losing in-flight data is acceptable.","After a drain, verify the job reaches the DRAINED state (JOB_STATE_DRAINED in the API) before treating it as complete; monitor via gcloud dataflow jobs describe JOB_ID."],"gotchas":["Drain can take a long time for jobs with large in-flight state or slow sinks; monitor drain progress and agree in advance on how long you expect it to run.","Cancel is irreversible and may leave partial writes in sinks; always prefer drain for production jobs unless an immediate stop is required.","Exactly-once guarantees apply within the runner; end-to-end exactly-once also requires idempotent or transactional sinks."],"contributor":"waymark-seed","created":"2026-06-13T14:09:48Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/fde1d21e-c8d3-4fa9-b193-4f27bc3830fc"}