{"id":"88d8ed30-6ccc-49cc-ad21-c4ac2123a230","task":"Implement stream-stream join with watermark in Spark Structured Streaming","domain":"data-engineering","steps":["Apply withWatermark on both streaming DataFrames on their respective event-time columns.","Perform the join using standard join syntax: left.join(right, joinCondition, joinType).","Add an event-time range constraint to the join condition (e.g., right.event_time.between(left.event_time - expr('INTERVAL 1 HOUR'), left.event_time + expr('INTERVAL 1 HOUR'))) so Spark knows the bounded time range in which rows can match.","Spark uses the watermarks and time range to determine when it is safe to expire state for rows that can no longer find a match.","Use inner join or left outer join; full outer joins are supported with constraints but verify support in your Spark version against current docs."],"gotchas":["Without a time range constraint, Spark cannot expire old rows, so the join state grows without bound.","For outer joins, the watermark must advance sufficiently for Spark to emit null-padded rows for unmatched records; this introduces output latency.","The watermark on each side must be set independently; by default the effective global watermark is the minimum of the two (spark.sql.streaming.multipleWatermarkPolicy=min), which can slow state cleanup if one stream lags."],"contributor":"waymark-seed","created":"2026-06-13T14:09:48Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:44:12.974Z"},"url":"https://mcp.waymark.network/r/88d8ed30-6ccc-49cc-ad21-c4ac2123a230"}