Steps

Choose a deduplication key (e.g., event_id, idempotency_key) that uniquely identifies a logical event.
In Flink SQL, compute ROW_NUMBER() OVER (PARTITION BY <dedup key> ORDER BY event_time ASC) in a subquery or view, then filter WHERE row_num = 1 in the enclosing query to keep the first row per key. The ORDER BY column must be a time attribute.
In the Flink DataStream API, use a KeyedProcessFunction keyed on the dedup key: store a flag in ValueState<Boolean>, emit an event only when the flag is unset, and expire the flag once the dedup window has passed, either with a registered timer that clears it or with state TTL.
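Configure state TTL with StateTtlConfig.newBuilder(Time.hours(<n>)).setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite).build() and attach it to the state descriptor with enableTimeToLive(...), so that Flink expires state for keys that have not been written within the TTL.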
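In Kafka Streams, use a persistent KeyValueStore to track seen IDs together with the time each was first seen. Key-value stores have no built-in TTL, so schedule a punctuator that deletes (tombstones) entries older than the dedup window.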
Test dedup effectiveness by replaying duplicate events and verifying exactly one output per logical event.
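The keyed-flag-plus-expiry logic from the steps above, and the replay test that checks it, can be sketched in plain Java. This is a library-free model under stated assumptions, not Flink or Kafka Streams code: TtlDeduplicator, DedupReplay, firstSeen, and purgeExpired are hypothetical names, a HashMap stands in for keyed state, and timestamps are passed in explicitly so the behaviour is deterministic. As with a flag that is only written on first sight, the expiry clock is not refreshed by duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of keyed dedup state with a TTL (hypothetical, not a Flink API). */
final class TtlDeduplicator {
    private final long ttlMillis;
    // dedup key -> time the "seen" flag was written for that key
    private final Map<String, Long> seenAt = new HashMap<>();

    TtlDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the event should be emitted, false if it is a duplicate. */
    boolean firstSeen(String dedupKey, long nowMillis) {
        Long writtenAt = seenAt.get(dedupKey);
        boolean live = writtenAt != null && nowMillis - writtenAt < ttlMillis;
        if (live) {
            return false; // duplicate inside the TTL window: drop it, do not refresh
        }
        seenAt.put(dedupKey, nowMillis); // create (or re-create) the flag
        return true;
    }

    /** Models the timer or punctuator sweep; returns how many entries were purged. */
    int purgeExpired(long nowMillis) {
        int before = seenAt.size();
        seenAt.values().removeIf(writtenAt -> nowMillis - writtenAt >= ttlMillis);
        return before - seenAt.size();
    }

    int stateSize() {
        return seenAt.size();
    }
}

final class DedupReplay {
    /** Replays (key, timestamp) pairs in order and returns the keys that were emitted. */
    static List<String> replay(TtlDeduplicator dedup, String[] keys, long[] timestamps) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            if (dedup.firstSeen(keys[i], timestamps[i])) {
                emitted.add(keys[i]);
            }
        }
        return emitted;
    }
}
```

A replay test against a real job would follow the same shape: feed a stream containing known duplicates, collect the output, and assert that each logical event appears exactly once while its duplicates arrive within the TTL.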

Known gotchas

State TTL must be longer than the maximum expected duplicate arrival window; setting it too short causes deduplication to fail for late duplicates.
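ROW_NUMBER dedup in Flink SQL keeps state for every key, which grows without bound on an unbounded stream unless an idle-state retention such as table.exec.state.ttl is configured; when you need explicit control over expiry, the DataStream KeyedProcessFunction approach with explicit TTL is more reliable.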
Dedup state size scales with the number of unique keys seen within the TTL window; profile state store memory usage under peak cardinality before deploying.
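The first and last gotchas reduce to simple arithmetic, sketched below as a back-of-the-envelope check. DedupStateEstimate and its methods are hypothetical helpers, and the bytes-per-entry figure is an assumption you would replace with a number measured on your own state backend.

```java
/** Rough sizing helpers for dedup state (hypothetical, for estimation only). */
final class DedupStateEstimate {
    /** Upper bound on live entries: every unique key that arrives within one TTL window. */
    static long maxLiveKeys(long peakUniqueKeysPerSecond, long ttlSeconds) {
        return Math.multiplyExact(peakUniqueKeysPerSecond, ttlSeconds);
    }

    /** State footprint given an assumed per-entry cost (key + flag + backend overhead). */
    static long estimatedBytes(long liveKeys, long bytesPerEntry) {
        return Math.multiplyExact(liveKeys, bytesPerEntry);
    }

    /** True if the TTL outlasts the longest expected gap between an event and its duplicate. */
    static boolean ttlCoversDuplicates(long ttlSeconds, long maxDuplicateDelaySeconds) {
        return ttlSeconds > maxDuplicateDelaySeconds;
    }
}
```

For example, 5,000 unique keys per second with a one-hour TTL gives maxLiveKeys(5000, 3600) = 18,000,000 live keys; at an assumed 100 bytes per entry that is about 1.8 GB of state. A 10-minute TTL would cut the footprint to a sixth, but ttlCoversDuplicates(600, 900) is false if duplicates can arrive up to 15 minutes late, so those late duplicates would pass through.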


