Define the table with ENGINE = ReplacingMergeTree(ver) ORDER BY (user_id, event_type), where ver is a UInt64 or DateTime column holding the record version; the row with the highest ver value is kept after deduplication
Insert new and updated rows as normal INSERTs; do not rely on background merges to deduplicate before querying — duplicates will coexist until a merge occurs
For point queries that must return the deduplicated latest state, add FINAL to the SELECT: SELECT * FROM user_events FINAL WHERE user_id = 123; note FINAL serializes execution and adds overhead proportional to the number of parts
Trigger manual merges in non-production environments for testing: OPTIMIZE TABLE user_events FINAL; in production use OPTIMIZE TABLE user_events (without FINAL) sparingly to avoid excessive I/O
For heavy mutation workloads (bulk corrections), issue ALTER TABLE user_events UPDATE col = val WHERE condition as a ClickHouse mutation; mutations run asynchronously — check progress in system.mutations
Monitor part count and merge backlog via system.parts WHERE active AND table='user_events' — a growing number of unmerged parts degrades query performance
Known gotchas
Deduplication in ReplacingMergeTree only applies within a single shard partition; on a distributed table, rows with the same ORDER BY key that land on different shards will not be deduplicated — queries across shards require FINAL or explicit dedup logic
SELECT ... FINAL acquires a read lock per part and is serialized; on very wide tables with many small parts (immediately after bulk inserts) FINAL queries can be significantly slower than expected
Mutations (ALTER UPDATE / DELETE) in ClickHouse rewrite entire data parts; they are not transactional, are irreversible once started, and can saturate disk I/O — use lightweight deletes (DELETE FROM) introduced in recent versions for lower-impact row-level removals
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp