Create a table with the ReplacingMergeTree engine and pass a version column (e.g., updated_at as UInt64 or DateTime) as the engine parameter: ENGINE = ReplacingMergeTree(version). Set the ORDER BY clause to your natural deduplication key (the combination of columns that identifies a unique logical row); among rows with the same key, a merge keeps the one with the highest version value
Insert data in large batches rather than row by row: ClickHouse is optimized for bulk inserts, so aim for at least 1,000 rows per INSERT statement and ideally tens of thousands (up to roughly 100,000). Every INSERT creates at least one new part, so small frequent inserts produce many small parts and degrade merge performance
Use async_insert=1 (over the HTTP interface or the native protocol) for high-throughput streaming ingestion where batching at the client is impractical; ClickHouse then buffers incoming rows server-side and flushes them to the table in larger batches
Understand that ReplacingMergeTree deduplicates lazily, during background merges: immediately after an insert, duplicate rows still exist and will be returned by SELECT. Use the FINAL modifier (SELECT ... FROM table FINAL) to deduplicate at query time, or use the argMax pattern for latest-value queries
When a single INSERT spans many partitions, tune max_insert_block_size and, if needed, raise max_partitions_per_insert_block (default 100), since ClickHouse rejects an insert whose block touches more partitions than that limit. Partition by a low-cardinality expression such as toYYYYMM(event_date), not by a high-cardinality field
Monitor the part count with SELECT count() FROM system.parts WHERE table='<table>' AND active=1; a very high count (thousands of active parts) indicates that merges are falling behind inserts, and query performance will degrade
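The steps above can be sketched end to end in ClickHouse SQL. This is a minimal illustration rather than a reference schema: the table name events, its columns, and the sample values are all hypothetical, and a real batch would carry thousands of rows instead of two.

```sql
-- Hypothetical table; names, columns and values are illustrative only.
CREATE TABLE IF NOT EXISTS events
(
    event_id   UInt64,
    event_date Date,
    payload    String,
    updated_at DateTime              -- version column: highest value wins on merge
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(event_date)    -- low-cardinality partition expression
ORDER BY (event_date, event_id);     -- deduplication key; includes the partition column

-- Batched insert (shortened here; aim for thousands of rows per statement).
INSERT INTO events (event_id, event_date, payload, updated_at) VALUES
    (1, '2024-01-15', 'created', '2024-01-15 10:00:00'),
    (2, '2024-01-15', 'created', '2024-01-15 10:00:05');

-- A newer version of event 1. Until a merge runs, both versions are stored.
INSERT INTO events (event_id, event_date, payload, updated_at) VALUES
    (1, '2024-01-15', 'updated', '2024-01-15 11:00:00');

-- Server-side batching for small, frequent writes.
-- wait_for_async_insert = 1 makes the client wait until the buffer is flushed.
INSERT INTO events (event_id, event_date, payload, updated_at)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (3, '2024-01-16', 'created', '2024-01-16 09:00:00');

-- Option A: deduplicate at query time with FINAL.
SELECT event_id, event_date, payload, updated_at
FROM events FINAL
ORDER BY event_date, event_id;

-- Option B: latest value per key with argMax. The aliases deliberately differ
-- from the column names so they cannot shadow them inside the aggregates.
SELECT
    event_date,
    event_id,
    argMax(payload, updated_at) AS latest_payload,
    max(updated_at)             AS latest_updated_at
FROM events
GROUP BY event_date, event_id;

-- Active part count for this table in the current database.
SELECT count() AS active_parts
FROM system.parts
WHERE database = currentDatabase() AND table = 'events' AND active = 1;
```

Putting event_date in the ORDER BY key is a design choice for this sketch: it ensures that every version of a row lands in the same partition, so background merges can collapse them.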
Known gotchas
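ReplacingMergeTree only deduplicates within a partition, because merges never cross partition boundaries: rows with the same ORDER BY key in different partitions will not be deduplicated. Make sure the partition expression is derived from columns in the ORDER BY key (or otherwise never changes for a given key), so that every version of a row lands in the same partition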
The FINAL modifier merges rows at query time and can be very slow on large tables; for production queries, prefer the argMax() aggregate pattern, or schedule a periodic OPTIMIZE TABLE ... FINAL during low-traffic windows
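ClickHouse does not enforce uniqueness constraints, so handling duplicates is the application's responsibility. After a failed insert that was partially applied, re-inserting the same block is safe only if the block contents are identical, because ClickHouse deduplicates retried inserts by block checksum. This insert deduplication is on by default for Replicated*MergeTree tables; non-replicated tables need the non_replicated_deduplication_window setting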
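The partition gotcha is easier to see with a deliberately misaligned table. The sketch below assumes a hypothetical profiles_misaligned table whose partition expression depends on the version column instead of the deduplication key, so two versions of the same user can fall into different months.

```sql
-- Hypothetical table: the partition key is NOT derived from the ORDER BY key.
CREATE TABLE IF NOT EXISTS profiles_misaligned
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(updated_at)    -- changes whenever the row is updated in a new month
ORDER BY user_id;

-- Two versions of the same logical row, written to two different partitions.
INSERT INTO profiles_misaligned VALUES (42, 'old@example.com', '2024-01-31 23:00:00');
INSERT INTO profiles_misaligned VALUES (42, 'new@example.com', '2024-02-01 01:00:00');

-- Force merges within each partition (expensive; schedule for low-traffic windows).
OPTIMIZE TABLE profiles_misaligned FINAL;

-- Merges do not cross partitions, so a plain SELECT is still expected to
-- return both versions of user 42.
SELECT user_id, email, updated_at
FROM profiles_misaligned
ORDER BY updated_at;

-- The argMax pattern aggregates over all partitions, so it picks the latest
-- version regardless of how the data is partitioned.
SELECT
    user_id,
    argMax(email, updated_at) AS latest_email,
    max(updated_at)           AS latest_updated_at
FROM profiles_misaligned
GROUP BY user_id;
```

For a table like this, the usual fix is to partition on something that is stable for a given key (or not to partition at all), not to rely on query-time workarounds.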