Create a Kafka engine table in ClickHouse specifying kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format (e.g., JSONEachRow or Avro), and kafka_num_consumers; this table acts as a consumer within that consumer group and does not persist data itself.
Create a target MergeTree (or ReplicatedMergeTree for HA) table with the desired schema and partition/order keys for query performance.
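Create a materialized view that selects FROM the Kafka engine table and writes TO the MergeTree table (CREATE MATERIALIZED VIEW ... TO ... AS SELECT ... FROM ...); once a view is attached, the Kafka engine consumes in the background and pushes each polled batch through the view into the target table.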
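Tune kafka_max_block_size and the flush interval (kafka_flush_interval_ms on the Kafka table, which defaults to the stream_flush_interval_ms setting) to balance ingestion latency against insert batch size: larger blocks and longer intervals produce fewer, bigger parts at the cost of higher latency.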
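For Avro messages written with a Confluent Schema Registry, use kafka_format = 'AvroConfluent' (not plain Avro) and set format_avro_schema_registry_url, either in the Kafka engine table settings or in the user profile, so ClickHouse resolves each message's schema from the schema ID embedded in it.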
Monitor consumer group lag via Kafka tooling (not ClickHouse), and watch system.kafka_consumers (available in recent ClickHouse releases), system.part_log, and the server log for ingestion errors or slow inserts.
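The steps above can be sketched as the following DDL. This is a minimal illustration, not a recommended configuration: the table names (events_queue, events, events_mv), columns, broker addresses, topic, group name, and the tuning values are all hypothetical placeholders.

```sql
-- Step 1: Kafka engine table. It is a consumer only and stores no rows.
-- Broker list, topic, and group name are placeholder values.
CREATE TABLE events_queue
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list       = 'kafka-1:9092,kafka-2:9092',
    kafka_topic_list        = 'events',
    kafka_group_name        = 'clickhouse_events',
    kafka_format            = 'JSONEachRow',
    kafka_num_consumers     = 2,
    kafka_max_block_size    = 65536,  -- step 4: rows per block (illustrative value)
    kafka_flush_interval_ms = 5000;   -- step 4: flush timeout (illustrative value)

-- Step 2: target table that actually persists the data.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);

-- Step 3: the materialized view connects the two.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT event_time, user_id, action
FROM events_queue;

-- Step 6: ClickHouse-side checks (consumer lag itself comes from Kafka tooling).
SELECT *
FROM system.kafka_consumers
WHERE table = 'events_queue'
FORMAT Vertical;

SELECT event_time, event_type, rows, duration_ms, error, exception
FROM system.part_log
WHERE table = 'events'
ORDER BY event_time DESC
LIMIT 20;
```

Note that system.part_log is only populated when part logging is enabled in the server configuration, and system.kafka_consumers may not exist on older releases. kafka_num_consumers generally should not exceed the number of partitions in the topic, since extra consumers would sit idle.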
Known gotchas
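The Kafka engine table is stateless from ClickHouse's perspective: offsets live in Kafka, not in ClickHouse. By default a message that fails to parse (e.g., a type mismatch) raises an exception and the block is retried, which can stall ingestion; setting kafka_skip_broken_messages lets the consumer advance past bad messages, but they are dropped silently. Prefer kafka_handle_error_mode = 'stream', which exposes the error text and the raw message in the _error and _raw_message virtual columns so bad messages can be routed to an error table instead.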
ClickHouse Kafka consumers do not handle partition changes the way a typical Kafka client application does; adding partitions to a topic while the engine is running may cause missed messages until the engine table is restarted (for example by detaching and re-attaching it).
Materialized views on Kafka engine tables run in ClickHouse server background threads (the message broker pool, sized by background_message_broker_schedule_pool_size), not in user sessions; heavy transformation logic in the view query can slow consumption and starve other queries, so keep view logic simple and push transformations to a downstream MV or a separate pipeline.
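As a hedged sketch of the error-routing approach from the first gotcha, the DDL below sends parse failures to a separate table instead of losing them. It assumes a ClickHouse version that supports kafka_handle_error_mode and an existing target table named events with the three columns shown; every table name, column, broker address, and topic here is a hypothetical placeholder.

```sql
-- Kafka table with error streaming enabled: broken messages no longer fail
-- the block, and _error / _raw_message virtual columns are populated instead.
CREATE TABLE events_queue_safe
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list       = 'kafka-1:9092',
    kafka_topic_list        = 'events',
    kafka_group_name        = 'clickhouse_events_safe',
    kafka_format            = 'JSONEachRow',
    kafka_handle_error_mode = 'stream';

-- Good rows go to the (assumed, pre-existing) target table.
CREATE MATERIALIZED VIEW events_ok_mv TO events AS
SELECT event_time, user_id, action
FROM events_queue_safe
WHERE length(_error) = 0;

-- Dead-letter table for messages that failed to parse.
CREATE TABLE events_errors
(
    topic     String,
    partition UInt64,
    offset    UInt64,
    raw       String,
    error     String
)
ENGINE = MergeTree
ORDER BY (topic, partition, offset);

CREATE MATERIALIZED VIEW events_errors_mv TO events_errors AS
SELECT
    _topic       AS topic,
    _partition   AS partition,
    _offset      AS offset,
    _raw_message AS raw,
    _error       AS error
FROM events_queue_safe
WHERE length(_error) > 0;

-- Restarting the consumers (e.g., after a topic's partitions change):
DETACH TABLE events_queue_safe;
ATTACH TABLE events_queue_safe;
```

Keeping both views to plain column selection follows the last gotcha: heavier transformations can hang off the events table in a downstream materialized view, where they do not slow down consumption from Kafka.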