Use the Bulk API endpoint POST /_bulk, which accepts newline-delimited JSON (NDJSON): each action line ({"index":{"_index":"<name>","_id":"<id>"}}) is followed by its source document line, and each pair is one operation (delete actions take no source line); every line, including the last, must end with a newline character
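A minimal sketch of building that body with only the standard library, assuming documents arrive as (doc_id, source) pairs; the function name build_bulk_body and the index name logs-demo are hypothetical. json.dumps escapes any newlines inside string values, so each document stays on a single line. The body is typically sent with Content-Type: application/x-ndjson.

```python
import json

def build_bulk_body(index, docs):
    """Serialize (doc_id, source) pairs into an NDJSON _bulk body.

    Each document yields an "index" action line followed by its source line,
    and the body ends with the trailing newline the Bulk API requires.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))  # one line per document
    return "\n".join(lines) + "\n"

body = build_bulk_body("logs-demo", [("1", {"msg": "hello"}), ("2", {"msg": "world"})])
print(body, end="")
```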
Target 5–15 MiB per bulk request and 1,000–5,000 documents per batch as a starting point, then tune for document size and cluster capacity: batches that are too large cause GC pressure and timeouts, while batches that are too small waste per-request HTTP overhead
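One way to enforce both limits at once is to flush a batch before adding a pair that would exceed either the document count or the byte budget. This is a sketch, not a client-library feature: iter_batches is a hypothetical helper, its defaults are simply the lower ends of the ranges above, and the byte count approximates the HTTP body size. A single document larger than max_bytes still ends up alone in its own batch.

```python
import json

def iter_batches(index, docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Group (doc_id, source) pairs into NDJSON bodies that respect both a
    document-count limit and an approximate UTF-8 byte-size limit."""
    batch, batch_bytes, count = [], 0, 0
    for doc_id, source in docs:
        pair = (
            json.dumps({"index": {"_index": index, "_id": doc_id}}) + "\n"
            + json.dumps(source) + "\n"
        )
        size = len(pair.encode("utf-8"))
        # Flush before adding a pair that would overflow either limit.
        if batch and (count >= max_docs or batch_bytes + size > max_bytes):
            yield "".join(batch)
            batch, batch_bytes, count = [], 0, 0
        batch.append(pair)
        batch_bytes += size
        count += 1
    if batch:
        yield "".join(batch)

docs = [(str(i), {"n": i}) for i in range(2500)]
batches = list(iter_batches("demo", docs, max_docs=1000))
print([b.count("\n") // 2 for b in batches])  # [1000, 1000, 500]
```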
Check the bulk response body for per-item errors even on HTTP 200 responses, because the bulk API returns 200 even when individual items fail; the top-level errors field is true if any item failed, and when it is, iterate over the items array and inspect each entry's status and error fields to identify and retry the failed documents
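A sketch of that check on an already-parsed response, assuming the standard response shape where each entry in items has a single key naming the operation (index, create, update, or delete). The helper name failed_items and the sample response are hypothetical; the sample is abridged and hand-written for illustration, not captured from a cluster.

```python
def failed_items(response):
    """Return (position, op_type, status, error) for each failed item in a
    parsed _bulk response; positions follow the order of the request."""
    if not response.get("errors"):
        return []  # top-level flag is false: every item succeeded
    failures = []
    for pos, item in enumerate(response["items"]):
        # Each item has exactly one key: "index", "create", "update" or "delete".
        op_type, result = next(iter(item.items()))
        if "error" in result:
            failures.append((pos, op_type, result.get("status"), result["error"]))
    return failures

# Abridged, hand-written response for illustration only.
sample = {
    "took": 3,
    "errors": True,
    "items": [
        {"index": {"_index": "demo", "_id": "1", "status": 201, "result": "created"}},
        {"index": {"_index": "demo", "_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception", "reason": "rejected"}}},
    ],
}
for pos, op_type, status, error in failed_items(sample):
    print(pos, op_type, status, error["type"])  # 1 index 429 es_rejected_execution_exception
```

Because positions match the request order, the caller can map each failure back to the action/source pair it sent.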
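Handle backpressure by watching for status 429 (Too Many Requests) with an es_rejected_execution_exception, either on the whole HTTP response or on individual entries in the items array; retry with exponential backoff and jitter, resending the whole batch when the request itself returned 429 and only the rejected items when individual items returned 429, so that already-indexed documents are not sent again; never drop documents on 429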
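A sketch of a retry loop using "full jitter" (a uniform random delay up to an exponentially growing cap). It assumes a hypothetical transport send(body) returning (http_status, parsed_json) and input already split into NDJSON action/source pair strings; the base and cap values are illustrative, not recommendations. When the whole request is rejected it resends everything, and when only some items carry status 429 it resends just those pairs; other item errors (for example mapping errors) are returned rather than retried.

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def bulk_with_retry(send, pairs, max_retries=5, sleep=time.sleep):
    """Send NDJSON action/source pairs, retrying 429 rejections with backoff.

    `send(body) -> (http_status, parsed_json)` is a hypothetical transport.
    Returns the item results that failed with a non-retryable error.
    """
    pending = list(pairs)
    permanent = []
    for attempt in range(max_retries + 1):
        http_status, resp = send("".join(pending))
        if http_status == 429:
            retry = pending  # whole request rejected: resend every pending pair
        elif http_status != 200:
            raise RuntimeError(f"bulk request failed with HTTP {http_status}")
        else:
            retry = []
            for pair, item in zip(pending, resp["items"]):
                result = next(iter(item.values()))  # {"index": {...}} -> {...}
                if result.get("status") == 429:
                    retry.append(pair)  # rejected by a busy node: try again
                elif "error" in result:
                    permanent.append(result)  # e.g. mapping error: retrying won't help
        if not retry:
            return permanent
        pending = retry
        if attempt < max_retries:
            sleep(backoff_delay(attempt))
    raise RuntimeError(f"{len(pending)} documents still rejected after {max_retries} retries")
```

Passing sleep as a parameter keeps the loop testable without real waiting; if documents are sent without an explicit _id, resending a whole batch can create duplicates, so supplying IDs makes retries safer.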
Tune indexing performance for the initial load: set refresh_interval to 30s, or to -1 to disable automatic refresh, and set number_of_replicas to 0 (accepting no redundancy while the load runs); restore both settings after loading. This can significantly improve ingest throughput
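The two settings changes can be expressed as request bodies for PUT /<index>/_settings. In this sketch the helper bulk_load_settings is hypothetical, and the restore defaults are Elasticsearch's defaults (refresh every 1s, one replica); pass the values actually read from GET /<index>/_settings if the index was configured differently.

```python
import json

def bulk_load_settings(index, original_refresh="1s", original_replicas=1):
    """Return (method, path, body) tuples to apply before and after a bulk load."""
    path = f"/{index}/_settings"
    before = ("PUT", path, {"index": {"refresh_interval": "-1", "number_of_replicas": 0}})
    after = ("PUT", path, {"index": {"refresh_interval": original_refresh,
                                     "number_of_replicas": original_replicas}})
    return before, after

before, after = bulk_load_settings("logs-demo")
print(before[0], before[1], json.dumps(before[2]))
# PUT /logs-demo/_settings {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
```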
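For high-volume writes into time-series indexes, specify routing in the _bulk action metadata so that documents in one request land on fewer shards, which reduces the number of per-shard sub-requests the coordinating node must dispatch; choose routing values that spread load evenly to avoid hot shards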
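The routing value goes in the action line next to _index and _id. A small sketch, where routed_action is a hypothetical helper and the routing key host-07 is an invented example; documents indexed with custom routing must be fetched, updated, or deleted with the same routing value.

```python
import json

def routed_action(index, doc_id, routing):
    """Bulk action line that pins the document to the shard selected by `routing`."""
    return json.dumps({"index": {"_index": index, "_id": doc_id, "routing": routing}})

print(routed_action("metrics-demo", "42", "host-07"))
# {"index": {"_index": "metrics-demo", "_id": "42", "routing": "host-07"}}
```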
Known gotchas
The bulk API returns HTTP 200 even when every single document in the batch failed; always parse the response body and check items[*].index.error (or items[*].create.error, items[*].update.error, items[*].delete.error, depending on the action type), because ignoring the response body causes silent data loss
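Large shard counts hurt write performance: each document is written to exactly one primary shard and then replicated to that shard's replicas, but the coordinating node splits every bulk request into one sub-request per target shard, so more shards mean smaller per-shard batches and more coordination, and every shard carries fixed memory and file overhead; start with a reasonable shard count (target 10–50 GB per shard) and avoid over-sharding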
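The sizing target turns into simple arithmetic: divide the expected primary data size by a per-shard target and round up. This sketch assumes 30 GB as an arbitrary midpoint of the 10–50 GB range and that expected_gb excludes replica copies; primary_shard_count is a hypothetical helper.

```python
import math

def primary_shard_count(expected_gb, target_gb_per_shard=30):
    """Smallest primary-shard count that keeps each shard at or below the target size."""
    return max(1, math.ceil(expected_gb / target_gb_per_shard))

print(primary_shard_count(200))  # 7 (200 / 30 = 6.67, rounded up; about 28.6 GB per shard)
print(primary_shard_count(5))    # 1
```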
Setting refresh_interval=-1 during a bulk load means newly indexed documents are not searchable until the index is refreshed; after bulk loading completes, always restore refresh_interval and force a refresh (POST /<index>/_refresh)
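Wrapping the load in try/finally guarantees the restore and refresh happen even when the load fails partway. This is a sketch: send(method, path, body) is a hypothetical HTTP helper, load is whatever performs the _bulk requests, and the restore value "1s" assumes the index used Elasticsearch's default refresh interval. It toggles only refresh_interval; replica changes would be restored the same way.

```python
def bulk_load(send, index, load, refresh_interval="1s"):
    """Disable refresh, run `load()`, then always restore the interval and refresh."""
    path = f"/{index}/_settings"
    send("PUT", path, {"index": {"refresh_interval": "-1"}})
    try:
        load()
    finally:
        # Runs even if load() raises, so the index is never left unrefreshed.
        send("PUT", path, {"index": {"refresh_interval": refresh_interval}})
        send("POST", f"/{index}/_refresh", None)

log = []
bulk_load(lambda method, path, body: log.append((method, path)), "logs-demo", load=lambda: None)
print(log)
# [('PUT', '/logs-demo/_settings'), ('PUT', '/logs-demo/_settings'), ('POST', '/logs-demo/_refresh')]
```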