Set up a Triton model repository with a directory for each model: a preprocessing model (Python backend), an inference model, and an ensemble model
Write a config.pbtxt for each component model specifying input and output tensor names, data types, and dimensions
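Enable dynamic batching on the inference model by adding a dynamic_batching block to its config.pbtxt (the model's max_batch_size must be greater than 0); set preferred_batch_size and max_queue_delay_microseconds to tune batching behavior
Define the ensemble model's config.pbtxt with platform set to "ensemble" and an ensemble_scheduling block whose steps use input_map and output_map to connect the preprocessing model's output tensors to the inference model's input tensors, forming the pipeline graph
Start Triton with docker run, mounting the model repository into the container and passing --model-repository to tritonserver; then query the health endpoints (/v2/health/ready for the server, /v2/models/<model_name>/ready for each model) to confirm all models are loaded and ready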
Send inference requests to the ensemble model endpoint; Triton routes inputs through the pipeline and applies dynamic batching to the inference model internally
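The repository and configuration steps above can be sketched as a small Python script that writes the directory layout and the three config.pbtxt files. This is a minimal sketch under stated assumptions, not a definitive setup: the model names (preprocess, classifier, pipeline), the tensor names, the shapes, the onnxruntime backend for the inference model, and the batch sizes and queue delay are all hypothetical placeholders. The script does not create model.py for the Python backend or the inference model file; those still have to be placed in each model's version directory.

```python
import tempfile
from pathlib import Path

# All model names, tensor names, shapes, and batching values below are
# hypothetical placeholders chosen for illustration.
PREPROCESS_CONFIG = """\
name: "preprocess"
backend: "python"
max_batch_size: 8
input [ { name: "RAW_INPUT" data_type: TYPE_UINT8 dims: [ -1 ] } ]
output [ { name: "PREPROCESSED" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] } ]
"""

# The backend is an assumption; use whichever backend matches your model file.
CLASSIFIER_CONFIG = """\
name: "classifier"
backend: "onnxruntime"
max_batch_size: 8
input [ { name: "INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] } ]
output [ { name: "SCORES" data_type: TYPE_FP32 dims: [ 1000 ] } ]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
"""

# In each map, key is the composing model's tensor name and value is the
# ensemble-level tensor name that links one step to the next.
ENSEMBLE_CONFIG = """\
name: "pipeline"
platform: "ensemble"
max_batch_size: 8
input [ { name: "RAW_INPUT" data_type: TYPE_UINT8 dims: [ -1 ] } ]
output [ { name: "SCORES" data_type: TYPE_FP32 dims: [ 1000 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "RAW_INPUT" value: "RAW_INPUT" }
      output_map { key: "PREPROCESSED" value: "preprocessed_tensor" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_tensor" }
      output_map { key: "SCORES" value: "SCORES" }
    }
  ]
}
"""

CONFIGS = {
    "preprocess": PREPROCESS_CONFIG,
    "classifier": CLASSIFIER_CONFIG,
    "pipeline": ENSEMBLE_CONFIG,
}


def write_repository(root: Path) -> list[Path]:
    """Write <root>/<model>/config.pbtxt plus a version directory 1/ per model."""
    written = []
    for model_name, config_text in CONFIGS.items():
        model_dir = root / model_name
        # Every model, including the ensemble, gets a version directory.
        (model_dir / "1").mkdir(parents=True, exist_ok=True)
        config_path = model_dir / "config.pbtxt"
        config_path.write_text(config_text)
        written.append(config_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_repository(Path(tmp)):
            print(path.relative_to(tmp))
```

Keeping the configs as plain text mirrors what ends up on disk, so the generated files can be diffed against hand-written ones. Note that the intermediate name preprocessed_tensor exists only inside the ensemble; it is the link between the first step's output_map and the second step's input_map.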
Known gotchas
Tensor name or data type mismatches between an ensemble step's input_map and output_map entries and the composing models' declared input and output tensors cause Triton to fail to load the ensemble with a cryptic configuration error
Dynamic batching is configured per model and does not propagate automatically to ensemble components; each composing model that should batch must have its own dynamic_batching block
The Python backend preprocessing model runs in a separate process; if its Python environment lacks required packages, the model will fail to load with a process launch error rather than a missing-import error
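The first two gotchas can be caught before starting the server with a small pre-flight check. The sketch below is illustrative only: ModelSpec, Step, check_ensemble, and models_without_batching are hypothetical helpers, not part of Triton, and they operate on hand-built descriptions of the configs rather than parsing config.pbtxt. It also assumes the steps are listed in data-flow order, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Hypothetical summary of one composing model's config.pbtxt."""
    inputs: dict[str, str]    # tensor name -> data type, e.g. "TYPE_FP32"
    outputs: dict[str, str]   # tensor name -> data type
    dynamic_batching: bool = False


@dataclass
class Step:
    """Hypothetical summary of one ensemble_scheduling step."""
    model_name: str
    input_map: dict[str, str]   # model input name -> ensemble tensor name
    output_map: dict[str, str]  # model output name -> ensemble tensor name


def check_ensemble(models, ensemble_inputs, steps):
    """Return a list of wiring problems; an empty list means none were found."""
    problems = []
    known = dict(ensemble_inputs)  # ensemble tensor name -> data type
    for step in steps:
        spec = models.get(step.model_name)
        if spec is None:
            problems.append(f"{step.model_name}: model not found")
            continue
        for model_tensor, ens_tensor in step.input_map.items():
            if model_tensor not in spec.inputs:
                problems.append(f"{step.model_name}: no input named {model_tensor}")
            elif ens_tensor not in known:
                problems.append(f"{step.model_name}: {ens_tensor} is never produced")
            elif known[ens_tensor] != spec.inputs[model_tensor]:
                problems.append(
                    f"{step.model_name}: {model_tensor} expects "
                    f"{spec.inputs[model_tensor]}, got {known[ens_tensor]}"
                )
        for model_tensor, ens_tensor in step.output_map.items():
            if model_tensor not in spec.outputs:
                problems.append(f"{step.model_name}: no output named {model_tensor}")
            else:
                known[ens_tensor] = spec.outputs[model_tensor]
    return problems


def models_without_batching(models, steps):
    """List composing models in the ensemble that lack a dynamic_batching block."""
    return [
        s.model_name
        for s in steps
        if s.model_name in models and not models[s.model_name].dynamic_batching
    ]


if __name__ == "__main__":
    models = {
        "preprocess": ModelSpec({"RAW_INPUT": "TYPE_UINT8"}, {"PREPROCESSED": "TYPE_FP32"}),
        "classifier": ModelSpec({"INPUT": "TYPE_FP32"}, {"SCORES": "TYPE_FP32"}, True),
    }
    steps = [
        Step("preprocess", {"RAW_INPUT": "RAW_INPUT"}, {"PREPROCESSED": "pre"}),
        Step("classifier", {"INPUT": "pre"}, {"SCORES": "SCORES"}),
    ]
    print(check_ensemble(models, {"RAW_INPUT": "TYPE_UINT8"}, steps))
    print(models_without_batching(models, steps))
```

A report of composing models without dynamic batching is informational rather than an error: preprocessing models are often left unbatched on purpose.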