Place the SavedModel directory under <model_repository>/<model_name>/1/model.savedmodel/ following Triton's repository structure
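Write a config.pbtxt in <model_repository>/<model_name>/ specifying platform: "tensorflow_savedmodel", a max_batch_size greater than 0 (dynamic batching requires it), input/output tensor names, data types, and dims, and a dynamic_batching block with preferred_batch_size and max_queue_delay_microseconds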
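Start Triton with docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v <model_repository>:/models nvcr.io/nvidia/tritonserver:<version>-py3 tritonserver --model-repository=/models (the -v mount makes the repository visible inside the container; ports 8000, 8001, and 8002 expose HTTP, gRPC, and metrics)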
Send inference requests using the tritonclient Python library with InferInput objects specifying the correct dtype and shape
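Observe batching efficiency via the Prometheus metrics exposed on port 8002: nv_inference_request_success counts successful requests, nv_inference_queue_duration_us reports cumulative time spent waiting in the queue, and the ratio of nv_inference_count to nv_inference_exec_count gives the average batch size actually executed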
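The repository layout, config, and client steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive setup: the model name my_model, the tensor names input_1 and output_1, the FP32 data types, the dims, and the batching values are all hypothetical placeholders, and the real names must come from your SavedModel signature. The client function assumes the tritonclient[http] package and NumPy are installed and that Triton is listening on localhost:8000.

```python
from pathlib import Path
from string import Template

# Hypothetical names for illustration; take the real ones from your
# SavedModel signature.
MODEL_NAME = "my_model"
INPUT_NAME = "input_1"
OUTPUT_NAME = "output_1"

# Illustrative config: batch sizes, dims, and delay are assumed values.
CONFIG_TEMPLATE = Template("""\
name: "$model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "$inp"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
output [
  {
    name: "$out"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
""")


def write_model_config(repo_root, model_name, input_name, output_name):
    """Create <repo_root>/<model_name>/1/ and write config.pbtxt beside it."""
    model_dir = Path(repo_root) / model_name
    # The SavedModel itself is copied to <model_dir>/1/model.savedmodel/ separately.
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(
        CONFIG_TEMPLATE.substitute(model=model_name, inp=input_name, out=output_name)
    )
    return config_path


def infer_once(batch, url="localhost:8000"):
    """Send one float32 NumPy batch to Triton over HTTP and return the output."""
    # Imported lazily so the config helper works without tritonclient installed.
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    # Shape includes the batch dimension, e.g. [2, 16] for two rows.
    infer_input = httpclient.InferInput(INPUT_NAME, list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested = httpclient.InferRequestedOutput(OUTPUT_NAME)
    result = client.infer(MODEL_NAME, inputs=[infer_input], outputs=[requested])
    return result.as_numpy(OUTPUT_NAME)
```

With a server running, a call such as infer_once(np.zeros((2, 16), dtype=np.float32)) would send a batch of two rows; the dtype passed to InferInput has to match both the array and the data_type declared in config.pbtxt.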
Known gotchas
The preferred_batch_size list in dynamic_batching is a hint, not a hard requirement: if max_queue_delay_microseconds elapses before a preferred size is reached, Triton dispatches whatever smaller batch has formed
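TensorFlow SavedModel signatures with variable-length sequence inputs require -1 for the variable dimension in the config (for example dims: [-1]); using a fixed dim will cause shape mismatch errors at runtime. When max_batch_size is greater than 0, dims omits the batch dimension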
Triton's model control mode defaults to 'none' (all models loaded at startup); in 'explicit' mode you must POST to /v2/repository/models/<name>/load before the model accepts requests
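For the explicit-mode gotcha, the load call can be sketched with the Python standard library alone. This is a minimal sketch that assumes Triton was started with --model-control-mode=explicit and that its HTTP endpoint is reachable at http://localhost:8000; the helper names build_load_request and load_model and the model name used below are hypothetical.

```python
import json
import urllib.request


def build_load_request(model_name, base_url="http://localhost:8000"):
    """Build (without sending) the POST that loads a model in explicit mode."""
    url = f"{base_url}/v2/repository/models/{model_name}/load"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),  # empty JSON body: load with the on-disk config
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(model_name, base_url="http://localhost:8000"):
    """Send the load request; returns True on HTTP 200, raises on HTTP errors."""
    with urllib.request.urlopen(build_load_request(model_name, base_url)) as resp:
        return resp.status == 200
```

Calling load_model("my_model") before the first inference request avoids the "model not ready" failures that explicit mode otherwise produces. The tritonclient library offers an equivalent load_model method on its client object.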