Deploy an LLM with TensorRT-LLM backend on NVIDIA Triton Inference Server

Steps

Build the Triton TensorRT-LLM backend container from the triton-inference-server/tensorrtllm_backend repository, or pull a prebuilt image from NVIDIA NGC
Convert your model to a TensorRT-LLM engine using the trtllm-build CLI or the high-level Python LLM API
Populate a Triton model repository by copying the inflight_batcher_llm directory, which contains the configuration files for the C++ backend, and point those files at your engine
Choose a deployment mode: leader mode (one Triton process per GPU, with rank 0 as the leader) or orchestrator mode (a single orchestrator process that spawns one worker per GPU)
Start Triton with tritonserver --model-repository=/path/to/model-repo, then verify readiness on the HTTP health endpoint (/v2/health/ready)
Send inference requests via Triton's HTTP or gRPC endpoint; the backend handles in-flight batching and paged KV caching automatically
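
The last two steps (checking readiness, then sending a request over HTTP) can be sketched as a small standard-library Python client. This is a minimal sketch under stated assumptions, not a reference client: it assumes Triton is listening on its default HTTP port 8000 on localhost, and that the model is named ensemble and accepts text_input and max_tokens, as in the backend's example model repository. Your model name and tensor names depend on your own config files. The helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: Triton's default HTTP port on the local machine.
BASE_URL = "http://localhost:8000"


def ready_url(base_url: str = BASE_URL) -> str:
    """Return the server readiness endpoint of Triton's HTTP API."""
    return f"{base_url.rstrip('/')}/v2/health/ready"


def is_ready(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: the server is
        # unreachable or not ready yet.
        return False


def wait_until_ready(base_url: str = BASE_URL, attempts: int = 30,
                     delay: float = 2.0) -> bool:
    """Poll the readiness endpoint; engines can take a while to load."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False


def build_generate_request(prompt: str, max_tokens: int = 64,
                           model: str = "ensemble",
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST for Triton's generate endpoint.

    Assumption: the model name and the text_input / max_tokens fields
    follow the example model repository; check your config.pbtxt files.
    """
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/models/{model}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_tokens: int = 64) -> dict:
    """Send one request and return the decoded JSON response."""
    with urllib.request.urlopen(build_generate_request(prompt, max_tokens)) as resp:
        return json.load(resp)
```

A typical use is to call wait_until_ready() once after starting the server and, if it returns True, call generate("What is in-flight batching?"). Batching and KV-cache management happen server-side, so the client sends plain independent requests.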

Known gotchas

Leader mode is simpler for single-model serving; orchestrator mode is required when serving multiple TRT-LLM models on the same server
TensorRT-LLM engine files are GPU-architecture-specific — an engine built for H100 will not run on A100
In-flight batching (continuous batching) is enabled by default in the backend; disabling it reverts to static batching and reduces throughput significantly
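
The architecture gotcha above can be turned into a preflight check before copying an engine to another machine. This is a minimal sketch, assuming you record which GPU model an engine was built on. The table holds only the two GPUs named above with their published compute capabilities, and the function name is hypothetical. Treating an exact compute-capability match as the requirement is a conservative assumption, not a rule taken from the TensorRT-LLM source.

```python
# Published compute capabilities for the two GPUs named above.
COMPUTE_CAPABILITY = {
    "A100": (8, 0),  # Ampere
    "H100": (9, 0),  # Hopper
}


def engine_matches_gpu(built_for: str, running_on: str) -> bool:
    """Return True only if both GPUs share a compute capability.

    Assumption: an exact match is required; extend the table for
    other GPU models you deploy on.
    """
    return COMPUTE_CAPABILITY[built_for] == COMPUTE_CAPABILITY[running_on]


if __name__ == "__main__":
    # An engine built on H100 should be rebuilt before serving on A100.
    print(engine_matches_gpu("H100", "A100"))
```

When the check fails, rebuild the engine on the target GPU type rather than copying the engine files across.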

docs.nvidia.com/deeplearning/triton-inference-server · 6 steps · unrated

Related routes

Configure a Triton Inference Server model repository

docs.nvidia.com · 6 steps · unrated

Configure NVIDIA Triton Inference Server explicit model control mode for load/unload via API

docs.nvidia.com/deeplearning/triton-inference-server · 5 steps · unrated
