Steps

Organize a model repository directory with the structure MODEL_REPO/MODEL_NAME/VERSION_NUMBER/model.EXTENSION where VERSION_NUMBER is an integer subdirectory.
Create a config.pbtxt file in the MODEL_NAME directory specifying at least the platform (e.g., 'onnxruntime_onnx', 'pytorch_libtorch', 'tensorrt_plan'), and the input and output tensor names, data types, and shapes.
Pull the Triton server container image (nvcr.io/nvidia/tritonserver) from the NVIDIA NGC registry, choosing the release tag whose bundled backends and CUDA version match your requirements.
Launch the container with the model repository mounted at /models: docker run --gpus all -v /local/model_repo:/models -p 8000:8000 -p 8001:8001 -p 8002:8002 nvcr.io/nvidia/tritonserver:TAG tritonserver --model-repository=/models (ports 8000, 8001, and 8002 expose HTTP, gRPC, and Prometheus metrics respectively).
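Verify the server is ready by calling GET http://localhost:8000/v2/health/ready (HTTP 200 means ready), then confirm each model is loaded with GET http://localhost:8000/v2/models/MODEL_NAME/ready; GET http://localhost:8000/v2/models/MODEL_NAME returns the loaded model's metadata.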
Send inference requests over HTTP (POST http://localhost:8000/v2/models/MODEL_NAME/infer) or gRPC, following the KServe v2 inference protocol; the tritonclient Python library wraps both transports for convenience.
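The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the official client: the helper names (create_model_repo, build_infer_request, infer), the model name, and the tensor names, types, and shapes in the template are all hypothetical and must be replaced with your model's real values. The model file itself (for example model.onnx) still has to be copied into the version directory.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical tensor names, types, and shapes for illustration only;
# substitute the values your model actually exposes.
CONFIG_TEMPLATE = """\
name: "{model_name}"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {{
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }}
]
output [
  {{
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }}
]
"""


def create_model_repo(repo_root, model_name, version=1):
    """Create MODEL_REPO/MODEL_NAME/VERSION_NUMBER/ and write config.pbtxt.

    Returns the version directory, into which the model file must be copied.
    """
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(
        CONFIG_TEMPLATE.format(model_name=model_name)
    )
    return version_dir


def build_infer_request(input_name, shape, datatype, data):
    """Build a KServe v2 JSON request body for a single input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,  # e.g. "FP32" (no TYPE_ prefix here)
                "data": data,  # flattened tensor values, row-major
            }
        ]
    }


def infer(base_url, model_name, body):
    """POST the request to a running server and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running as in the launch step, a call might look like infer("http://localhost:8000", "my_model", build_infer_request("input", [1, 3, 224, 224], "FP32", values)), where values is a flat list of 150528 floats. Note that config.pbtxt uses TYPE_FP32 while the request body uses FP32.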

Known gotchas

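Tensor shapes in config.pbtxt must match exactly what the model expects; a shape mismatch causes a model load failure. Batch handling is the usual culprit: when max_batch_size is greater than 0, Triton treats the first dimension as an implicit batch dimension, so dims must omit it; when max_batch_size is 0, dims must list the full shape.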
Triton decides which version directories to serve from the version_policy setting in config.pbtxt (latest, all, or specific versions); if it is not set, only the highest-numbered version directory is served.
GPU backends require compatible NVIDIA drivers on the host, plus the NVIDIA Container Toolkit for docker run --gpus to work; the container's CUDA version must be less than or equal to the CUDA version supported by the host driver.
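The batch-dimension and versioning gotchas can be illustrated with a small pure-Python sketch. It only mimics the documented rules for reasoning about a config; it does not call Triton, and the function names (full_shape, shape_matches, select_versions) are hypothetical.

```python
def full_shape(max_batch_size, dims, batch=1):
    """Return the full tensor shape a request must carry.

    With max_batch_size > 0 the batch dimension is implicit in config.pbtxt,
    so it is prepended to dims; with max_batch_size == 0 dims is already full.
    """
    if max_batch_size > 0:
        if not 1 <= batch <= max_batch_size:
            raise ValueError(f"batch {batch} outside 1..{max_batch_size}")
        return [batch, *dims]
    return list(dims)


def shape_matches(config_shape, actual_shape):
    """Compare shapes, treating -1 in the config as a variable-size dimension."""
    if len(config_shape) != len(actual_shape):
        return False
    return all(c == -1 or c == a for c, a in zip(config_shape, actual_shape))


def select_versions(available, policy="latest", num_versions=1, specific=()):
    """Pick which version directories would be served under a given policy.

    The defaults (latest, one version) mirror what happens when
    version_policy is not set in config.pbtxt.
    """
    ordered = sorted(available)
    if policy == "latest":
        return ordered[-num_versions:]
    if policy == "all":
        return ordered
    if policy == "specific":
        wanted = set(specific)
        return [v for v in ordered if v in wanted]
    raise ValueError(f"unknown policy: {policy}")
```

For example, a model configured with max_batch_size 8 and dims [3, 224, 224] expects requests shaped [batch, 3, 224, 224], so declaring dims as [1, 3, 224, 224] in that config would describe a five-dimensional input and fail to match the model.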

docs.nvidia.com · 6 steps · unrated

configure nvidia triton inference server explicit model control mode for load/unload via api

docs.nvidia.com/deeplearning/triton-inference-server · 5 steps · unrated

NVIDIA Triton Inference Server: configure a model repository backed by Amazon S3 instead of local disk

ml-ops · 6 steps · unrated

NVIDIA Triton Inference Server: set up a model repository and serve
