Install torchserve and torch-model-archiver packages.
Write a custom handler class inheriting from BaseHandler (or use a built-in handler name such as 'image_classifier') that implements preprocess, inference, and postprocess methods.
Create the model archive file using the torch-model-archiver CLI: torch-model-archiver --model-name NAME --version 1.0 --serialized-file model.pt --handler handler.py --export-path model_store/.
Start the TorchServe server pointing at the model store: torchserve --start --model-store model_store --models NAME=NAME.mar.
Send an inference request to the Management API first if needed, then to the Inference API at POST http://localhost:8080/predictions/NAME with the appropriate request body.
Stop the server with torchserve --stop and check logs in the logs/ directory for any errors.
Known gotchas
The handler's preprocess method receives raw bytes from the request; failing to decode or deserialize them correctly before passing to the model is a common source of errors.
Model archives are immutable once created; any change to the model file or handler requires regenerating the .mar file and reregistering the model.
TorchServe defaults to allowing only localhost connections; to expose the server on a network interface, configure the inference_address and management_address in config.properties.
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp