Define the Service in service.py as a plain Python class decorated with @bentoml.service; expose inference logic as methods decorated with @bentoml.api, declaring input and output types through Python type annotations
Save model artifacts to the BentoML model store during training: bentoml.sklearn.save_model('my-model', clf) — the saved model can then be loaded inside the Service class with bentoml.sklearn.load_model('my-model:latest')
Build the Bento: bentoml build — this packages source code, dependencies from requirements.txt or pyproject.toml, and model artifacts into a versioned Bento
Test locally: bentoml serve service:MyService — the service starts on port 3000 by default with auto-generated Swagger UI
Containerize for deployment: bentoml containerize my-service:latest — produces an OCI-compliant Docker image
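Push to BentoCloud and deploy: bentoml push my-service:latest followed by bentoml deploy my-service:latest (both need an authenticated BentoCloud session, for example via bentoml cloud login; run bentoml deploy --help for the options your version supports), or deploy the container image to any Kubernetes cluster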
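The command-line steps above can be chained in a small script, for instance in CI. The following is a minimal sketch using only the Python standard library; the function names release_commands and run_release are hypothetical, and my-service:latest is the illustrative tag from the steps above. It leaves out bentoml serve (which blocks) and bentoml deploy (whose options depend on the target), and it defaults to a dry run that only prints each command.

```python
import shlex
import subprocess
from typing import List

# Illustrative tag taken from the steps above; substitute your own Bento tag.
BENTO_TAG = "my-service:latest"


def release_commands(bento_tag: str = BENTO_TAG) -> List[List[str]]:
    """Return the build, containerize and push steps, in order, as argv lists."""
    return [
        ["bentoml", "build"],
        ["bentoml", "containerize", bento_tag],
        ["bentoml", "push", bento_tag],
    ]


def run_release(bento_tag: str = BENTO_TAG, dry_run: bool = True) -> List[str]:
    """Print each step; execute it only when dry_run is False.

    With dry_run=False, check=True stops the sequence at the first failing step.
    """
    rendered = []
    for argv in release_commands(bento_tag):
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return rendered
```

Calling run_release() prints the three commands without executing anything; run_release(dry_run=False) executes them from the project directory and raises subprocess.CalledProcessError if a step exits with a non-zero status.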
Known gotchas
Starting from BentoML 1.2, the Runner abstraction is no longer used — services are defined purely as Python classes without Runners; code using the older runner pattern requires migration
The @bentoml.api decorator infers serialization from Python type annotations (Pydantic models, numpy arrays, PIL images); untyped parameters or bare dicts give it nothing to build a schema from and can cause serialization errors at runtime
bentoml containerize requires a running Docker daemon and the buildx plugin; on a machine without Docker, bentoml build still succeeds but bentoml containerize fails
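The Docker gotcha can be caught before the containerize step runs. The sketch below is one possible pre-flight check, not part of BentoML: containerize_preflight is a hypothetical helper that looks for the docker executable, then runs docker info and docker buildx version and reports what is missing. It assumes that docker info exits with a non-zero status when the daemon is unreachable. The which and run parameters exist only so the check can be exercised without Docker installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Sequence


def _succeeds(run: Callable[..., "subprocess.CompletedProcess"], argv: Sequence[str]) -> bool:
    """Run argv quietly and report whether it exited with status 0."""
    try:
        result = run(list(argv), capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def containerize_preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> List[str]:
    """Return a list of problems; an empty list means containerize can be attempted."""
    if which("docker") is None:
        return ["docker executable not found on PATH"]
    problems: List[str] = []
    # Assumption: `docker info` must reach the daemon, so a non-zero exit
    # suggests the daemon is not running.
    if not _succeeds(run, ["docker", "info"]):
        problems.append("Docker daemon is not reachable")
    # `docker buildx version` fails when the buildx plugin is not installed.
    if not _succeeds(run, ["docker", "buildx", "version"]):
        problems.append("docker buildx plugin is not available")
    return problems
```

A build script might call containerize_preflight() after bentoml build and skip containerization, with a readable message, whenever the returned list is not empty.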