Install Modal: pip install modal and authenticate with modal setup
Define an App and an image with required dependencies: app = modal.App(); image = modal.Image.debian_slim().pip_install('vllm')
Decorate a class or function with @app.function(gpu='A100', image=image) to request a specific GPU — use 'H100:4' for 4x H100s
Load the model in a @modal.enter() method on a class-based deployment so weights are loaded once per container, not per request
Deploy with modal deploy your_file.py for persistent endpoints or modal run for one-off executions
Modal bills per millisecond of actual execution with no idle charges — containers scale to zero between requests automatically
Known gotchas
Cold starts from zero can take tens of seconds for large models — use GPU Memory Snapshotting (available since late 2025) to cache model state and reduce cold start time
The gpu parameter accepts type strings like 'A100', 'A100-80GB', 'H100', or count+type like 'H100:8' — check Modal docs for current availability by region
modal run executes once and exits; modal deploy creates a persistent webhook endpoint — use deploy for production inference APIs
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp