Steps

Install Modal: pip install modal and authenticate with modal setup
Define an App and an image with required dependencies: app = modal.App(); image = modal.Image.debian_slim().pip_install('vllm')
Decorate a function with @app.function(gpu='A100', image=image), or a class with @app.cls(gpu='A100', image=image), to request a specific GPU; use 'H100:4' for 4x H100s
Load the model in a @modal.enter() method on a class-based deployment so weights are loaded once per container, not per request
Deploy with modal deploy your_file.py for persistent endpoints or modal run for one-off executions
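Modal bills for the time containers actually run (the published price list is quoted per second) and scales containers to zero between requests automatically; a container kept warm during its idle window before scaling down still counts as running, so check the pricing page for current terms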

Known gotchas

Cold starts from zero can take tens of seconds for large models — use GPU Memory Snapshotting (available since late 2025) to cache model state and reduce cold start time
The gpu parameter accepts type strings like 'A100', 'A100-80GB', 'H100', or type:count strings like 'H100:8'; check Modal docs for current availability by region
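modal run executes once and exits; modal deploy creates a persistent deployment that stays available until stopped, and it exposes an HTTP endpoint only for functions declared as web endpoints; use deploy for production inference APIs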
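
To make the gpu string format concrete, the sketch below splits a value into a type and a count. The helper parse_gpu_spec and the GpuSpec tuple are hypothetical names invented for this illustration; they are not part of Modal, which does its own validation, and the sketch deliberately does not check whether a type name is one Modal actually offers.

```python
from typing import NamedTuple


class GpuSpec(NamedTuple):
    gpu_type: str
    count: int


def parse_gpu_spec(spec: str) -> GpuSpec:
    """Split a 'TYPE' or 'TYPE:COUNT' string into its parts (hypothetical helper)."""
    gpu_type, sep, count_text = spec.partition(":")
    if not gpu_type:
        raise ValueError(f"missing GPU type in {spec!r}")
    if not sep:
        # No ':COUNT' suffix is treated here as a single GPU.
        return GpuSpec(gpu_type, 1)
    count = int(count_text)  # raises ValueError for an empty or non-numeric count
    if count < 1:
        raise ValueError(f"GPU count must be positive, got {count}")
    return GpuSpec(gpu_type, count)


if __name__ == "__main__":
    for spec in ("A100", "A100-80GB", "H100:8"):
        print(spec, "->", parse_gpu_spec(spec))
```

A check like this could run in CI before deployment to catch a malformed string early, though the authoritative validation remains whatever Modal performs when the app is built.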

modal.com/docs · 6 steps · unrated

Run serverless GPU inference on Modal with auto-scaling to zero for an LLM
