Steps

Install Ray with Serve extras: pip install 'ray[serve]'
Define a deployment class decorated with @serve.deployment, implementing a __call__ method (or async def __call__) that accepts a Request and returns a response
Bind the deployment to create an application object: app = MyModel.bind(). Arguments passed to bind() are forwarded to the deployment's constructor, so pass model paths or other loading options here
Deploy programmatically: serve.run(app) or from the CLI: serve run service:app — the deployment is accessible at http://localhost:8000 by default
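Configure scaling in the @serve.deployment decorator: pass a fixed num_replicas, or enable autoscaling with num_replicas='auto' or an explicit autoscaling_config. max_ongoing_requests caps how many requests each replica processes concurrently, for example @serve.deployment(num_replicas='auto', max_ongoing_requests=100)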
For production on a Ray cluster, write a Serve config YAML and apply it with serve deploy config.yaml targeting the cluster address
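
As a rough illustration, the define, bind, and run steps above might look like the sketch below. The names DoubleModel, build_app, main, and factor are hypothetical, and the multiplication stands in for a real model. Ray is imported inside the functions only so the plain model class can be exercised without Ray installed; that is a choice made for this sketch, not a Serve requirement.

```python
class DoubleModel:
    """Hypothetical stand-in for a real model: multiplies inputs by a factor."""

    def __init__(self, factor: float):
        # A real deployment would load model weights here.
        self.factor = factor

    def predict(self, value: float) -> float:
        return value * self.factor


def build_app(factor: float = 2.0):
    """Wrap DoubleModel in a Serve deployment and bind it into an application."""
    # Lazy imports: an assumption of this sketch so DoubleModel is testable alone.
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(num_replicas=1, max_ongoing_requests=100)
    class MyModel:
        def __init__(self, factor: float):
            # Runs once in every replica.
            self.model = DoubleModel(factor)

        async def __call__(self, request: Request) -> dict:
            payload = await request.json()  # expects a body like {"value": 3}
            return {"result": self.model.predict(payload["value"])}

    # Arguments given to bind() are forwarded to MyModel.__init__.
    return MyModel.bind(factor)


def main():
    from ray import serve

    # Deploys on a local Ray instance; served at http://localhost:8000 by default.
    serve.run(build_app())
```

Calling main() from an interactive session deploys the application in process. To use serve run service:app instead, the file would need to be named service.py and expose a module-level app = build_app().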

Known gotchas

serve.run() starts a local Ray cluster if one is not already running. On a multi-node cluster, always connect to the existing cluster head with ray.init(address='auto') before calling serve.run()
Each replica runs in a separate Ray actor process — model weights loaded in __init__ are loaded once per replica, not once per cluster; size replicas accordingly for memory
The @serve.deployment decorator's autoscaling_config uses request-based autoscaling; CPU or GPU metric-based scaling requires a custom autoscaling policy and is not available in the default config
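
The memory and autoscaling gotchas above can be made concrete with a little arithmetic. The sketch below is a simplification under stated assumptions: the config keys follow the autoscaling_config names used in recent Ray releases (check your installed version), the values are illustrative, and desired_replicas is a hypothetical helper that only approximates a request-based target. It is not Ray's actual policy, which may also apply delays and smoothing.

```python
import math

# Illustrative values; key names assumed to match recent Ray Serve releases.
AUTOSCALING_CONFIG = {
    "min_replicas": 1,
    "max_replicas": 8,
    "target_ongoing_requests": 5,
}


def desired_replicas(total_ongoing_requests: int, config: dict) -> int:
    """Simplified request-based target, clamped to [min_replicas, max_replicas].

    Picks enough replicas that each handles roughly target_ongoing_requests.
    """
    raw = math.ceil(total_ongoing_requests / config["target_ongoing_requests"])
    return max(config["min_replicas"], min(config["max_replicas"], raw))


def total_weight_memory_gb(replicas: int, weights_gb: float) -> float:
    """Every replica loads its own copy of the weights in __init__."""
    return replicas * weights_gb


replicas = desired_replicas(23, AUTOSCALING_CONFIG)
print(replicas, total_weight_memory_gb(replicas, 4.0))
```

With 23 requests in flight and a target of 5 per replica, the sketch asks for 5 replicas, and a hypothetical 4 GB model would then occupy 20 GB of memory across the cluster. Note that the input is request count only; CPU or GPU utilization never enters the calculation.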


