Steps

Install ray[serve] and start a Ray cluster or connect to an existing one with ray.init().
Define a deployment class decorated with @serve.deployment, implementing a __call__ method (or an async __call__ for async handling) that contains your model inference logic.
Load your model inside __init__ so it is loaded once per replica rather than on every request.
Bind the deployment to create an application object: app = MyDeployment.bind(). To compose multiple deployments, pass one bound deployment as an argument to another deployment's .bind() call.
Deploy the application with serve.run(app) on a local cluster, or run serve deploy config.yaml to deploy to a production cluster from a Serve config file.
Test the endpoint by sending HTTP requests to the Serve HTTP proxy, which listens on http://localhost:8000 by default.

Known gotchas

Model objects loaded outside __init__ (e.g., at module level) are not properly replicated and can cause serialization errors when Ray spawns additional replicas.
The default number of replicas is 1; for production workloads, explicitly set either num_replicas (fixed replica count) or autoscaling_config (autoscaling).
An async deployment must define __call__ with async def; calling blocking (synchronous) code directly inside an async handler can block the replica's event loop under load.
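
The event-loop gotcha can be illustrated without Ray at all. The sketch below uses only the standard library's asyncio; blocking_inference, handler_blocking, and handler_offloaded are hypothetical names, and time.sleep stands in for a synchronous model call. It compares an async handler that calls blocking code directly with one that offloads the call to a worker thread.

```python
import asyncio
import time


def blocking_inference(x: int) -> int:
    """Hypothetical stand-in for a synchronous model call."""
    time.sleep(0.2)
    return x * 2


async def handler_blocking(x: int) -> int:
    # Anti-pattern: the synchronous call holds the event loop until it
    # returns, so concurrent requests are processed one after another.
    return blocking_inference(x)


async def handler_offloaded(x: int) -> int:
    # Run the synchronous call in the default thread pool so the event
    # loop stays free to accept other requests in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_inference, x)


async def timed(handler, n: int = 4):
    # Fire n concurrent "requests" and measure the total wall-clock time.
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(i) for i in range(n)))
    return list(results), time.perf_counter() - start


results_blocking, t_blocking = asyncio.run(timed(handler_blocking))
results_offloaded, t_offloaded = asyncio.run(timed(handler_offloaded))
print(f"blocking: {t_blocking:.2f}s, offloaded: {t_offloaded:.2f}s")
```

Both handlers return the same results, but the blocking variant serializes the four calls while the offloaded variant overlaps them. The same reasoning applies to an async def __call__ in a deployment, although how many requests a replica handles concurrently also depends on its Serve configuration.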

docs.ray.io/en/latest/serve · 5 steps · unrated

Ray Serve: configure autoscaling for a deployment (min_replicas, max_replicas, target_ongoing_requests)

ml-ops · 5 steps · unrated

Deploy scalable inference with Ray Serve

docs.ray.io · 6 steps · unrated

Ray Serve: create and deploy a model serving deployment
