Install Ray with Serve extras: pip install 'ray[serve]'
Define a deployment class decorated with @serve.deployment, implementing a __call__ method (or async def __call__) that accepts a Request and returns a response
Bind the deployment to create an application object: app = MyModel.bind() — pass constructor arguments here for model loading
Deploy programmatically: serve.run(app) or from the CLI: serve run service:app — the deployment is accessible at http://localhost:8000 by default
Configure scaling by passing num_replicas or autoscaling_config to the @serve.deployment decorator: @serve.deployment(num_replicas='auto', max_ongoing_requests=100)
For production on a Ray cluster, write a Serve config YAML and apply it with serve deploy config.yaml targeting the cluster address
Known gotchas
serve.run() starts a local Ray cluster if one is not already running; in a multi-node cluster always connect to the existing cluster head with ray.init(address='auto') before calling serve.run()
Each replica runs in a separate Ray actor process — model weights loaded in __init__ are loaded once per replica, not once per cluster; size replicas accordingly for memory
The @serve.deployment decorator's autoscaling_config uses request-based autoscaling; CPU or GPU metric-based scaling requires a custom autoscaling policy and is not available in the default config
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp