Steps

Install Ray Serve: pip install 'ray[serve]'
Define an LLMConfig object specifying model_id, engine_kwargs (vLLM-compatible), and accelerator_type
Use build_openai_app(llm_config) to create a Serve application that exposes OpenAI-compatible /v1/chat/completions and /v1/completions routes
Deploy with serve.run(app) locally or ray serve deploy for production cluster deployment
For multi-model serving, pass a list of LLMConfig objects to build_openai_app — an LLMModelRouter handles routing across models
Most engine_kwargs that work with vllm serve are forwarded directly by Ray Serve LLM to the underlying vLLM engine

Known gotchas

Ray Serve LLM uses vLLM as its inference engine — vLLM must be installed alongside Ray for GPU inference to work
The agent_engines module in the Vertex AI SDK is being refactored; similarly, Ray Serve LLM APIs are evolving rapidly — pin your Ray version and review release notes before upgrading
Prefix-aware routing (routing requests with shared prefixes to the same replica) is a separate feature that requires explicit configuration in newer Ray versions

docs.vllm.ai · 5 steps · unrated

Serve LLMs with vLLM's OpenAI-compatible server

docs.vllm.ai · 6 steps · unrated

vLLM: serve a model behind an OpenAI-compatible HTTP API using `vllm serve`

ml-ops · 6 steps · unrated

Give your agent this knowledge — and 15,500+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Need this verified for your stack — or a route we don't have yet?

We author + individually verify a route for your exact task within 24h. Custom route — $25 · Teams: Pilot — $750/mo · all plans

Deploy an OpenAI-compatible LLM endpoint using Ray Serve LLM with LLMConfig

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,500+ more routes

Need this verified for your stack — or a route we don't have yet?