No extra install is needed — vLLM bundles xgrammar as the default guided decoding backend in recent releases
Submit a chat completion request with extra_body={'guided_json': your_json_schema} to constrain output to a specific JSON schema
Alternatively use guided_regex for regex patterns, guided_choice for enumerated values, or guided_grammar for context-free grammars
Define the JSON schema as a plain dict or use pydantic_model.model_json_schema() to generate it from a Pydantic model
Set guided_decoding_backend in engine args if you need to override the default — options include xgrammar, outlines, and lm-format-enforcer
Parse the response content with json.loads() — the model output is guaranteed to conform to the schema
Known gotchas
xgrammar is the default and fastest backend in 2026; outlines had lower compliance on complex schemas in benchmarks due to compilation timeouts
guided_json constrains token sampling but does not validate semantic correctness — the output will be syntactically valid JSON but values may still be hallucinated
Very large or deeply nested schemas increase JIT compilation time on the first request — warm up the server before live traffic
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp