Open a WebSocket to wss://streaming.assemblyai.com/v3/ws (EU: wss://streaming.eu.assemblyai.com/v3/ws); pass your API key in the Authorization header.
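A minimal Python sketch of the connection details above. The region labels, the helper names, and the ASSEMBLYAI_API_KEY environment variable are hypothetical. Sending the raw key with no "Bearer" prefix is an assumption to check against current docs.

```python
import os
from typing import Dict, Optional

# Base URLs from the step above; "us" and "eu" are our own labels, not API values.
STREAMING_ENDPOINTS = {
    "us": "wss://streaming.assemblyai.com/v3/ws",
    "eu": "wss://streaming.eu.assemblyai.com/v3/ws",
}


def streaming_endpoint(region: str = "us") -> str:
    """Return the v3 streaming base URL for a region label."""
    if region not in STREAMING_ENDPOINTS:
        raise ValueError(f"unknown region {region!r}; expected 'us' or 'eu'")
    return STREAMING_ENDPOINTS[region]


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build handshake headers, falling back to a hypothetical ASSEMBLYAI_API_KEY env var."""
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise RuntimeError("no API key: pass one or set ASSEMBLYAI_API_KEY")
    # Assumption: the raw key goes in the header as-is, with no "Bearer " prefix.
    return {"Authorization": key}
```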
Add the speech_model query parameter to the WebSocket URL. It is required and has no default: use universal-streaming-english for English-only audio or universal-streaming-multilingual for multilingual detection.
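One way to build the URL and reject the wrong model names before connecting is sketched below. It assumes sample_rate is also passed as a query parameter. The function name and the 16000 default are illustrative, and the allowed set deliberately lists only the two streaming models named in this step.

```python
from urllib.parse import urlencode

# The two streaming model names listed in the step above.
STREAMING_MODELS = {"universal-streaming-english", "universal-streaming-multilingual"}


def build_streaming_url(speech_model: str, sample_rate: int = 16000,
                        base_url: str = "wss://streaming.assemblyai.com/v3/ws") -> str:
    """Append the required speech_model (plus sample_rate) to the base URL."""
    if speech_model not in STREAMING_MODELS:
        # Catches pre-recorded names such as "nano" or "best" before the server does.
        raise ValueError(f"unsupported streaming model: {speech_model!r}")
    query = urlencode({"speech_model": speech_model, "sample_rate": sample_rate})
    return f"{base_url}?{query}"
```

Passing the EU base URL as base_url gives the equivalent EU address.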
Once the connection is open (the server first sends a Begin message), stream audio as binary frames in the encoding and sample rate you declared in the URL via query parameters such as sample_rate and encoding. Check current docs for supported formats; 16-bit PCM at 16 kHz is commonly supported.
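Raw PCM can be cut into fixed-duration binary frames as sketched below. The 50 ms chunk length is an assumption (check the docs for the accepted range), and the helper name is hypothetical. At 16 kHz, 16-bit mono, 50 ms is 800 samples, or 1,600 bytes.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 50,
               sample_width: int = 2, channels: int = 1) -> Iterator[bytes]:
    """Yield consecutive frames of chunk_ms milliseconds of raw PCM audio."""
    # Size chunks in whole samples so a frame never splits a sample in half.
    samples_per_chunk = sample_rate * chunk_ms // 1000
    bytes_per_chunk = samples_per_chunk * sample_width * channels
    if bytes_per_chunk <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), bytes_per_chunk):
        # The final frame may be shorter than the others.
        yield pcm[start:start + bytes_per_chunk]
```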
Parse incoming JSON text frames by their type field. Turn messages carry the transcript and arrive in real time as partial results. A Turn with end_of_turn set to true marks a completed utterance, and if you requested formatting (format_turns=true), a following Turn with turn_is_formatted set to true carries the punctuated, formatted text.
To end the session, send the JSON text frame {"type":"Terminate"} and wait for the server's Termination message before closing the WebSocket.
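The receive loop and shutdown can be sketched as a coroutine that works with any connection object exposing async send() and recv(). It assumes the v3 message shapes: a type field of Begin, Turn, or Termination, and Turn fields transcript, end_of_turn, and turn_is_formatted. The function name and the formatted and pace_s parameters are hypothetical.

```python
import asyncio
import json
from typing import Iterable, List


async def run_session(conn, frames: Iterable[bytes], formatted: bool = False,
                      pace_s: float = 0.0) -> List[str]:
    """Stream audio frames over conn, then terminate; return completed-turn transcripts.

    conn is any object with async send() and recv() methods.
    """
    finals: List[str] = []

    async def sender() -> None:
        for frame in frames:
            await conn.send(frame)        # bytes: sent as a binary audio frame
            await asyncio.sleep(pace_s)   # pace live audio roughly at chunk duration
        # A str is sent as a text frame; this asks the server to end the session.
        await conn.send(json.dumps({"type": "Terminate"}))

    async def receiver() -> None:
        while True:
            msg = json.loads(await conn.recv())
            kind = msg.get("type")
            # Partial Turns (end_of_turn false) are ignored here; show them for live captions.
            if kind == "Turn" and msg.get("end_of_turn"):
                # Assumption: with format_turns=true a formatted copy follows the
                # unformatted end-of-turn message, so keep only the formatted one.
                if not formatted or msg.get("turn_is_formatted"):
                    finals.append(msg.get("transcript", ""))
            elif kind == "Termination":
                return  # server confirmed shutdown; the caller can close now

    await asyncio.gather(sender(), receiver())
    return finals
```

With the websockets package, something like `async with websockets.connect(url, additional_headers=headers) as ws:` followed by `await run_session(ws, frames, pace_s=0.05)` should fit. Older websockets releases name that argument extra_headers. Running sender and receiver concurrently lets transcripts arrive while audio is still being sent.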
Known gotchas
The model names universal-streaming-english and universal-streaming-multilingual are specific to the v3 streaming endpoint — do not use the pre-recorded API model names (nano, universal, best) here.
The speech_model parameter is mandatory; omitting it returns an error — there is no implicit fallback.
u3-rt-pro (Universal-3 Pro Streaming, launched March 2026) is a separate, higher-accuracy tier available on the same v3 endpoint; verify current pricing and availability before using it in production.