Connect to the AssemblyAI v3 WebSocket endpoint, wss://streaming.assemblyai.com/v3/ws, authenticating with your API key and setting the speech_model query parameter to a supported identifier: universal-streaming-english, universal-streaming-multilingual, or u3-rt-pro.
On connection, receive the Begin message containing the session_id and token expiry; log these for debugging.
Stream raw audio frames to the WebSocket; handle SpeechStarted messages that signal audio activity before a transcript is ready.
Receive Turn messages containing the transcript string, end_of_turn boolean, and utterance details for each completed speech segment.
Send a Terminate message (the JSON payload {"type": "Terminate"}; terminate_session is the v2 form) and wait for the Termination confirmation message before closing the WebSocket connection.
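The steps above can be sketched in Python without opening a real connection. This is a minimal illustration, not AssemblyAI's client: build_stream_url and handle_message are hypothetical helper names, and the sketch assumes each server message arrives as JSON text with its type name in a "type" field. The transcript and end_of_turn fields are the ones named in the steps. Authentication and audio sending are left out.

```python
import json
import logging
from urllib.parse import urlencode

log = logging.getLogger("assemblyai_stream")

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
SUPPORTED_MODELS = frozenset({
    "universal-streaming-english",
    "universal-streaming-multilingual",
    "u3-rt-pro",
})


def build_stream_url(speech_model: str) -> str:
    """Return the endpoint URL with a validated speech_model parameter."""
    if speech_model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported speech_model: {speech_model!r}")
    return f"{BASE_URL}?{urlencode({'speech_model': speech_model})}"


def handle_message(raw: str, completed_turns: list) -> bool:
    """Process one server message; return True once Termination arrives."""
    msg = json.loads(raw)
    kind = msg.get("type")  # assumption: the type name lives in a "type" field
    if kind == "Begin":
        # Log the whole payload so the session details are available for debugging.
        log.info("session began: %s", msg)
    elif kind == "SpeechStarted":
        # Audio activity only; a Turn message with the transcript follows.
        log.debug("speech detected, transcript pending")
    elif kind == "Turn":
        # Keep the transcript only when the turn is marked complete.
        if msg.get("end_of_turn"):
            completed_turns.append(msg.get("transcript", ""))
    elif kind == "Termination":
        return True
    else:
        log.warning("unhandled message type: %r", kind)
    return False
```

In a real client, handle_message would be called for every text frame read from the socket, and the receive loop would close the connection once it returns True.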
Known gotchas
AssemblyAI v3 message type names differ from v2: use Begin (not SessionBegins), Turn (not PartialTranscript or FinalTranscript), and Termination (not SessionTerminated). Code written for v2 event names will fail silently or raise unhandled-message errors.
Model identifier strings are exact: universal-streaming-english and universal-streaming-multilingual are separate models; universal-streaming-multilingual supports English, Spanish, French, German, Italian, and Portuguese but not all languages supported by batch models.
The SpeechStarted event is only emitted when the model detects speech and will produce a transcript; every SpeechStarted is guaranteed to be followed by one or more Turn messages, so do not treat it as a standalone final result.
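One way to keep the v2 naming gotcha from failing silently is to validate type names when handlers are registered. The sketch below is illustrative only: check_message_type is a hypothetical helper, and the v2-to-v3 mapping is taken from the names listed in the gotchas above.

```python
# v2 event names mapped to their v3 replacements, as listed in the gotchas.
V2_TO_V3 = {
    "SessionBegins": "Begin",
    "PartialTranscript": "Turn",
    "FinalTranscript": "Turn",
    "SessionTerminated": "Termination",
}
V3_TYPES = frozenset({"Begin", "SpeechStarted", "Turn", "Termination"})


def check_message_type(name: str) -> str:
    """Return name if it is a v3 message type; raise with a hint otherwise."""
    if name in V3_TYPES:
        return name
    if name in V2_TO_V3:
        raise ValueError(f"{name!r} is a v2 name; v3 uses {V2_TO_V3[name]!r}")
    raise ValueError(f"unknown message type: {name!r}")
```

Calling check_message_type on each key of a handler table at startup turns a leftover v2 name into an immediate error instead of a handler that never fires.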