Steps

Authenticate by including your API key in the 'xi-api-key' HTTP header on all requests — obtain your key from the ElevenLabs dashboard under Profile > API Key
List available voices with a GET request to 'https://api.elevenlabs.io/v1/voices'. The response contains a 'voices' array of voice objects, each with 'voice_id', 'name', 'labels', and 'preview_url'; pass the 'voice_id' value to select a voice in generation requests
Generate speech by POSTing to 'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}' with a JSON body containing 'text' (the content to synthesize), 'model_id' (e.g., 'eleven_flash_v2_5' for low-latency or 'eleven_multilingual_v2' for quality), and optionally 'voice_settings' with 'stability' and 'similarity_boost' floats between 0 and 1
Set the 'Accept' header to 'audio/mpeg' to receive MP3 output — the response body is the raw audio binary; write it directly to a file or pipe it to a player
For streaming audio in real time, POST to 'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream' — the response is a chunked stream of audio bytes you can forward directly to an audio player or WebSocket client without waiting for the full file
For the lowest latency in voice agent applications, use the WebSocket endpoint at 'wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input', which accepts text chunks incrementally and returns each audio chunk as soon as it is synthesized
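
The HTTP steps above can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not an official client: the helper names `voices_request`, `tts_request`, and `save_audio` are hypothetical, and only the endpoints, headers, and body fields come from the steps. The WebSocket endpoint is left out because it needs a third-party client library.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def voices_request(api_key: str) -> urllib.request.Request:
    """Build the GET request that lists available voices."""
    return urllib.request.Request(
        f"{API_BASE}/voices",
        headers={"xi-api-key": api_key},
        method="GET",
    )


def tts_request(api_key, voice_id, text, model_id="eleven_flash_v2_5",
                stability=None, similarity_boost=None, stream=False):
    """Build the POST request for one-shot or streamed speech generation."""
    body = {"text": text, "model_id": model_id}
    settings = {}
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if value is None:
            continue  # optional setting not supplied
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
        settings[name] = value
    if settings:
        body["voice_settings"] = settings

    # The streaming variant only differs by the '/stream' path suffix.
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    if stream:
        url += "/stream"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def save_audio(request, path, chunk_size=4096):
    """Send the request and write audio bytes to disk as they arrive."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
```

With a real key and a 'voice_id' taken from the voices listing, `save_audio(tts_request(key, voice_id, "Hello"), "hello.mp3")` would write the MP3 to disk. Passing `stream=True` targets the '/stream' endpoint, and the same read loop then handles chunks as they arrive, so each chunk could be forwarded to a player instead of a file.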

Known gotchas

Per-request character limits differ by model: Eleven Flash v2.5 and Turbo v2.5 accept up to 40,000 characters, Multilingual v2 up to 10,000, and Eleven v3 up to 3,000. Split longer texts before calling the API so that each request stays within the chosen model's limit
The 'voice_settings' parameters 'stability' and 'similarity_boost' interact, so tune them together rather than one at a time: high 'stability' combined with high 'similarity_boost' can produce flat, robotic delivery, while lower 'stability' introduces more natural variation but can drift from the cloned voice
ElevenLabs bills by the number of text characters submitted for synthesis, not by audio duration, so regenerating the same text during testing consumes quota each time. Use the audio preview in the dashboard to iterate on voice and settings before committing API calls
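
One way to stay within a model's character limit is to split text on sentence boundaries before sending it. The sketch below is a hedged illustration: `split_text` and `chunks_for_model` are hypothetical helpers, the limits are copied from the gotcha above, and the model IDs 'eleven_turbo_v2_5' and 'eleven_v3' are assumptions, since the text names only 'eleven_flash_v2_5' and 'eleven_multilingual_v2'.

```python
import re

# Per-request character limits as stated in the gotcha above.
# The 'eleven_turbo_v2_5' and 'eleven_v3' IDs are assumed, not from the text.
MODEL_CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_multilingual_v2": 10_000,
    "eleven_v3": 3_000,
}


def split_text(text: str, limit: int) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def chunks_for_model(text: str, model_id: str) -> list[str]:
    """Split `text` using the limit recorded for `model_id`."""
    return split_text(text, MODEL_CHAR_LIMITS[model_id])
```

Each chunk would then be sent as its own generation request. Because billing is per character, `sum(len(c) for c in chunks)` gives a rough estimate of the quota a run will consume. Note that this sketch rejoins sentences with single spaces, so the original whitespace between sentences is not preserved.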

Related routes

assemblyai.com · 5 steps · unrated

Dub a video or audio file into another language using the ElevenLabs Dubbing API

elevenlabs.io · 5 steps · unrated

Enhance and transcode audio in a single request using Dolby.io Media APIs

dolby.io · 5 steps · unrated

Give your agent this knowledge — and 15,500+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Integrate the ElevenLabs Text-to-Speech API to generate and stream audio
