Speech
Generate audio from text. Streaming supported. OpenAI-compatible path.
Create speech
OpenAI-compatible text-to-speech endpoint. Returns an audio stream (or buffer, depending on your client) in the requested format.
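As a rough illustration of consuming the response as a stream rather than a single buffer, the Python sketch below writes audio bytes to disk as they arrive. The URL and headers are copied from the curl example in this section; the helper names `build_request` and `save_speech` and the chunk size are assumptions made for illustration and are not part of the API.

```python
import json
import urllib.request

# URL and header names are taken from the curl example in this section.
SPEECH_URL = "https://api.wordcab.com/v1/audio/speech"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(payload: dict, out_path: str, api_key: str,
                chunk_size: int = 8192) -> int:
    """Send the request and write audio to out_path chunk by chunk.

    Returns the number of bytes written. urllib decodes chunked
    transfer encoding transparently, so the same loop works for
    streamed and buffered responses.
    """
    written = 0
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means the body is finished
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Example call (needs a valid key and network access):
# save_speech(
#     {"model": "qwen3-tts", "voice": "ember",
#      "input": "Hello from Wordcab.", "response_format": "mp3"},
#     "hello.mp3",
#     api_key="...",
# )
```

Reading in fixed-size chunks keeps memory use flat for long inputs; a client that prefers a buffer can call `resp.read()` once instead.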
Body
input: Text or SSML to synthesize. Max 4,096 characters per request; longer inputs are split at sentence boundaries and streamed back seamlessly.
model: Model id. qwen3-tts is the default; kokoro for CPU-only deployments.
voice: Voice id. Built-in voices: ember, slate, marin, onyx, brook, ash. Custom voices from Adapt are also valid.
response_format: One of mp3 (default), wav, pcm16, ulaw, opus.
Sample rate in Hz. Required with pcm16. One of 8000, 16000, 24000, or 48000.
Speed multiplier, 0.5 – 2.0. Default 1.0.
Pitch shift in semitones, -12 to +12. Default 0.
neutral, calm, or expressive.
If true, the response uses Transfer-Encoding: chunked and audio is sent as it is produced.
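The constraints listed above can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. The keys `model`, `voice`, `input`, and `response_format` come from the curl example in this section; the `sample_rate` key and the function name `build_speech_payload` are hypothetical, since this page does not show the sample-rate field's name.

```python
# Allowed values as listed in the Body section.
FORMATS = {"mp3", "wav", "pcm16", "ulaw", "opus"}
PCM_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def build_speech_payload(text, voice, model="qwen3-tts",
                         response_format="mp3", sample_rate=None):
    """Return a request body dict, raising ValueError on invalid options."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if response_format == "pcm16" and sample_rate is None:
        raise ValueError("a sample rate is required with pcm16")
    if sample_rate is not None and sample_rate not in PCM_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")

    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    if sample_rate is not None:
        # ASSUMPTION: "sample_rate" is a placeholder key; check the real
        # field name before relying on it.
        payload["sample_rate"] = sample_rate
    return payload
```

For example, `build_speech_payload("Hello from Wordcab.", "ember")` yields the same body as the curl example, while requesting `pcm16` without a sample rate fails locally instead of costing a round trip.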
Example
curl -X POST https://api.wordcab.com/v1/audio/speech \
  -H "Authorization: Bearer $WORDCAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-tts",
    "voice": "ember",
    "input": "Hello from Wordcab.",
    "response_format": "mp3"
  }' --output hello.mp3
SSML
Pass SSML instead of plain text for pauses, pronunciation, and emphasis. The input must start with <speak>.
<speak>
Your order <emphasis level="strong">has shipped</emphasis>.
Tracking number: <say-as interpret-as="characters">1Z999AA10123456784</say-as>.
</speak>
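When SSML is assembled from dynamic values such as order or tracking numbers, characters like & and < need escaping or the markup stops being well-formed. The Python sketch below shows one way to build the document above safely. All helper names (`text`, `emphasis`, `say_as_characters`, `speak`, `is_well_formed`) are hypothetical, and the sketch assumes the service expects standard XML escaping inside SSML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def text(value: str) -> str:
    """Escape plain text so it is safe inside SSML."""
    return escape(value)


def emphasis(value: str, level: str = "strong") -> str:
    """Wrap text in an <emphasis> element."""
    return f"<emphasis level={quoteattr(level)}>{escape(value)}</emphasis>"


def say_as_characters(value: str) -> str:
    """Ask for the value to be read character by character."""
    return f'<say-as interpret-as="characters">{escape(value)}</say-as>'


def speak(*fragments: str) -> str:
    """Join already-escaped fragments under the required <speak> root."""
    return "<speak>" + "".join(fragments) + "</speak>"


def is_well_formed(ssml: str) -> bool:
    """Local sanity check: the string parses as XML with a <speak> root."""
    try:
        return ET.fromstring(ssml).tag == "speak"
    except ET.ParseError:
        return False


ssml = speak(
    text("Your order "),
    emphasis("has shipped"),
    text(". Tracking number: "),
    say_as_characters("1Z999AA10123456784"),
    text("."),
)
```

The resulting `ssml` string can be sent as the `input` value in the request body. The well-formedness check only confirms that the markup parses; it does not confirm which SSML elements the service supports.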