Transcripts
Batch transcription jobs. For real-time streaming, see the /v1/audio/transcriptions WebSocket, which is covered in the transcription guide.
The transcript object
{
  "id": "transcript_abc123",
  "object": "transcript",
  "status": "completed",
  "created": 1712345678,
  "model": "qwen3-asr",
  "language": "en",
  "duration": 142.3,
  "text": "Thanks for calling Contoso...",
  "utterances": [
    {
      "speaker": 0,
      "start": 0.0,
      "end": 2.3,
      "text": "Thanks for calling Contoso, this is Alex.",
      "confidence": 0.94
    }
  ],
  "words": [
    {"word": "Thanks", "start": 0.0, "end": 0.28, "confidence": 0.98}
  ],
  "redactions": [
    {"entity": "phi", "start": 42.1, "end": 44.0}
  ],
  "metadata": {"external_id": "call-55123"}
}
Create a transcription job
Body
audio_url: Publicly reachable URL to the source audio. Exactly one of audio_url or audio_file is required.
audio_file: Multipart upload of the source audio.
Model id. Defaults to the deployment's configured STT model (commonly qwen3-asr).
ISO 639-1 code. Omit for auto-detect.
If true, return speaker-labeled utterances. Default false.
If true, return per-word timing. Default false.
Entity classes to redact. One or more of pii, phi, pci.
Custom domain terms that should be preferred during decoding.
Destination for the transcript.completed webhook on this specific job.
Free-form string:string map returned on the resulting object. Up to 20 keys.
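The constraints listed above (exactly one audio source, redaction classes limited to pii, phi, and pci, and at most 20 string-to-string metadata keys) can be checked on the client before a job is submitted. The following is a minimal sketch, not the API's own validation. The function name build_job_request is hypothetical, and so are the request field names "redact" and "metadata", because this reference does not show the request field names for those options.

```python
ALLOWED_REDACTIONS = {"pii", "phi", "pci"}
MAX_METADATA_KEYS = 20


def build_job_request(audio_url=None, audio_file=None, redact=None, metadata=None):
    """Check the documented body constraints and return a dict of request fields."""
    # Exactly one of audio_url or audio_file must be supplied.
    if (audio_url is None) == (audio_file is None):
        raise ValueError("provide exactly one of audio_url or audio_file")

    if audio_url is not None:
        body = {"audio_url": audio_url}
    else:
        body = {"audio_file": audio_file}

    if redact is not None:
        classes = list(redact)
        unknown = set(classes) - ALLOWED_REDACTIONS
        if not classes or unknown:
            raise ValueError("redact must contain one or more of pii, phi, pci")
        # "redact" is an assumed field name, used here for illustration only.
        body["redact"] = classes

    if metadata is not None:
        if len(metadata) > MAX_METADATA_KEYS:
            raise ValueError("metadata allows up to 20 keys")
        for key, value in metadata.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise ValueError("metadata must map strings to strings")
        # "metadata" as a request field name is also an assumption.
        body["metadata"] = dict(metadata)

    return body
```

Validating locally avoids a round trip for mistakes the server would reject anyway, but the server remains the authority on what is accepted.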
Response
Returns a transcript object in the queued state. Poll the retrieve endpoint, or subscribe to the transcript.completed webhook, to learn when the job finishes.
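Polling can be sketched as a small loop around whatever function performs the GET request. In the sketch below, wait_for_transcript and its fetch argument are hypothetical names: fetch is assumed to take a transcript id and return the transcript object as a dict. Only the queued and completed statuses appear in this reference, so any failure status the API defines would have to be added to stop_statuses by the caller.

```python
import time


def wait_for_transcript(fetch, transcript_id, interval=2.0, timeout=600.0,
                        stop_statuses=("completed",),
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch(transcript_id) repeatedly until its status is a stop status."""
    deadline = clock() + timeout
    while True:
        transcript = fetch(transcript_id)
        if transcript["status"] in stop_statuses:
            return transcript
        # Give up once the deadline has passed instead of polling forever.
        if clock() >= deadline:
            raise TimeoutError(f"{transcript_id} did not finish in {timeout}s")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays. For long recordings, the transcript.completed webhook avoids repeated requests altogether.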
Retrieve a transcript
Returns the full transcript object.
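Once retrieved, the object can be consumed as ordinary JSON. As an illustration, the sketch below renders the utterances array as speaker-labeled lines and picks out words below a confidence threshold. It assumes the transcript has already been parsed into a Python dict with the fields shown in the sample object; the function names and the 0.8 default threshold are hypothetical.

```python
def format_utterances(transcript):
    """Return one '[start-end] Speaker N: text' line per utterance."""
    lines = []
    # The array may be absent or empty when speaker labels were not requested.
    for utt in transcript.get("utterances") or []:
        lines.append(
            f"[{utt['start']:.1f}-{utt['end']:.1f}] "
            f"Speaker {utt['speaker']}: {utt['text']}"
        )
    return lines


def low_confidence_words(transcript, threshold=0.8):
    """Return the words whose confidence falls below the threshold."""
    return [w["word"] for w in transcript.get("words") or []
            if w["confidence"] < threshold]
```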
List transcripts
Query parameters: limit, cursor, status, created_after, created_before.
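Cursor-based listing is usually consumed as a loop that feeds each page's cursor into the next request. The sketch below shows that pattern under stated assumptions: this reference does not describe the list response, so the "data" and "next_cursor" keys are hypothetical, as is the fetch_page callable, which is assumed to take a dict of query parameters and return the parsed page.

```python
def iter_transcripts(fetch_page, limit=50, **filters):
    """Yield transcripts across pages, following the cursor until none is left."""
    cursor = None
    while True:
        # filters may carry status, created_after, or created_before.
        params = {"limit": limit, **filters}
        if cursor is not None:
            params["cursor"] = cursor
        page = fetch_page(params)
        # "data" and "next_cursor" are assumed response keys.
        yield from page["data"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because this is a generator, callers can stop early without fetching the remaining pages.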
Delete a transcript
Hard-deletes the transcript and any stored recording. Returns 204 on success. Deletions are permanent and cannot be undone.