Transcripts

This page covers batch transcription jobs. For real-time streaming, see the /v1/audio/transcriptions WebSocket, which is covered in the transcription guide.

The transcript object

json

{
  "id": "transcript_abc123",
  "object": "transcript",
  "status": "completed",
  "created": 1712345678,
  "model": "qwen3-asr",
  "language": "en",
  "duration": 142.3,
  "text": "Thanks for calling Contoso...",
  "utterances": [
    {
      "speaker": 0,
      "start": 0.0,
      "end": 2.3,
      "text": "Thanks for calling Contoso, this is Alex.",
      "confidence": 0.94
    }
  ],
  "words": [ {"word": "Thanks", "start": 0.0, "end": 0.28, "confidence": 0.98} ],
  "redactions": [{"entity": "phi", "start": 42.1, "end": 44.0}],
  "metadata": {"external_id": "call-55123"}
}
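
As an illustration of how the fields above fit together, the following Python sketch turns the utterances array into speaker-labeled lines and totals the redacted time. The helper names (format_utterances, redacted_seconds) are hypothetical, and the sample is a trimmed copy of the object shown above rather than a live response.

```python
import json

# Trimmed copy of the transcript object documented above.
SAMPLE = """
{
  "id": "transcript_abc123",
  "status": "completed",
  "utterances": [
    {"speaker": 0, "start": 0.0, "end": 2.3,
     "text": "Thanks for calling Contoso, this is Alex.", "confidence": 0.94}
  ],
  "redactions": [{"entity": "phi", "start": 42.1, "end": 44.0}]
}
"""


def format_utterances(transcript: dict) -> list[str]:
    """Render each utterance as '[start-end] Speaker N: text'."""
    lines = []
    for utt in transcript.get("utterances", []):
        lines.append(
            f"[{utt['start']:.1f}s-{utt['end']:.1f}s] "
            f"Speaker {utt['speaker']}: {utt['text']}"
        )
    return lines


def redacted_seconds(transcript: dict) -> float:
    """Sum the length of all redaction spans (assumes spans do not overlap)."""
    return sum(r["end"] - r["start"] for r in transcript.get("redactions", []))


transcript = json.loads(SAMPLE)
for line in format_utterances(transcript):
    print(line)
```

Both helpers use .get with an empty default, so they also work on objects that carry no utterances or redactions.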

Create a transcription job

POST /v1/transcripts

Body

audio_url string Optional

Publicly reachable URL to the source audio. Exactly one of audio_url or audio_file is required.

audio_file file Optional

Multipart upload of the source audio.

model string Optional

Model id. Defaults to the deployment's configured STT model (commonly qwen3-asr).

language string Optional

ISO 639-1 code. Omit for auto-detect.

diarize boolean Optional

If true, return speaker-labeled utterances. Default false.

word_timestamps boolean Optional

Return per-word timing. Default false.

redact string[] Optional

Entity classes to redact. One or more of pii, phi, pci.

vocabulary string[] Optional

Custom domain terms that should be preferred during decoding.

webhook_url string Optional

Destination for the transcript.completed webhook on this specific job.

metadata object Optional

Free-form map of string keys to string values, returned on the resulting transcript object. Up to 20 keys.
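
The constraints listed above (a required audio source, the three redaction classes, the 20-key metadata limit) can be checked on the client before a request is sent. The sketch below is one way to do that for the audio_url case. The function name build_create_body is hypothetical, and sending these fields as a JSON body is an assumption, since this page does not state the expected content type for URL-based jobs.

```python
import json

ALLOWED_REDACT = {"pii", "phi", "pci"}  # entity classes listed for redact
MAX_METADATA_KEYS = 20                  # documented metadata limit


def build_create_body(audio_url: str, *, model=None, language=None,
                      diarize=False, word_timestamps=False, redact=None,
                      vocabulary=None, webhook_url=None, metadata=None) -> dict:
    """Validate and assemble the fields for a URL-based transcription job."""
    if not audio_url:
        raise ValueError("audio_url is required when no audio_file is uploaded")

    redact = list(redact or [])
    unknown = set(redact) - ALLOWED_REDACT
    if unknown:
        raise ValueError(f"unknown redact classes: {sorted(unknown)}")

    metadata = dict(metadata or {})
    if len(metadata) > MAX_METADATA_KEYS:
        raise ValueError("metadata supports up to 20 keys")
    if not all(isinstance(k, str) and isinstance(v, str)
               for k, v in metadata.items()):
        raise ValueError("metadata keys and values must be strings")

    body = {"audio_url": audio_url,
            "diarize": diarize,
            "word_timestamps": word_timestamps}
    # Optional fields are sent only when set, so server-side defaults apply.
    optional = {"model": model, "language": language, "redact": redact,
                "vocabulary": list(vocabulary or []),
                "webhook_url": webhook_url, "metadata": metadata}
    body.update({k: v for k, v in optional.items() if v})
    return body


body = build_create_body(
    "https://example.com/call.wav",  # placeholder URL
    diarize=True,
    redact=["phi"],
    metadata={"external_id": "call-55123"},
)
print(json.dumps(body, indent=2))
```

Leaving language out of the body, rather than sending an empty value, keeps auto-detection in effect.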

Response

Returns a transcript object with status queued. Poll GET /v1/transcripts/{transcript_id} until the job finishes, or subscribe to the transcript.completed webhook.
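
A polling loop for this flow might look like the Python sketch below. It takes a fetch callable that performs the GET request and returns the parsed transcript object, so the HTTP client, base URL, and authentication stay out of the example. The name wait_for_transcript, the two-second interval, and the ten-minute timeout are illustrative choices, not values from this reference. Only the queued and completed statuses appear on this page; if the API defines a failure status, the loop should also stop on it.

```python
import time
from typing import Callable


def wait_for_transcript(fetch: Callable[[], dict], *,
                        interval: float = 2.0,
                        timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until status is 'completed' or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        transcript = fetch()
        if transcript.get("status") == "completed":
            return transcript
        if time.monotonic() >= deadline:
            raise TimeoutError("transcript did not complete before the timeout")
        sleep(interval)


# Usage with a stubbed fetch that completes on the third call.
responses = iter([
    {"id": "transcript_abc123", "status": "queued"},
    {"id": "transcript_abc123", "status": "queued"},
    {"id": "transcript_abc123", "status": "completed", "text": "Thanks..."},
])
result = wait_for_transcript(lambda: next(responses),
                             interval=0, sleep=lambda _seconds: None)
print(result["status"])
```

For long recordings, the transcript.completed webhook avoids repeated GET requests altogether.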

Retrieve a transcript

GET /v1/transcripts/{transcript_id}

Returns the full transcript object.

List transcripts

GET /v1/transcripts

Query parameters: limit, cursor, status, created_after, created_before.
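
As a rough illustration, the query parameters above can be assembled with the standard library as shown below. BASE_URL is a placeholder host, list_transcripts_url is a hypothetical helper, and passing created_after as a Unix timestamp is an assumption based on the created field of the transcript object. The shape of the list response (including how the next cursor is returned) is not documented on this page, so it is not modeled here.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not from this page


def list_transcripts_url(*, limit=None, cursor=None, status=None,
                         created_after=None, created_before=None) -> str:
    """Build the list URL, leaving out any parameter that is not set."""
    params = {
        "limit": limit,
        "cursor": cursor,
        "status": status,
        "created_after": created_after,
        "created_before": created_before,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/v1/transcripts" + (f"?{query}" if query else "")


print(list_transcripts_url(limit=20, status="completed"))
```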

Delete a transcript

DELETE /v1/transcripts/{transcript_id}

Hard-deletes the transcript and any stored recording. Returns 204 No Content on success. Deletion is permanent and cannot be undone.
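
A minimal sketch of the delete call using Python's urllib follows. BASE_URL is a placeholder host and the Bearer authorization header is an assumption, since authentication is not described on this page. The function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.com"  # placeholder host, not from this page


def build_delete_request(transcript_id: str, api_key: str) -> Request:
    """Build the DELETE request without sending it."""
    url = f"{BASE_URL}/v1/transcripts/{quote(transcript_id, safe='')}"
    return Request(
        url,
        method="DELETE",
        # Assumed auth scheme; replace with whatever the deployment requires.
        headers={"Authorization": f"Bearer {api_key}"},
    )


def delete_transcript(transcript_id: str, api_key: str) -> bool:
    """Send the request and report whether the server answered 204."""
    with urlopen(build_delete_request(transcript_id, api_key)) as resp:
        return resp.status == 204


request = build_delete_request("transcript_abc123", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Because the deletion also removes any stored recording, callers may want to confirm the transcript id before calling delete_transcript.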
