Fine-tuning (Adapt)

Benchmark WER isn't production WER. Adapt closes the gap — 10–30% relative WER reduction is typical from 10–100 hours of your real audio.

Adapt is the layer between a promising pilot and a production rollout. Generic ASR degrades 2.8–5.7× from benchmark to production conditions; fine-tuning on your own audio recovers much of that loss. The full workflow runs inside your approved infrastructure; no audio leaves the boundary.

The four stages

  1. Data — intake and cleanup. Import raw audio, normalize formats, diarize, align.
  2. Evaluation — pick baselines with Gym against your real workload.
  3. Fine-tuning — produce a tuned checkpoint.
  4. Validation — held-out eval, canary deployment, promote.

Prepare a dataset

```python
dataset = client.datasets.create(
    name="contact-center-2026q2",
    sources=[
        {"type": "s3", "uri": "s3://internal/calls/2026-q1/"},
    ],
    pipeline=[
        {"step": "diarize", "model": "pyannote-3.3"},
        {"step": "align",   "model": "qwen3-asr"},
        {"step": "redact",  "entities": ["pii", "phi"]},
    ],
)

# wait() returns the finished dataset; reassign so stats are populated
dataset = client.datasets.wait(dataset.id)
print(dataset.stats)   # hours, speakers, utterance count, OOV rate
```
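
Before moving on, it can pay to gate on the dataset summary. A minimal sketch, assuming `stats` is readable as a plain mapping with `hours` and `oov_rate` fields (the field names follow the comment above; the thresholds are illustrative):

```python
def ready_for_tuning(stats, min_hours=10.0, min_oov_rate=0.005):
    """Gate fine-tuning on dataset size and vocabulary mismatch.

    10-100 hours is the typical range; a near-zero OOV rate suggests the
    base model's vocabulary already covers this domain, so tuning may not
    be worth a run.
    """
    return stats["hours"] >= min_hours and stats["oov_rate"] >= min_oov_rate

ready_for_tuning({"hours": 42.5, "oov_rate": 0.03})   # True: enough audio, real OOV gap
```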

Run an evaluation

Before you fine-tune, baseline candidate models on your data. This decides whether tuning is even needed.

```python
eval_run = client.evaluations.create(
    dataset_id=dataset.id,
    candidates=["qwen3-asr", "voxtral-realtime", "cohere-transcribe-2b"],
    metrics=["wer", "diarization_error_rate", "realtime_factor"],
)

print(client.evaluations.wait(eval_run.id).leaderboard)
```
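
One way to act on the leaderboard (a sketch; the row shape shown here is an assumption, not a documented schema) is to fine-tune only when the best baseline still misses your accuracy bar:

```python
def needs_tuning(leaderboard, wer_target=0.12):
    """Return the best baseline and whether it misses the WER target.

    `leaderboard` is assumed to be a list of {"model", "wer"} rows; we sort
    explicitly rather than trusting the order.
    """
    best = min(leaderboard, key=lambda row: row["wer"])
    return best["model"], best["wer"] > wer_target

board = [
    {"model": "qwen3-asr", "wer": 0.18},
    {"model": "voxtral-realtime", "wer": 0.21},
]
needs_tuning(board)   # ("qwen3-asr", True): 18% WER misses a 12% bar, so tune
```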

Fine-tune

```python
job = client.fine_tunes.create(
    base_model="qwen3-asr",
    dataset_id=dataset.id,
    hyperparameters={
        "learning_rate": 1e-5,
        "epochs": 3,
        "batch_size": 16,
    },
)

tuned = client.fine_tunes.wait(job.id)
print(tuned.model_id)          # use this as "model": "ft:qwen3-asr:abc123"
```
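
The returned id follows the `ft:<base>:<checkpoint>` pattern in the comment above. A small helper (purely illustrative, not part of the SDK) can split it back out when you need the base model name for logging or routing:

```python
def parse_ft_id(model_id):
    """Split a fine-tuned model id like "ft:qwen3-asr:abc123"
    into its base model and checkpoint suffix."""
    prefix, base, checkpoint = model_id.split(":", 2)
    if prefix != "ft":
        raise ValueError(f"not a fine-tuned model id: {model_id!r}")
    return base, checkpoint

parse_ft_id("ft:qwen3-asr:abc123")   # ("qwen3-asr", "abc123")
```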

Fine-tuning jobs run on hardware inside your deployment: a reserved GPU pool under your account on cloud tiers, your own machines on self-hosted.

Validate and promote

Run the tuned model against a held-out split and promote only when metrics clear your bar.

```python
validation = client.evaluations.create(
    dataset_id=dataset.id,
    split="holdout",
    candidates=[tuned.model_id, "qwen3-asr"],
    metrics=["wer", "diarization_error_rate"],
)

result = client.evaluations.wait(validation.id)
if result.winner == tuned.model_id:
    client.deployments.update(
        deployment_id="prod-voice",
        routes={"stt": tuned.model_id},
    )
```
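
Stage 4 calls for a canary before full promotion. One illustrative pattern, independent of any Adapt API, is a deterministic per-session traffic split, so a given caller always hits the same checkpoint while only a small share of traffic sees the tuned model:

```python
import zlib

def canary_route(session_id, tuned_model, baseline_model, canary_pct=5):
    """Send ~canary_pct% of sessions to the tuned model, deterministically:
    the same session_id always maps to the same bucket."""
    bucket = zlib.crc32(session_id.encode()) % 100
    return tuned_model if bucket < canary_pct else baseline_model
```

Once the canary's live WER holds up, flip the `routes` update above to send all traffic to the tuned checkpoint.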

Custom TTS voices

The same workflow applies to text-to-speech. Submit a voice-cloning dataset with a signed consent record, and Adapt produces a voice id you can use in any agent or speech call.
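
A sketch of what such a submission might look like, mirroring the dataset API above; every field name here is an assumption rather than documented surface, but the consent check reflects the hard requirement:

```python
voice_request = {
    "name": "support-agent-voice",
    "sources": [{"type": "s3", "uri": "s3://internal/voice/agent-a/"}],
    "consent": {"document": "s3://internal/consent/agent-a.pdf", "signed": True},
}

def validate_voice_request(req):
    """Refuse to submit a cloning request without signed consent attached."""
    if not req.get("consent", {}).get("signed"):
        raise ValueError("voice cloning requires a signed consent record")
    return req
```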

Data provenance

Everything you feed Adapt stays inside your boundary, and so do the obligations that come with it. Before you point Adapt at a bucket, confirm you hold the recording consent required in every jurisdiction the audio came from.