
Fine-tuning (Adapt)

Benchmark WER isn't production WER. Adapt closes the gap: a 10–30% relative WER reduction is typical after fine-tuning on 10–100 hours of your real audio.

Adapt is the layer between a promising pilot and a production rollout. Generic ASR degrades 2.8–5.7× from benchmark to production, and targeted fine-tuning on real audio recovers part of that loss. The full workflow runs inside your approved infrastructure; no audio leaves the boundary.
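
The reduction quoted above is relative, not absolute. As a minimal sketch of the arithmetic (the function names are illustrative and not part of the Adapt SDK), WER is the word-level edit distance divided by the reference length, and relative reduction compares the tuned WER to the baseline WER:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline error removed by tuning (0.25 means 25%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - tuned_wer) / baseline_wer
```

With illustrative numbers, moving from 20% WER to 15% WER is a 25% relative reduction, not a 5% one.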

The four stages

  1. Data: intake and cleanup. Import raw audio, normalize formats, diarize, align.
  2. Evaluation: pick baselines with Gym against your real workload.
  3. Fine-tuning: produce a tuned checkpoint.
  4. Validation: held-out eval, canary deployment, promotion.
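
Stage 4 mentions a canary deployment. How Adapt splits traffic is not described on this page, so the following is only a sketch of the general technique: hash a stable identifier into a bucket so that each call consistently goes to the same model. The function name, the `call_id` parameter, and the percentage knob are all hypothetical.

```python
import hashlib


def route_model(call_id: str, stable_model: str, canary_model: str,
                canary_percent: int) -> str:
    """Deterministically send roughly canary_percent% of calls to the canary model."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the identifier so bucketing is stable across processes and restarts.
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return canary_model if bucket < canary_percent else stable_model
```

Hashing rather than sampling at random keeps a given call on one model for its whole lifetime, which makes canary metrics easier to attribute.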

Prepare a dataset

python
dataset = client.datasets.create(
    name="contact-center-2026q2",
    sources=[
        {"type": "s3", "uri": "s3://internal/calls/2026-q1/"},
    ],
    pipeline=[
        {"step": "diarize",    "model": "pyannote-3.3"},
        {"step": "align",      "model": "qwen3-asr"},
        {"step": "redact",     "entities": ["pii", "phi"]},
    ],
)

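# wait() is assumed to return the finished dataset, as the other wait() calls on this page do;
# reassign so that stats reflect the completed pipeline rather than the initial object.
dataset = client.datasets.wait(dataset.id)
print(dataset.stats)   # hours, speakers, utterance count, OOV rate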
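
The OOV (out-of-vocabulary) rate in the stats is a quick signal of how much domain vocabulary a generic model is likely to miss. How Adapt computes it is not specified here; one common definition, sketched below with a hypothetical helper, is the share of transcript tokens that do not appear in a reference vocabulary:

```python
from collections.abc import Iterable


def oov_rate(transcripts: Iterable[str], vocabulary: set[str]) -> float:
    """Fraction of whitespace-separated, lowercased tokens missing from vocabulary."""
    total = 0
    unknown = 0
    for line in transcripts:
        for token in line.lower().split():
            total += 1
            if token not in vocabulary:
                unknown += 1
    # An empty corpus has no unknown tokens by convention.
    return unknown / total if total else 0.0
```

A high rate on your own transcripts usually points to product names, jargon, or proper nouns that are worth covering in the fine-tuning data.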

Run an evaluation

Before you fine-tune, baseline the candidate models on your own data. The results tell you whether tuning is needed at all.

python
eval_run = client.evaluations.create(
    dataset_id=dataset.id,
    candidates=["qwen3-asr", "voxtral-realtime", "cohere-transcribe-2b"],
    metrics=["wer", "diarization_error_rate", "realtime_factor"],
)

print(client.evaluations.wait(eval_run.id).leaderboard)
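
One way to act on the leaderboard is sketched below. The row shape (a list of dicts with `model`, `wer`, and `realtime_factor` keys), the helper names, and the thresholds are assumptions for illustration; check the leaderboard object your SDK version actually returns.

```python
from typing import Optional

# Assumed row shape: {"model": str, "wer": float, "realtime_factor": float}
Row = dict


def pick_baseline(leaderboard: list[Row],
                  max_realtime_factor: float = 1.0) -> Optional[Row]:
    """Return the lowest-WER row that meets the speed budget, or None if none do."""
    eligible = [r for r in leaderboard if r["realtime_factor"] <= max_realtime_factor]
    return min(eligible, key=lambda r: r["wer"]) if eligible else None


def needs_tuning(best: Row, target_wer: float) -> bool:
    """Tuning is worth a look when the best eligible model misses the WER target."""
    return best["wer"] > target_wer
```

If the best eligible model already meets your target, you can skip the fine-tune stage and go straight to validation.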

Fine-tune

python
job = client.fine_tunes.create(
    base_model="qwen3-asr",
    dataset_id=dataset.id,
    hyperparameters={
        "learning_rate": 1e-5,
        "epochs": 3,
        "batch_size": 16,
    },
)

tuned = client.fine_tunes.wait(job.id)
print(tuned.model_id)          # e.g. "ft:qwen3-asr:abc123"; pass it wherever a "model" value is expected

Fine-tune runs execute on hardware inside your deployment. On cloud tiers, this is a reserved GPU pool under your account. On self-hosted deployments, it is your own hardware.
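
To get a rough sense of job size from the hyperparameters above, you can estimate the number of optimizer steps. This sketch assumes `batch_size` counts utterances and that there is no gradient accumulation, neither of which this page confirms; the helper name is hypothetical.

```python
import math


def optimizer_steps(num_utterances: int, batch_size: int, epochs: int) -> int:
    """Steps per epoch (the last partial batch counts as a step) times epochs."""
    if num_utterances < 0 or batch_size <= 0 or epochs < 0:
        raise ValueError("expected num_utterances >= 0, batch_size > 0, epochs >= 0")
    return math.ceil(num_utterances / batch_size) * epochs
```

For example, 50,000 utterances at `batch_size=16` and `epochs=3` comes to 3,125 steps per epoch and 9,375 steps overall.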

Validate and promote

Run the tuned model against a held-out split and promote only when metrics clear your bar.

python
validation = client.evaluations.create(
    dataset_id=dataset.id,
    split="holdout",
    candidates=[tuned.model_id, "qwen3-asr"],
    metrics=["wer", "diarization_error_rate"],
)

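# Wait on the run the same way as the baseline evaluation, then promote only if the tuned model wins.
result = client.evaluations.wait(validation.id)
if result.winner == tuned.model_id: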
    client.deployments.update(
        deployment_id="prod-voice",
        routes={"stt": tuned.model_id},
    )
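
Comparing on the winner alone can promote a model that is only marginally better. A stricter bar might look like the sketch below. The metric dict shape, the function name, and the default thresholds are assumptions, not Adapt defaults.

```python
def should_promote(baseline: dict, tuned: dict,
                   min_relative_wer_gain: float = 0.10,
                   max_der_increase: float = 0.01) -> bool:
    """Promote only if WER improves enough and diarization does not regress too far.

    Both dicts are assumed to hold "wer" and "diarization_error_rate" as fractions.
    """
    if baseline["wer"] <= 0:
        return False  # nothing left to improve on
    wer_gain = (baseline["wer"] - tuned["wer"]) / baseline["wer"]
    der_increase = tuned["diarization_error_rate"] - baseline["diarization_error_rate"]
    return wer_gain >= min_relative_wer_gain and der_increase <= max_der_increase
```

Checking a second metric alongside WER guards against a tuned model that transcribes better but attributes speech to the wrong speaker more often.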

Custom TTS voices

The same workflow applies to text-to-speech. Submit a voice-cloning dataset with signed consent from the speaker, and Adapt produces a voice ID that can be used in any agent or speech call.

Data provenance

Everything you feed Adapt stays inside your boundary, and so do the obligations that come with it. Confirm you have the recording consent needed for the jurisdictions the audio came from before you point Adapt at a bucket.