Fine-tuning (Adapt)
Benchmark WER isn't production WER. Adapt closes the gap.
Adapt is the layer between a promising pilot and a production rollout. Generic ASR degrades 2.8–5.7× from benchmark to production; a 10–30% relative WER reduction is typical from 10–100 hours of targeted fine-tuning on your real audio. The full workflow lives inside your approved infrastructure; no audio leaves the boundary.
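The headline numbers are relative, which is easy to misread. A small helper makes the arithmetic explicit (the WER figures below are illustrative, not measurements):

```python
def relative_wer_reduction(baseline_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline's errors eliminated by the tuned model."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - tuned_wer) / baseline_wer

# A model that benchmarks at 4% WER but degrades 2.8x in production runs at
# ~11.2% WER. Fine-tuning it down to 8.4% is a 25% relative reduction:
print(f"{relative_wer_reduction(0.112, 0.084):.0%}")
```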
The four stages
- Data — intake and cleanup. Import raw audio, normalize formats, diarize, align.
- Evaluation — pick baselines with Gym against your real workload.
- Fine-tuning — produce a tuned checkpoint.
- Validation — held-out eval, canary deployment, promote.
Prepare a dataset
```python
dataset = client.datasets.create(
    name="contact-center-2026q2",
    sources=[
        {"type": "s3", "uri": "s3://internal/calls/2026-q1/"},
    ],
    pipeline=[
        {"step": "diarize", "model": "pyannote-3.3"},
        {"step": "align", "model": "qwen3-asr"},
        {"step": "redact", "entities": ["pii", "phi"]},
    ],
)
client.datasets.wait(dataset.id)
print(dataset.stats)  # hours, speakers, utterance count, OOV rate
```

Run an evaluation
Before you fine-tune, baseline candidate models on your data. This decides whether tuning is even needed.
```python
eval_run = client.evaluations.create(
    dataset_id=dataset.id,
    candidates=["qwen3-asr", "voxtral-realtime", "cohere-transcribe-2b"],
    metrics=["wer", "diarization_error_rate", "realtime_factor"],
)
print(client.evaluations.wait(eval_run.id).leaderboard)
```

Fine-tune
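Whether to fine-tune at all can be gated mechanically on the baseline numbers. A sketch of one such gate (the leaderboard rows and target are illustrative, not the SDK's actual return shape):

```python
def needs_finetuning(leaderboard: list[dict], target_wer: float) -> bool:
    """Tune only if no off-the-shelf candidate already clears the target WER."""
    best = min(row["wer"] for row in leaderboard)
    return best > target_wer

leaderboard = [
    {"model": "qwen3-asr", "wer": 0.093},
    {"model": "voxtral-realtime", "wer": 0.101},
]
# Best baseline is 9.3% WER; a 8% target means tuning is warranted.
print(needs_finetuning(leaderboard, target_wer=0.08))
```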
```python
job = client.fine_tunes.create(
    base_model="qwen3-asr",
    dataset_id=dataset.id,
    hyperparameters={
        "learning_rate": 1e-5,
        "epochs": 3,
        "batch_size": 16,
    },
)
tuned = client.fine_tunes.wait(job.id)
print(tuned.model_id)  # use this as "model": "ft:qwen3-asr:abc123"
```

Fine-tune runs execute on hardware inside your deployment. On cloud tiers, this is a reserved GPU pool under your account. On self-hosted, it is your own hardware.
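Downstream routing code can sanity-check a tuned model ID before wiring it into a deployment. A sketch that assumes the `ft:<base>:<suffix>` shape shown in the example above generalizes (the helper is illustrative, not part of the SDK):

```python
def parse_ft_model_id(model_id: str) -> tuple[str, str]:
    """Split a tuned model ID of the assumed form ft:<base_model>:<suffix>."""
    prefix, base, suffix = model_id.split(":", 2)
    if prefix != "ft" or not base or not suffix:
        raise ValueError(f"not a fine-tuned model ID: {model_id!r}")
    return base, suffix

print(parse_ft_model_id("ft:qwen3-asr:abc123"))  # ('qwen3-asr', 'abc123')
```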
Validate and promote
Run the tuned model against a held-out split and promote only when metrics clear your bar.
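"Clears your bar" is worth pinning down as code before automating promotion. A minimal sketch of one such bar (the margin, metric names, and numbers are illustrative): require a meaningful WER gain with no diarization regression.

```python
def clears_bar(baseline: dict, tuned: dict, min_rel_wer_gain: float = 0.10) -> bool:
    """Promote only if WER improves by the margin and DER does not regress."""
    wer_gain = (baseline["wer"] - tuned["wer"]) / baseline["wer"]
    return wer_gain >= min_rel_wer_gain and tuned["der"] <= baseline["der"]

baseline = {"wer": 0.093, "der": 0.12}
tuned = {"wer": 0.078, "der": 0.11}
print(clears_bar(baseline, tuned))  # ~16% relative WER gain, DER improved
```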
```python
validation = client.evaluations.create(
    dataset_id=dataset.id,
    split="holdout",
    candidates=[tuned.model_id, "qwen3-asr"],
    metrics=["wer", "diarization_error_rate"],
)
result = client.evaluations.wait(validation.id)
if result.winner == tuned.model_id:
    client.deployments.update(
        deployment_id="prod-voice",
        routes={"stt": tuned.model_id},
    )
```

Custom TTS voices
The same workflow applies to text-to-speech. Submit a voice-cloning dataset with signed consent, and Adapt produces a voice ID you can use in any agent or speech call.
Everything you feed Adapt stays inside your boundary — and so do the obligations that come with it. Confirm you have the recording consent needed for the jurisdictions the audio came from before you point Adapt at a bucket.
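That consent check can be a pre-flight filter over your source manifest before the dataset is ever created. A sketch, assuming you track consent and jurisdiction as metadata fields (the field names and manifest shape are illustrative):

```python
def consent_cleared(manifest: list[dict], allowed_jurisdictions: set[str]) -> list[dict]:
    """Keep only recordings with documented consent from an approved jurisdiction."""
    return [
        item for item in manifest
        if item.get("consent_on_file") and item.get("jurisdiction") in allowed_jurisdictions
    ]

manifest = [
    {"uri": "s3://internal/calls/a.wav", "consent_on_file": True, "jurisdiction": "US-TX"},
    {"uri": "s3://internal/calls/b.wav", "consent_on_file": False, "jurisdiction": "US-CA"},
]
cleared = consent_cleared(manifest, allowed_jurisdictions={"US-TX", "US-CA"})
print(len(cleared))  # only the recording with consent on file survives
```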