Fine-tuning (Adapt)

Benchmark WER isn't production WER. Adapt closes the gap — 10–30% relative WER reduction is typical from 10–100 hours of your real audio.

Package availability

Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.

Adapt is the layer between a promising pilot and a production rollout. Generic ASR typically degrades 2.8–5.7× from benchmark to production audio, and Adapt recovers part of that loss through targeted fine-tuning on your real audio. The full workflow runs inside your approved infrastructure, and no audio leaves the boundary.
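
"Relative" WER reduction is easy to confuse with absolute points, so a worked definition may help. The sketch below is a plain-Python illustration and not part of the Wordcab SDK; the function names `word_error_rate` and `relative_wer_reduction` are hypothetical. It uses the standard definition of WER as word-level edit distance divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline error removed by tuning (0.25 means 25% relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - tuned_wer) / baseline_wer
```

For example, moving from a 20% baseline WER to a 15% tuned WER is a gain of 5 absolute points, which is a 25% relative reduction.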

The four stages

Data — intake and cleanup. Import raw audio, normalize formats, diarize, align.
Evaluation — pick baselines with Gym against your real workload.
Fine-tuning — produce a tuned checkpoint.
Validation — held-out eval, canary deployment, promote.

Prepare a dataset

python

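# Assumes `client` is an authenticated Wordcab SDK client, created as described
# in the setup instructions delivered with your private package.
dataset = client.datasets.create(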
    name="contact-center-2026q2",
    sources=[
        {"type": "s3", "uri": "s3://internal/calls/2026-q1/"},
    ],
    pipeline=[
        {"step": "diarize",    "model": "pyannote-3.3"},
        {"step": "align",      "model": "qwen3-asr"},
        {"step": "redact",     "entities": ["pii", "phi"]},
    ],
)

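# Re-bind the result: wait() is assumed to return the processed dataset,
# as the other wait() calls on this page do.
dataset = client.datasets.wait(dataset.id)
print(dataset.stats)   # hours, speakers, utterance count, OOV rate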
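
The stats include an OOV (out-of-vocabulary) rate. This page does not say how Adapt computes it, so the following is only a sketch of one common definition: the fraction of transcript tokens absent from a reference vocabulary. The function name `oov_rate`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
from typing import Iterable, Set


def oov_rate(transcripts: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of whitespace-separated tokens that are not in `vocabulary`."""
    total = 0
    unknown = 0
    for line in transcripts:
        for token in line.lower().split():
            total += 1
            if token not in vocabulary:
                unknown += 1
    # An empty corpus has no tokens to miss, so report 0.0 instead of dividing by zero.
    return unknown / total if total else 0.0
```

A high OOV rate on domain terms (drug names, product SKUs, internal jargon) is one signal that a generic model may struggle on your audio.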

Run an evaluation

Before you fine-tune, baseline candidate models on your data. This decides whether tuning is even needed.

python

eval_run = client.evaluations.create(
    dataset_id=dataset.id,
    candidates=["qwen3-asr", "voxtral-realtime", "cohere-transcribe-2b"],
    metrics=["wer", "diarization_error_rate", "realtime_factor"],
)

print(client.evaluations.wait(eval_run.id).leaderboard)
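
To turn a leaderboard into a go/no-go decision on tuning, one option is to compare the best baseline against a target WER. The sketch below assumes the leaderboard has already been converted to a list of plain dicts with `model` and `wer` keys; that shape, the function name `pick_baseline`, and the target value are hypothetical and not taken from the SDK.

```python
from typing import Dict, List, Tuple


def pick_baseline(rows: List[Dict], target_wer: float) -> Tuple[str, bool]:
    """Return the lowest-WER model and whether it still misses the target."""
    if not rows:
        raise ValueError("leaderboard is empty")
    best = min(rows, key=lambda row: row["wer"])
    needs_tuning = best["wer"] > target_wer
    return best["model"], needs_tuning


# Illustrative numbers only, not measured results.
rows = [
    {"model": "candidate-a", "wer": 0.18},
    {"model": "candidate-b", "wer": 0.12},
]
best_model, needs_tuning = pick_baseline(rows, target_wer=0.10)
```

If the best baseline already meets the target, fine-tuning can be skipped; otherwise that model is a natural choice for `base_model`.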

Fine-tune

python

job = client.fine_tunes.create(
    base_model="qwen3-asr",
    dataset_id=dataset.id,
    hyperparameters={
        "learning_rate": 1e-5,
        "epochs": 3,
        "batch_size": 16,
    },
)

tuned = client.fine_tunes.wait(job.id)
print(tuned.model_id)          # e.g. "ft:qwen3-asr:abc123"; pass it as "model" in later requests

Fine-tuning runs execute on hardware inside your deployment. On cloud tiers, this is a reserved GPU pool under your account. On self-hosted deployments, it is your own hardware.

Validate and promote

Run the tuned model against a held-out split and promote only when metrics clear your bar.

python

validation = client.evaluations.create(
    dataset_id=dataset.id,
    split="holdout",
    candidates=[tuned.model_id, "qwen3-asr"],
    metrics=["wer", "diarization_error_rate"],
)

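# Wait through the client, matching the evaluations.wait() call used earlier.
result = client.evaluations.wait(validation.id)
if result.winner == tuned.model_id:
    client.deployments.update(
        deployment_id="prod-voice",
        routes={"stt": tuned.model_id},
    )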
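
"Clearing your bar" can be stricter than simply winning the comparison. As a minimal sketch, the gate below requires a minimum relative WER gain and no diarization regression before promotion. The function name `clears_bar`, the dict-shaped metrics, and the default thresholds are illustrative assumptions, not SDK behavior.

```python
def clears_bar(
    baseline: dict,
    tuned: dict,
    min_relative_wer_gain: float = 0.10,  # assumed threshold: 10% relative
    max_der_regression: float = 0.0,      # assumed threshold: no DER regression allowed
) -> bool:
    """True when the tuned model improves WER enough without hurting diarization."""
    wer_gain = (baseline["wer"] - tuned["wer"]) / baseline["wer"]
    der_regression = (
        tuned["diarization_error_rate"] - baseline["diarization_error_rate"]
    )
    return wer_gain >= min_relative_wer_gain and der_regression <= max_der_regression


# Illustrative numbers only, not measured results.
baseline_metrics = {"wer": 0.20, "diarization_error_rate": 0.08}
tuned_metrics = {"wer": 0.16, "diarization_error_rate": 0.08}
ok_to_promote = clears_bar(baseline_metrics, tuned_metrics)
```

Keeping the thresholds explicit makes the promotion decision reviewable and repeatable across tuning runs.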

Custom TTS voices

The same workflow applies to text-to-speech. Submit a voice-cloning dataset with signed consent, and Adapt produces a voice ID that can be used in any agent or speech call.

Data provenance

Everything you feed Adapt stays inside your boundary — and so do the obligations that come with it. Confirm you have the recording consent needed for the jurisdictions the audio came from before you point Adapt at a bucket.
