# Self-hosted overview
Wordcab is the same software whether you use the cloud API or run it inside your own infrastructure. The install ships as a Helm chart plus an operator, with a signed offline bundle for airgap environments. Operators get: preflight checks, support bundles, rolling upgrades, and rollback procedures out of the box.
## Supported shapes
Four deployment shapes, all built from the same chart. Pick the one your security team will approve.
| Shape | Where it runs | When to pick it |
|---|---|---|
| VPC / cloud-managed K8s | EKS, AKS, GKE inside your cloud account | Most common. Audio, transcripts, and artifacts stay inside your VPC in-region. |
| On-prem Kubernetes | OpenShift, RKE2, vanilla K8s on customer hardware | Physical control. No cloud dependency in the audio path. |
| Airgap / sovereign | Fully disconnected clusters | Government, regulated defense, high-side compartments. Signed offline bundle. |
| Hybrid on-prem + cloud | Real-time on-prem, batch in cloud | Regulated contact centers, regional data sovereignty, on-prem EHR. |
Architecture overviews and reference diagrams for each shape live on the Deployment page.
## What ships with the chart
- Control plane — the same `/api/v1` surface as cloud, plus the operator.
- Model pools — STT (Qwen3-ASR, Voxtral Realtime, Cohere Transcribe 2B), LLM (Qwen3.5, Gemma 4, DeepSeek V3.2, Llama 3.3), TTS (Qwen3-TTS, Kokoro). Swap via `values.yaml`.
- Serving backends — vLLM by default, SGLang and Triton supported. See Model serving backends.
- Telephony gateway — SIP gateway Helm sub-chart for on-prem PBX. See Telephony & SIP.
- Observability — Prometheus metrics and OpenTelemetry traces on every service. Grafana dashboards included. See Observability.
- Preflight — a KOTS-style preflight run before install. See Airgap installs.
- Support bundle — one command collects logs and configuration, with secrets redacted, for support tickets.
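As an illustration of the model-pool swap described above, a `values.yaml` fragment might look like the following. This is a sketch: the key names (`modelPools`, `stt`, `backend`, and so on) are assumptions for illustration, not the chart's documented schema — start from the reference values file for your shape.

```yaml
# Hypothetical values.yaml fragment -- key names are illustrative,
# not the chart's documented schema.
modelPools:
  stt:
    model: qwen3-asr      # or voxtral-realtime, cohere-transcribe-2b
    backend: vllm         # SGLang and Triton are also supported
  llm:
    model: qwen3.5        # or gemma-4, deepseek-v3.2, llama-3.3
    backend: vllm
  tts:
    model: kokoro         # or qwen3-tts
telephony:
  sipGateway:
    enabled: true         # on-prem PBX via the SIP gateway sub-chart
```

Because model pools are plain chart values, swapping a model is a values change plus a rolling upgrade, not a reinstall.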
## Reference hardware
Three baselines cover the common workloads. Every shape scales horizontally from here.
| Profile | GPU | CPU / RAM | Storage | Good for |
|---|---|---|---|---|
| Voice-agent real-time (small) | 1× L40S (48 GB) or 2× L4 (24 GB) | 16 vCPU / 64 GB | 500 GB NVMe | ~100 concurrent low-latency streams. Voice + Think on one node. |
| Production (mid) | 2–4× L40S or H100 | 32 vCPU / 128 GB | 2 TB NVMe | ~1,000 concurrent streams with autoscaling. |
| Batch at scale | H100 / H200 pool | 64+ vCPU / 256+ GB | Object store | Archive backfill, overnight QA, large-LLM reasoning. |
## Getting started
- Talk to the Wordcab team — the chart, bundle, and license keys are shared after the Pilot kickoff.
- Run `wordcab deploy preflight` against the target cluster to check GPU availability, storage class, and ingress.
- Customize `values.yaml` — start from the reference for your shape.
- Apply with `wordcab deploy apply -f values.yaml`.
- Wire up identity (SAML / SCIM), observability (Prometheus / OTel), and telephony if applicable.
- Run the Gym against your real audio before flipping traffic.
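Condensed, the install portion of those steps is a two-command flow. The commands are the ones named above; the comments are a sketch of what each step covers, not captured CLI output.

```shell
# Preflight: verifies GPU availability, storage class, and ingress
# on the target cluster before anything is installed.
wordcab deploy preflight

# Customize values.yaml from the reference for your shape, then
# install (or upgrade) the release.
wordcab deploy apply -f values.yaml
```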
On VPC / managed K8s, a typical Pilot goes from cluster access to first successful call in about two weeks. On-prem and airgap take longer because of security review — the software install itself takes about a day.