Self-hosted overview

Wordcab is the same software whether you use the cloud API or run it inside your own infrastructure. The install ships as a Helm chart plus an operator, with a signed offline bundle for airgap environments. Preflight checks, support bundles, rolling upgrades, and rollback procedures are included out of the box.

Supported shapes

Four deployment shapes, all built from the same chart. Pick the one your security team will approve.

| Shape | Where it runs | When to pick it |
|---|---|---|
| VPC / cloud-managed K8s | EKS, AKS, GKE inside your cloud account | Most common. Audio, transcripts, and artifacts stay inside your VPC in-region. |
| On-prem Kubernetes | OpenShift, RKE2, vanilla K8s on customer hardware | Physical control. No cloud dependency in the audio path. |
| Airgap / sovereign | Fully disconnected clusters | Government, regulated defense, high-side compartments. Signed offline bundle. |
| Hybrid on-prem + cloud | Real-time on-prem, batch in cloud | Regulated contact centers, regional data sovereignty, on-prem EHR. |

Architecture overviews and reference diagrams for each shape live on the Deployment page.
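
As a rough illustration of how the table above narrows the choice, the sketch below maps three yes/no constraints to a shape. The function name, its parameters, and the order of the checks are assumptions made for illustration; they are not part of the Wordcab chart or CLI, and the real decision belongs to your security review.

```python
from enum import Enum


class Shape(Enum):
    # Labels copied from the "Supported shapes" table.
    VPC = "VPC / cloud-managed K8s"
    ON_PREM = "On-prem Kubernetes"
    AIRGAP = "Airgap / sovereign"
    HYBRID = "Hybrid on-prem + cloud"


def pick_shape(disconnected: bool, own_hardware: bool, cloud_batch: bool) -> Shape:
    """Hypothetical helper: map coarse constraints to a deployment shape.

    disconnected -- the cluster has no outbound network access at all
    own_hardware -- real-time traffic must run on customer hardware
    cloud_batch  -- batch workloads are allowed to run in a cloud account
    """
    if disconnected:
        # A fully disconnected cluster rules out every other shape.
        return Shape.AIRGAP
    if own_hardware and cloud_batch:
        return Shape.HYBRID
    if own_hardware:
        return Shape.ON_PREM
    return Shape.VPC
```

For example, `pick_shape(disconnected=False, own_hardware=True, cloud_batch=True)` returns `Shape.HYBRID`, matching the "real-time on-prem, batch in cloud" row.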

What ships with the chart

  • Control plane — the same /api/v1 surface as cloud, plus the operator.
  • Model pools — STT (Qwen3-ASR, Voxtral Realtime, Cohere Transcribe 2B), LLM (Qwen3.5, Gemma 4, DeepSeek V3.2, Llama 3.3), TTS (Qwen3-TTS, Kokoro). Swap via values.yaml.
  • Serving backends — vLLM by default, SGLang and Triton supported. See Model serving backends.
  • Telephony gateway — SIP gateway Helm sub-chart for on-prem PBX. See Telephony & SIP.
  • Observability — Prometheus metrics and OpenTelemetry traces on every service. Grafana dashboards included. See Observability.
  • Preflight — a KOTS-style preflight run before install. See Airgap installs.
  • Support bundle — one command collects logs and config, with secrets redacted, for attaching to support tickets.

Reference hardware

Three baselines cover the common workloads. Every shape scales horizontally from here.

| Profile | GPU | CPU / RAM | Storage | Good for |
|---|---|---|---|---|
| Voice-agent real-time (small) | 1× L40S (48 GB) or 2× L4 (24 GB) | 16 vCPU / 64 GB | 500 GB NVMe | ~100 concurrent low-latency streams. Voice + Think on one node. |
| Production (mid) | 2–4× L40S or H100 | 32 vCPU / 128 GB | 2 TB NVMe | ~1,000 concurrent streams with autoscaling. |
| Batch at scale | H100 / H200 pool | 64+ vCPU / 256+ GB | Object store | Archive backfill, overnight QA, large-LLM reasoning. |
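
To make "scales horizontally from here" concrete, here is a back-of-the-envelope sizing sketch. The stream figures are the approximate ones from the table above. The linear-scaling assumption, the 20% default headroom, and the names `Profile` and `suggest` are illustrative assumptions, not Wordcab guidance; real capacity depends on the models and latency targets you choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    gpu: str
    streams: int  # approximate concurrent streams, from the reference table


# Real-time profiles only; the batch profile has no stream figure in the table.
PROFILES = [
    Profile("Voice-agent real-time (small)", "1x L40S or 2x L4", 100),
    Profile("Production (mid)", "2-4x L40S or H100", 1000),
]


def suggest(target_streams: int, headroom_pct: int = 20) -> tuple[Profile, int]:
    """Return (profile, units), assuming capacity scales linearly with units."""
    if target_streams <= 0:
        raise ValueError("target_streams must be positive")
    # Integer ceiling of target * (1 + headroom), avoiding float rounding.
    needed = (target_streams * (100 + headroom_pct) + 99) // 100
    # Prefer the smallest profile that covers the load with a single unit.
    for profile in PROFILES:
        if needed <= profile.streams:
            return profile, 1
    # Otherwise replicate the largest profile (integer ceiling division).
    largest = PROFILES[-1]
    return largest, (needed + largest.streams - 1) // largest.streams
```

Under these assumptions, `suggest(2500)` pads the target to 3,000 streams and returns the Production (mid) profile with 3 units.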

Getting started

  1. Talk to the Wordcab team — the chart, bundle, and license keys are shared after the Pilot kickoff.
  2. Run wordcab deploy preflight against the target cluster to check GPU availability, storage class, and ingress.
  3. Customize values.yaml — start from the reference for your shape.
  4. Run wordcab deploy apply -f values.yaml.
  5. Wire up identity (SAML / SCIM), observability (Prometheus / OTel), and telephony if applicable.
  6. Run the Gym against your real audio before flipping traffic.
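
Steps 2 and 4 can be scripted so that the apply never runs if preflight fails. The sketch below is a minimal wrapper, not an official tool: the two command lines are taken from the steps above, while the assumptions are that the `wordcab` CLI is on `PATH`, that it exits non-zero on failure, and that it targets the cluster through your current kubeconfig context. The helper names `install_plan` and `run_plan` are hypothetical.

```python
import shlex
import subprocess


def install_plan(values_file: str = "values.yaml") -> list[list[str]]:
    """Return the commands from steps 2 and 4, in execution order."""
    return [
        ["wordcab", "deploy", "preflight"],
        ["wordcab", "deploy", "apply", "-f", values_file],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command and execute it only when dry_run is False.

    With check=True, a non-zero exit code raises CalledProcessError, so a
    failed preflight stops the loop before the apply step is reached.
    """
    lines = []
    for argv in plan:
        line = shlex.join(argv)
        lines.append(line)
        print(("DRY RUN: " if dry_run else "RUN: ") + line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines
```

Calling `run_plan(install_plan())` only prints the two commands; pass `dry_run=False` once the values file has been reviewed.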

Time to first call

On VPC / managed K8s, a typical Pilot goes from cluster access to first successful call in about two weeks. On-prem and airgap deployments take longer because of security review; the software install itself takes a day.