Self-hosted overview

Wordcab is the same software whether it runs in our cloud or yours. Helm chart, operator, signed offline bundle, preflight, rolling upgrades, rollback.

Package availability

Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.

Wordcab is the same software whether you use the cloud API or run it inside your own infrastructure. The install ships as a Helm chart plus an operator, with a signed offline bundle for airgap environments. Operators get preflight checks, support bundles, rolling upgrades, and rollback procedures out of the box.

Supported shapes

Four deployment shapes, all built from the same chart. Pick the one your security team will approve.

Shape	Where it runs	When to pick it
VPC / cloud-managed K8s	EKS, AKS, GKE inside your cloud account	Most common. Audio, transcripts, and artifacts stay inside your VPC in-region.
On-prem Kubernetes	OpenShift, RKE2, vanilla K8s on customer hardware	Physical control. No cloud dependency in the audio path.
Airgap / sovereign	Fully disconnected clusters	Government, regulated defense, high-side compartments. Signed offline bundle.
Hybrid on-prem + cloud	Real-time on-prem, batch in cloud	Regulated contact centers, regional data sovereignty, on-prem EHR.

Architecture overviews and reference diagrams for each shape live on the Deployment page.

What ships with the chart

Control plane — the same /api/v1 surface as cloud, plus the operator.
Model pools — STT (Qwen3-ASR, Voxtral Realtime, Cohere Transcribe 2B), LLM (Qwen3.5, Gemma 4, DeepSeek V3.2, Llama 3.3), TTS (Qwen3-TTS, Kokoro). Swap via values.yaml.
Serving backends — vLLM by default, SGLang and Triton supported. See Model serving backends.
Telephony gateway — SIP gateway Helm sub-chart for on-prem PBX. See Telephony & SIP.
Observability — Prometheus metrics and OpenTelemetry traces on every service. Grafana dashboards included. See Observability.
Preflight — a KOTS-style preflight run before install. See Airgap installs.
Support bundle — one command collects logs and configuration with secrets redacted, ready to attach to a support ticket.

Reference hardware

Three baselines cover the common workloads. Every shape scales horizontally from here.

Profile	GPU	CPU / RAM	Storage	Good for
Voice-agent real-time (small)	1× L40S (48 GB) or 2× L4 (24 GB)	16 vCPU / 64 GB	500 GB NVMe	~100 concurrent low-latency streams. Voice + Think on one node.
Production (mid)	2–4× L40S or H100	32 vCPU / 128 GB	2 TB NVMe	~1,000 concurrent streams with autoscaling.
Batch at scale	H100 / H200 pool	64+ vCPU / 256+ GB	Object store	Archive backfill, overnight QA, large-LLM reasoning.
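
As a rough illustration of sizing from these baselines, the sketch below estimates how many copies of a baseline a target concurrency would need. The stream figures come from the table above; the linear-scaling assumption, the 20% headroom default, and the function and dictionary names are illustrative assumptions, not Wordcab sizing guidance. Real capacity depends on the models, audio, and latency targets you run.

```python
import math

# Approximate concurrent-stream figures from the reference hardware table.
# Treating them as fixed, linearly scalable units is an assumption.
BASELINE_STREAMS = {
    "small": 100,   # Voice-agent real-time (small)
    "mid": 1000,    # Production (mid)
}


def baselines_needed(target_streams: int, profile: str = "small",
                     headroom: float = 0.2) -> int:
    """Estimate how many copies of a baseline cover target_streams.

    headroom is the fraction of each baseline's capacity held in reserve
    (hypothetical default of 20%).
    """
    if profile not in BASELINE_STREAMS:
        raise ValueError(f"unknown profile: {profile!r}")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if target_streams <= 0:
        return 0
    usable = BASELINE_STREAMS[profile] * (1 - headroom)
    return math.ceil(target_streams / usable)
```

For example, `baselines_needed(250, "small")` reserves 20% of each small baseline and rounds up to whole units. Treat the result as a starting point for a load test, not a commitment.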

Getting started

Talk to the Wordcab team — the chart, bundle, and license keys are shared after the Pilot kickoff.
Run wordcab deploy preflight against the target cluster to check GPU availability, storage class, and ingress.
Customize values.yaml — start from the reference for your shape.
Apply the configuration with wordcab deploy apply -f values.yaml.
Wire up identity (SAML / SCIM), observability (Prometheus / OTel), and telephony if applicable.
Run the Gym against your real audio before flipping traffic.
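
The preflight and apply steps above could be chained in a small wrapper so that apply never runs when preflight fails. This is a minimal sketch: the two commands are the ones named in the steps, but it assumes the wordcab CLI is on PATH, targets the intended cluster, and exits with a non-zero status on failure. The `deploy` function and its `run` parameter are hypothetical names for illustration.

```python
import subprocess
from typing import Callable, List, Sequence


def planned_commands(values_path: str = "values.yaml") -> List[List[str]]:
    """Return the preflight and apply commands in the order they should run."""
    return [
        ["wordcab", "deploy", "preflight"],
        ["wordcab", "deploy", "apply", "-f", values_path],
    ]


def deploy(values_path: str = "values.yaml",
           run: Callable[[Sequence[str]], object] = subprocess.run) -> bool:
    """Run preflight, then apply; stop at the first failing command.

    Assumes (not confirmed by the docs) that the CLI signals failure
    through a non-zero exit code. `run` is injectable for dry runs and tests.
    """
    for cmd in planned_commands(values_path):
        result = run(cmd)
        if result.returncode != 0:
            print(f"step failed: {' '.join(cmd)}")
            return False
    return True
```

Calling `deploy("values.yaml")` runs both commands against whatever cluster the CLI is configured for; passing a stub as `run` lets you check the sequence without touching a cluster.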

Time to first call

On VPC / managed K8s, a typical Pilot goes from cluster access to first successful call in about two weeks. On-prem and airgap deployments take longer because of security review; the software install itself takes about a day.
