# Helm chart
One chart, every shape. Install with Helm directly or let the CLI wrap it with preflight and control-plane reconciliation.
The `wordcab/wordcab` Helm chart installs the control plane, operator, and model pools in one step. It runs on any CNCF-conformant Kubernetes 1.27+ cluster. There are two install paths: `helm install` directly, or `wordcab deploy apply`, which wraps Helm, runs preflight, and reconciles the deployment record with the control plane.
## Prerequisites

- Kubernetes 1.27+ (EKS, AKS, GKE, OpenShift 4.14+, RKE2, or vanilla).
- A default `StorageClass` with `volumeBindingMode: WaitForFirstConsumer`.
- A GPU operator or device plugin (`nvidia.com/gpu` schedulable).
- An ingress controller of your choice (nginx, Contour, Traefik, or a cloud-native ALB).
- Helm 3.12+, and `kubectl` with cluster-admin at install time (can be downgraded after).
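If your cluster lacks a default `StorageClass`, a minimal sketch of one that satisfies the binding-mode requirement (the `gp3` name and `ebs.csi.aws.com` provisioner are EKS-specific assumptions; substitute your cluster's CSI driver):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # marks it as the cluster default
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer  # required: delays binding until a pod is scheduled
```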
## Install

```shell
helm repo add wordcab https://charts.wordcab.com
helm repo update

# Install into a dedicated namespace
kubectl create ns wordcab
helm install wordcab wordcab/wordcab \
  --namespace wordcab \
  --values values.yaml
```

Or, using the CLI (recommended: it runs preflight and registers the deployment with the control plane):

```shell
wordcab deploy plan -f values.yaml     # dry-run, shows diff
wordcab deploy apply -f values.yaml    # install or update
wordcab deploy status                  # pool health summary
```

## values.yaml reference
```yaml
global:
  licenseKeyRef: { name: wordcab-license, key: key }
  imageRegistry: registry.example.internal  # private mirror for airgap
  imagePullSecrets: [{ name: regcred }]

ingress:
  enabled: true
  className: nginx
  host: wordcab.apps.example.com
  tls: { secretName: wordcab-tls }

voice:
  stt:
    default: qwen3-asr
    pools:
      - name: qwen3-asr
        replicas: { min: 1, max: 4 }
        resources: { nvidia.com/gpu: 1, memory: 48Gi }
        backend: vllm
  tts:
    default: qwen3-tts
    pools:
      - name: qwen3-tts
        replicas: { min: 1, max: 2 }

think:
  llm:
    default: qwen3.5-4b
    pools:
      - name: qwen3.5-4b
        replicas: { min: 2, max: 8 }
        resources: { nvidia.com/gpu: 1 }
        prefixCache: true
      - name: deepseek-v3.2
        replicas: { min: 0, max: 2 }  # on-demand, burst pool
        resources: { nvidia.com/gpu: 8 }
    autoscaling:
      targetGPUUtilization: 0.65
      metricsProvider: prometheus-adapter

auth:
  oidc:
    enabled: true
    issuerURL: https://login.example.com/realms/prod
    clientIDRef: { name: wordcab-oidc, key: client_id }
  scim:
    enabled: true

observability:
  prometheus: { enabled: true }
  otelExporter: { endpoint: otel-collector.observability:4317 }
  grafanaDashboards: { enabled: true }

storage:
  recordings:
    type: s3  # or: minio, pvc
    bucket: wordcab-recordings
    retentionDays: 30
  transcripts:
    type: postgres
    dsnRef: { name: wordcab-pg, key: dsn }
```
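The values above pull credentials from Kubernetes Secrets rather than inlining them (`licenseKeyRef`, `clientIDRef`, `dsnRef`). A sketch of the referenced Secrets with placeholder values; create these in the `wordcab` namespace before installing:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: wordcab-license
  namespace: wordcab
stringData:
  key: "<your-license-key>"
---
apiVersion: v1
kind: Secret
metadata:
  name: wordcab-oidc
  namespace: wordcab
stringData:
  client_id: "<your-oidc-client-id>"
---
apiVersion: v1
kind: Secret
metadata:
  name: wordcab-pg
  namespace: wordcab
stringData:
  dsn: "<your-postgres-dsn>"
```

The secret names and keys here match the sample values file; if you rename them, update the corresponding `*Ref` entries.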
## Upgrades

Upgrades are rolling by default. See Upgrades & rollback for the procedure and the stability rules.

```shell
wordcab deploy apply -f values.yaml    # rolling upgrade
wordcab deploy rollback --to rev 14    # roll back a revision
```

## Operator CRDs
For advanced lifecycle management, the operator exposes CRDs: `WordcabDeployment`, `ModelPool`, `Voice`, and `Tenant`. GitOps teams can apply these directly; the Helm values above are a higher-level abstraction over the same objects.
```yaml
apiVersion: wordcab.com/v1
kind: ModelPool
metadata:
  name: qwen3-asr
  namespace: wordcab
spec:
  task: stt
  model: qwen3-asr
  backend: vllm
  quantization: int8
  replicas: { min: 1, max: 4 }
  resources: { requests: { nvidia.com/gpu: 1 } }
```

The shipped values.yaml includes doc comments for every knob; `helm show values wordcab/wordcab > reference-values.yaml` gives you the full file.
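If you also manage the chart itself through GitOps, the install can be expressed declaratively. A sketch using Flux's `HelmRepository` and `HelmRelease` APIs, assuming Flux is already running in the cluster (the resource names and the `wordcab-values` ConfigMap are illustrative):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: wordcab
  namespace: wordcab
spec:
  interval: 1h
  url: https://charts.wordcab.com
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: wordcab
  namespace: wordcab
spec:
  interval: 10m
  chart:
    spec:
      chart: wordcab
      sourceRef:
        kind: HelmRepository
        name: wordcab
  valuesFrom:
    - kind: ConfigMap
      name: wordcab-values  # holds the values.yaml shown above
```

Note that this path bypasses the CLI's preflight and control-plane registration, so it suits clusters where `helm install` alone is acceptable.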