# Helm chart

One chart, every shape. Install with Helm directly or let the CLI wrap it with preflight and control-plane reconciliation.

The `wordcab/wordcab` Helm chart installs the control plane, operator, and model pools in one step. It runs on any CNCF-conformant Kubernetes 1.27+. There are two install paths: `helm install` directly, or `wordcab deploy apply`, which wraps Helm, runs preflight, and reconciles the deployment record with the control plane.

## Prerequisites

- Kubernetes 1.27+ (EKS, AKS, GKE, OpenShift 4.14+, RKE2, or vanilla).
- A default StorageClass with `volumeBindingMode: WaitForFirstConsumer`.
- A GPU operator or device plugin (`nvidia.com/gpu` must be schedulable).
- An ingress controller of your choice (nginx, Contour, Traefik, or a cloud-native ALB).
- Helm 3.12+ and kubectl with cluster-admin at install time (privileges can be reduced afterwards).
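
The list above can be sanity-checked with standard tooling before installing. An illustrative sketch (the CLI's preflight covers the same ground):

```bash
kubectl version                  # server must report 1.27+
helm version --short             # must be v3.12+

# The default StorageClass is marked "(default)"; check its VOLUMEBINDINGMODE column
kubectl get storageclass

# GPU device plugin: at least one node should advertise allocatable nvidia.com/gpu
kubectl get nodes -o json | grep '"nvidia.com/gpu"'
```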

## Install

```bash
helm repo add wordcab https://charts.wordcab.com
helm repo update

# Install into a dedicated namespace
kubectl create ns wordcab
helm install wordcab wordcab/wordcab \
  --namespace wordcab \
  --values values.yaml
```

Or use the CLI (recommended; it runs preflight and registers the deployment with the control plane):

```bash
wordcab deploy plan  -f values.yaml    # dry run; shows the diff
wordcab deploy apply -f values.yaml    # install or update
wordcab deploy status                  # pool health summary
```
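
Whichever path you use, a quick post-install check with standard Helm and kubectl commands:

```bash
helm status wordcab -n wordcab     # release should be "deployed"
kubectl get pods -n wordcab        # control plane, operator, and pool pods
```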

## `values.yaml` reference

```yaml
global:
  licenseKeyRef: { name: wordcab-license, key: key }
  imageRegistry: registry.example.internal        # private mirror for airgap
  imagePullSecrets: [{ name: regcred }]

ingress:
  enabled: true
  className: nginx
  host: wordcab.apps.example.com
  tls: { secretName: wordcab-tls }

voice:
  stt:
    default: qwen3-asr
    pools:
      - name: qwen3-asr
        replicas: { min: 1, max: 4 }
        resources: { nvidia.com/gpu: 1, memory: 48Gi }
        backend: vllm
  tts:
    default: qwen3-tts
    pools:
      - name: qwen3-tts
        replicas: { min: 1, max: 2 }

think:
  llm:
    default: qwen3.5-4b
    pools:
      - name: qwen3.5-4b
        replicas: { min: 2, max: 8 }
        resources: { nvidia.com/gpu: 1 }
        prefixCache: true
      - name: deepseek-v3.2
        replicas: { min: 0, max: 2 }      # on-demand burst pool
        resources: { nvidia.com/gpu: 8 }

autoscaling:
  targetGPUUtilization: 0.65
  metricsProvider: prometheus-adapter

auth:
  oidc:
    enabled: true
    issuerURL: https://login.example.com/realms/prod
    clientIDRef: { name: wordcab-oidc, key: client_id }
  scim:
    enabled: true

observability:
  prometheus: { enabled: true }
  otelExporter: { endpoint: otel-collector.observability:4317 }
  grafanaDashboards: { enabled: true }

storage:
  recordings:
    type: s3                                 # or: minio, pvc
    bucket: wordcab-recordings
    retentionDays: 30
  transcripts:
    type: postgres
    dsnRef: { name: wordcab-pg, key: dsn }
```

## Upgrades

Upgrades are rolling by default. See Upgrades & rollback for the full procedure and stability rules.

```bash
wordcab deploy apply -f values.yaml       # rolling upgrade
wordcab deploy rollback --to-revision 14  # roll back to revision 14
```
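
If you installed with Helm directly rather than the CLI, the equivalent operations are plain Helm; note that this bypasses the control-plane deployment record that `wordcab deploy apply` maintains:

```bash
helm upgrade wordcab wordcab/wordcab -n wordcab -f values.yaml
helm history wordcab -n wordcab       # list revisions
helm rollback wordcab 14 -n wordcab   # roll back to revision 14
```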

## Operator CRDs

For advanced lifecycle management, the operator exposes four CRDs: `WordcabDeployment`, `ModelPool`, `Voice`, and `Tenant`. GitOps teams can apply these directly; the Helm values above are a higher-level abstraction over the same objects.

```yaml
apiVersion: wordcab.com/v1
kind: ModelPool
metadata:
  name: qwen3-asr
  namespace: wordcab
spec:
  task: stt
  model: qwen3-asr
  backend: vllm
  quantization: int8
  replicas: { min: 1, max: 4 }
  resources: { requests: { nvidia.com/gpu: 1 } }
```
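
For a GitOps flow, commit manifests like the one above and apply them directly. A minimal sketch (the file name and the `modelpools` plural are assumptions; substitute whatever the CRD actually registers):

```bash
kubectl apply -f modelpool-qwen3-asr.yaml
kubectl get modelpools -n wordcab
```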

## Values are documented inline

The shipped `values.yaml` includes doc comments for every knob. `helm show values wordcab/wordcab > reference-values.yaml` gives you the full file.