Upgrades & rollback
Rolling by default, health-gated, one pool at a time. Rollback is one command. Preflight before every upgrade.
Upgrades are rolling by default: one model pool at a time, with health-gated rollout. Config-only changes are applied immediately without restarting serving pods. Rollback is a single command.
Cadence
- Cloud — weekly minor updates, transparent; monthly patch releases.
- Self-hosted — you pick the cadence. Quarterly is typical for regulated environments. Security fixes are shipped as out-of-band patch bundles.
- Airgap — on your change-window cadence. Signed patch bundles are delta-friendly.
Rolling upgrade
# 1. Pull the latest chart or patch bundle
helm repo update
# 2. Dry-run against current cluster state
wordcab deploy plan -f values.yaml
# 3. Apply
wordcab deploy apply -f values.yaml
# 4. Watch pool health
wordcab deploy status --watch
The operator drains one pool replica at a time, waits for it to become ready, then continues. Per-pool maxUnavailable and maxSurge are configurable; the chart defaults are safe for production.
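To reason about what a given maxUnavailable and maxSurge pair means for a pool, the arithmetic can be sketched as below. This assumes the operator follows the usual Kubernetes Deployment rolling-update semantics and that both values are absolute counts rather than percentages; the function name rollout_bounds is hypothetical and not part of the wordcab CLI.

```shell
#!/bin/sh
# Sketch only. Assumes Kubernetes-style rolling-update semantics:
#   ready replicas never drop below  REPLICAS - maxUnavailable
#   total replicas never exceed      REPLICAS + maxSurge
# Usage: rollout_bounds REPLICAS MAX_UNAVAILABLE MAX_SURGE
rollout_bounds() {
  min_ready=$(( $1 - $2 ))
  max_total=$(( $1 + $3 ))
  echo "min_ready=$min_ready max_total=$max_total"
}

rollout_bounds 4 1 1   # prints: min_ready=3 max_total=5
```

With four replicas and both settings at 1, at least three replicas keep serving while at most one extra replica is scheduled, which is why GPU headroom for the surge replica matters.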
Canary routing
Promote a new model to a fraction of traffic before full rollout. This is an Experiments job, not an upgrade knob — same API, different semantics.
Rollback
# List recent revisions
wordcab deploy history
# Roll back to revision 14
wordcab deploy rollback --to rev 14
# Or: roll back to the previous release
wordcab deploy rollback --to previous
Rollback is a Helm rollback under the hood. It restores both the chart revision and the operator-reconciled CRDs. Any data changes (new transcripts, new calls) persist; only the software state rolls back.
Stability rules
- The API surface is stable within a major version. Breaking changes are gated on a new major.
- CRD schemas are backward-compatible within a major — fields are added, not removed.
- Default model ids are re-baselined quarterly; previous defaults stay callable by id for at least two quarters.
- Skipping more than 6 minor versions at once is not supported — upgrade through the waypoints the release notes list.
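The version-skip rule above can be checked before planning an upgrade. The sketch below is illustrative only: it assumes versions look like MAJOR.MINOR[.PATCH], and it reads "skipping more than 6" conservatively as a minor-number difference greater than 6. The function name upgrade_path is hypothetical, and the actual waypoints still come from the release notes.

```shell
#!/bin/sh
# Sketch only: classify an upgrade by minor-version distance.
# Assumption: a minor-number difference greater than 6 requires waypoints.
MAX_MINOR_JUMP=6

# Usage: upgrade_path FROM_VERSION TO_VERSION
upgrade_path() {
  from_major=${1%%.*}
  to_major=${2%%.*}
  from_rest=${1#*.}
  to_rest=${2#*.}
  from_minor=${from_rest%%.*}
  to_minor=${to_rest%%.*}

  if [ "$from_major" != "$to_major" ]; then
    echo "major-change"   # outside this rule; consult the release notes
  elif [ $(( to_minor - from_minor )) -gt "$MAX_MINOR_JUMP" ]; then
    echo "waypoints"      # upgrade through the waypoints the release notes list
  else
    echo "direct"
  fi
}

upgrade_path 2.3.1 2.9.0    # prints: direct
upgrade_path 2.3.1 2.10.0   # prints: waypoints
```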
Support bundle
For any non-trivial incident, run wordcab deploy support-bundle. It collects logs (last 24h), chart state, CRD state, and node info into a signed tarball, with secrets redacted. Attach the tarball to the support ticket.
Run wordcab deploy preflight before every upgrade, not just installs. It catches drift (missing GPU capacity, changed StorageClass, expired certs) that would otherwise break mid-rolling-upgrade.
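One way to make preflight hard to forget is to wrap the upgrade steps in a single function that stops at the first failure. The sketch below uses only the commands shown on this page; it assumes each wordcab subcommand exits non-zero on failure, which this page does not confirm, and that the chart or patch bundle has already been pulled. The function name safe_upgrade is hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumption: wordcab subcommands exit non-zero on failure.
# Usage: safe_upgrade [VALUES_FILE]
safe_upgrade() {
  values_file=${1:-values.yaml}

  # Catch drift (GPU capacity, StorageClass, certs) before touching the cluster.
  wordcab deploy preflight || { echo "preflight failed; aborting" >&2; return 1; }

  # Dry-run against current cluster state.
  wordcab deploy plan -f "$values_file" || { echo "plan failed; aborting" >&2; return 1; }

  # Apply, then watch pool health until the rollout settles.
  wordcab deploy apply -f "$values_file" || { echo "apply failed" >&2; return 1; }
  wordcab deploy status --watch
}
```

Calling safe_upgrade values.yaml runs the four steps in order; if preflight or plan fails, apply is never reached and the cluster is left untouched.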