Upgrades & rollback

Rolling by default, health-gated, one pool at a time. Rollback is one command. Preflight before every upgrade.

Upgrades are rolling by default: one model pool at a time, with health-gated rollout. Config-only changes are applied immediately without restarting serving pods. Rollback is a single command.

Cadence

  • Cloud — weekly minor updates, transparent; monthly patch releases.
  • Self-hosted — you pick the cadence. Quarterly is typical for regulated environments. Security fixes are shipped as out-of-band patch bundles.
  • Airgap — on your change-window cadence. Signed patch bundles are delta-friendly.

Rolling upgrade

bash
# 1. Pull the latest chart or patch bundle
helm repo update

# 2. Dry-run against current cluster state
wordcab deploy plan -f values.yaml

# 3. Apply
wordcab deploy apply -f values.yaml

# 4. Watch pool health
wordcab deploy status --watch

The operator: drains one pool replica at a time, waits for ready, continues. Per-pool maxUnavailable and maxSurge are configurable; the chart defaults are safe for production.

Canary routing

Promote a new model to a fraction of traffic before full rollout. This is an Experiments job, not an upgrade knob — same API, different semantics.

Rollback

bash
# List recent revisions
wordcab deploy history

# Roll back to revision 14
wordcab deploy rollback --to rev 14

# Or: roll back to the previous release
wordcab deploy rollback --to previous

Rollback is a Helm rollback under the hood. It restores both the chart revision and the operator-reconciled CRDs. Any data changes (new transcripts, new calls) persist — only the software state rolls back.

Stability rules

  • The API surface is stable within a major version. Breaking changes are gated on a new major.
  • CRD schemas are backward-compatible within a major — fields are added, not removed.
  • Default model ids are re-baselined quarterly; previous defaults stay callable by id for at least two quarters.
  • Skipping more than 6 minor versions at once is not supported — upgrade through the waypoints the release notes list.

Support bundle

For any non-trivial incident, run wordcab deploy support-bundle. It collects logs (last 24h), chart state, CRD state, and node info — redacted of secrets — into a signed tarball. Attach to the support ticket.

Preflight before upgrade

Run wordcab deploy preflight before every upgrade, not just installs. It catches drift (missing GPU capacity, changed StorageClass, expired certs) that would otherwise break mid-rolling-upgrade.