Deployments
Manage self-hosted environments and model routing. Use it alongside the wordcab deploy CLI, which does the actual cluster-side work.
The deployments API is used by self-hosted installations and by teams that run multiple isolated Wordcab environments (dev, staging, prod). On cloud-only accounts, the list returns a single read-only cloud deployment.
The deployment object
{
  "id": "dep_prod_voice",
  "name": "prod-voice-east",
  "environment": "production",
  "region": "us-east-1",
  "shape": "vpc",
  "health": "healthy",
  "routes": {
    "stt": "qwen3-asr",
    "tts": "qwen3-tts",
    "llm": "qwen3.5-4b"
  },
  "autoscaling": {"min": 1, "max": 6, "target_gpu_utilization": 0.65},
  "created": 1712345678
}
Create a deployment
name: Cluster-unique identifier.
environment: production | staging | dev.
region: Cloud region (VPC / hybrid) or datacenter label (on-prem).
shape: vpc | onprem | airgap | hybrid. Must match the target cluster's install profile.
routes: Map of task -> model id. Defaults are pulled from the cluster config.
autoscaling: { min, max, target_gpu_utilization }. Bounds the deployment's GPU pool.
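As a rough illustration of the parameters above, the following Python sketch assembles and sanity-checks a create-request body before it is sent. The key names are taken from the deployment object (treating name as the cluster-unique identifier is an inference). The helper build_create_body and its client-side checks are assumptions for illustration, not part of the Wordcab API. The server remains the authority on validation.

```python
import json

# Allowed values as listed in the parameter descriptions above.
ENVIRONMENTS = {"production", "staging", "dev"}
SHAPES = {"vpc", "onprem", "airgap", "hybrid"}


def build_create_body(name, environment, region, shape,
                      routes=None, autoscaling=None):
    """Assemble a create-deployment body (hypothetical helper)."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if shape not in SHAPES:
        raise ValueError(f"unknown shape: {shape!r}")

    body = {"name": name, "environment": environment,
            "region": region, "shape": shape}

    # routes is optional; omitted tasks fall back to the cluster config.
    if routes is not None:
        body["routes"] = dict(routes)

    if autoscaling is not None:
        lo, hi = autoscaling["min"], autoscaling["max"]
        target = autoscaling["target_gpu_utilization"]
        # Assumed sanity checks; the real limits are not documented here.
        if not 0 <= lo <= hi:
            raise ValueError("autoscaling requires 0 <= min <= max")
        if not 0 < target <= 1:
            raise ValueError("target_gpu_utilization must be in (0, 1]")
        body["autoscaling"] = {"min": lo, "max": hi,
                               "target_gpu_utilization": target}
    return body


body = build_create_body(
    "prod-voice-east", "production", "us-east-1", "vpc",
    routes={"stt": "qwen3-asr"},
    autoscaling={"min": 1, "max": 6, "target_gpu_utilization": 0.65},
)
print(json.dumps(body, indent=2))
```

Checking shape and environment locally only catches typos early; whether shape matches the cluster's install profile can only be verified on the server side.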
Retrieve / list / update / delete
DELETE only removes the control-plane record. To actually uninstall the workload, use wordcab deploy uninstall against the cluster; the CLI reconciles both sides.
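A minimal sketch of the control-plane half of a teardown, assuming the record lives at DELETE /v1/deployments/{id} (inferred from the pools path in this reference, not confirmed) and using a hypothetical base URL and bearer-token header. It only constructs the request. Sending it, and running wordcab deploy uninstall against the cluster, are left to the caller.

```python
import urllib.request

# Hypothetical value: substitute your own control-plane URL.
BASE_URL = "https://wordcab.example.com"


def build_delete_request(deployment_id, token):
    """Build (but do not send) a request that removes the control-plane record.

    The workload running on the cluster is not touched by this call.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/deployments/{deployment_id}",  # assumed path
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


req = build_delete_request("dep_prod_voice", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```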
Health
The health field summarizes probe state across all model pools in the deployment. Fine-grained, per-pool readiness is available from GET /v1/deployments/{id}/pools (Platform tier).
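One way to consume the summary field is to poll the deployment until health reads healthy. The sketch below assumes a caller-supplied fetch function that returns the deployment object as a dict. The name wait_until_healthy, its parameters, and the default timings are illustrative, not part of the API. Only the "healthy" value appears in this reference, so any other value is treated simply as not ready.

```python
import time


def wait_until_healthy(fetch, timeout_s=300.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the deployment reports health == "healthy".

    Returns True once healthy, False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        deployment = fetch()
        if deployment.get("health") == "healthy":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in fetcher that becomes healthy on the third call.
# "starting" is a placeholder value, not a documented health state.
responses = iter([{"health": "starting"}, {"health": "starting"},
                  {"health": "healthy"}])
ok = wait_until_healthy(lambda: next(responses), sleep=lambda s: None)
print(ok)
```

Injecting sleep and clock keeps the loop testable without real delays; in normal use the defaults apply.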