
Deployments

Manage self-hosted environments and model routing. This API manages the control-plane records; the wordcab deploy CLI does the cluster-side work.

The deployments API is used by self-hosted installations and by teams that run multiple isolated Wordcab environments (dev, staging, prod). On cloud-only accounts, the list returns a single read-only cloud deployment.

The deployment object

```json
{
  "id": "dep_prod_voice",
  "name": "prod-voice-east",
  "environment": "production",
  "region": "us-east-1",
  "shape": "vpc",
  "health": "healthy",
  "routes": {
    "stt": "qwen3-asr",
    "tts": "qwen3-tts",
    "llm": "qwen3.5-4b"
  },
  "autoscaling": {"min": 1, "max": 6, "target_gpu_utilization": 0.65},
  "created": 1712345678
}
```
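The object above maps cleanly onto typed records on the client side. The following is a minimal sketch using only the Python standard library. The class names `Deployment` and `Autoscaling` and the method `from_dict` are hypothetical helpers, not part of any official Wordcab SDK. Treating `created` as a Unix timestamp in seconds is an assumption based on the sample value.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Autoscaling:
    min: int                       # lower bound of the GPU pool
    max: int                       # upper bound of the GPU pool
    target_gpu_utilization: float  # 0.65 in the sample object


@dataclass
class Deployment:
    id: str
    name: str
    environment: str  # production | staging | dev
    region: str
    shape: str        # vpc | onprem | airgap | hybrid
    health: str
    created: int      # assumed: Unix timestamp in seconds
    routes: Dict[str, str] = field(default_factory=dict)  # task -> model id
    autoscaling: Optional[Autoscaling] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Deployment":
        """Build a Deployment from a decoded deployment object."""
        scaling = data.get("autoscaling")
        return cls(
            id=data["id"],
            name=data["name"],
            environment=data["environment"],
            region=data["region"],
            shape=data["shape"],
            health=data["health"],
            created=data["created"],
            routes=dict(data.get("routes") or {}),
            # Strict on purpose: unknown autoscaling keys raise a TypeError.
            autoscaling=Autoscaling(**scaling) if scaling else None,
        )


SAMPLE = """
{
  "id": "dep_prod_voice", "name": "prod-voice-east",
  "environment": "production", "region": "us-east-1",
  "shape": "vpc", "health": "healthy",
  "routes": {"stt": "qwen3-asr", "tts": "qwen3-tts", "llm": "qwen3.5-4b"},
  "autoscaling": {"min": 1, "max": 6, "target_gpu_utilization": 0.65},
  "created": 1712345678
}
"""

dep = Deployment.from_dict(json.loads(SAMPLE))
print(dep.name, dep.routes["stt"], dep.autoscaling.max)  # prod-voice-east qwen3-asr 6
```

Making `Autoscaling` strict surfaces schema drift early. If you would rather tolerate new fields, filter the dict to the known keys before unpacking it.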

Create a deployment

POST /v1/deployments
name (string, required)

Cluster-unique identifier.

environment (string, required)

production | staging | dev.

region (string, required)

Cloud region (VPC / hybrid) or datacenter label (on-prem).

shape (string, required)

vpc | onprem | airgap | hybrid. Must match the target cluster's install profile.

routes (object, optional)

Map of task to model ID, keyed by task (stt, tts, llm in the example object). If omitted, defaults are pulled from the cluster config.

autoscaling (object, optional)

{ min, max, target_gpu_utilization }. Bounds the deployment's GPU pool.
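Client-side validation can catch bad parameters before they reach the server. The following is a minimal sketch that checks the enumerated values listed above and builds, but does not send, the create request. Several details are assumptions not stated on this page: the base URL, the Bearer-token `Authorization` header, and the `min <= max` check. The helper name `build_create_request` is hypothetical.

```python
import json
import urllib.request
from typing import Dict, Optional

# Allowed values as listed on this page.
ENVIRONMENTS = {"production", "staging", "dev"}
SHAPES = {"vpc", "onprem", "airgap", "hybrid"}


def build_create_request(
    base_url: str,
    api_key: str,
    name: str,
    environment: str,
    region: str,
    shape: str,
    routes: Optional[Dict[str, str]] = None,
    autoscaling: Optional[dict] = None,
) -> urllib.request.Request:
    """Validate parameters and build (but do not send) POST /v1/deployments."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"environment must be one of {sorted(ENVIRONMENTS)}")
    if shape not in SHAPES:
        raise ValueError(f"shape must be one of {sorted(SHAPES)}")
    body = {"name": name, "environment": environment, "region": region, "shape": shape}
    # Optional fields are sent only when given, so cluster defaults apply otherwise.
    if routes is not None:
        body["routes"] = routes
    if autoscaling is not None:
        # Assumed sanity check; the server's exact rules are not documented here.
        if autoscaling["min"] > autoscaling["max"]:
            raise ValueError("autoscaling.min must not exceed autoscaling.max")
        body["autoscaling"] = autoscaling
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request(
    "https://wordcab.example.invalid",  # hypothetical base URL
    "YOUR_API_KEY",
    name="staging-voice-east",
    environment="staging",
    region="us-east-1",
    shape="vpc",
    autoscaling={"min": 1, "max": 2, "target_gpu_utilization": 0.65},
)
# Send with urllib.request.urlopen(req) once pointed at a real endpoint.
```

Leaving `routes` out of the body rather than sending an empty object keeps the documented behavior: the cluster config supplies the defaults.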

Retrieve / list / update / delete

GET /v1/deployments
GET /v1/deployments/{deployment_id}
PATCH /v1/deployments/{deployment_id}
DELETE /v1/deployments/{deployment_id}
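The four endpoints above differ only in method, path, and body. The following is a minimal sketch of a single builder for them. The helper name `deployment_request`, the base URL, and the Bearer-token header are assumptions. This page does not list which fields PATCH accepts, so the example's `autoscaling` patch is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

ALLOWED = {"GET", "PATCH", "DELETE"}


def deployment_request(
    base_url: str,
    api_key: str,
    method: str,
    deployment_id: Optional[str] = None,
    patch: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a list, retrieve, update, or delete request."""
    if method not in ALLOWED:
        raise ValueError(f"method must be one of {sorted(ALLOWED)}")
    path = "/v1/deployments"
    if deployment_id is not None:
        # Percent-encode the id so it always stays a single path segment.
        path += "/" + urllib.parse.quote(deployment_id, safe="")
    elif method != "GET":
        # Only GET works on the collection (list); PATCH/DELETE need an id.
        raise ValueError(f"{method} requires a deployment_id")
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    data = None
    if method == "PATCH":
        if not patch:
            raise ValueError("PATCH needs a non-empty body")
        data = json.dumps(patch).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


base = "https://wordcab.example.invalid"  # hypothetical base URL
list_req = deployment_request(base, "YOUR_API_KEY", "GET")
get_req = deployment_request(base, "YOUR_API_KEY", "GET", "dep_prod_voice")
patch_req = deployment_request(
    base, "YOUR_API_KEY", "PATCH", "dep_prod_voice",
    patch={"autoscaling": {"min": 2, "max": 8, "target_gpu_utilization": 0.65}},
)
delete_req = deployment_request(base, "YOUR_API_KEY", "DELETE", "dep_prod_voice")
```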

DELETE removes only the control-plane record; it does not uninstall the workload from the cluster. To uninstall the workload, run wordcab deploy uninstall against the cluster; the CLI reconciles both sides.

Health

The health field summarizes probe state across all model pools in the deployment. For per-pool readiness, use GET /v1/deployments/{deployment_id}/pools (Platform tier).
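A common pattern after creating or updating a deployment is to poll the health field until it reports ready. The following is a minimal sketch of a bounded polling loop. `wait_until_healthy` is a hypothetical helper. `fetch_health` stands in for whatever call you use to read the field, for example a GET on the deployment that returns its health value. This page documents only the value healthy, so the sketch treats every other value as not ready yet. The "provisioning" value in the example is hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(
    fetch_health: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_health() until it returns "healthy" or timeout_s elapses.

    Returns True once the deployment reports healthy, False on timeout.
    Any value other than "healthy" is treated as not ready yet.
    """
    # Monotonic clock so wall-clock adjustments cannot shorten or extend the wait.
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_health() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Simulated responses; "provisioning" is a hypothetical non-healthy value.
states = iter(["provisioning", "provisioning", "healthy"])
print(wait_until_healthy(lambda: next(states), sleep=lambda s: None))  # True
```

Injecting `sleep` keeps the loop testable without real delays. In production, the default `time.sleep` with a modest interval avoids hammering the API.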