Rate limits

Every response reports your current rate-limit state. Back off on 429, and ask support to pre-scale ahead of planned peaks.

Rate limits are applied per API key. Every response includes the current state so you can back off gracefully. Limits on api.wordcab.com are higher on Production accounts than on the default Pilot plan; on self-hosted deployments the limits are whatever your cluster can serve.

Headers

http
HTTP/1.1 200 OK
X-RateLimit-Limit:      1000
X-RateLimit-Remaining:  973
X-RateLimit-Reset:      1712345678
X-Request-ID:           req_01HZABCDEF...
  • X-RateLimit-Limit — the window's ceiling in requests.
  • X-RateLimit-Remaining — what is left before a 429.
  • X-RateLimit-Reset — Unix timestamp (seconds) when the window rolls over.
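
To make the header semantics concrete, here is a minimal Python sketch that parses the three rate-limit headers and decides how long to pause before the next request. The names `RateLimitState`, `parse_rate_limit`, and `seconds_to_pause` are hypothetical helpers invented for illustration, not part of any Wordcab SDK. The sketch assumes the headers are always present and hold integer values, as in the sample response above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    """Hypothetical container for the three rate-limit headers."""
    limit: int      # X-RateLimit-Limit: ceiling for the current window
    remaining: int  # X-RateLimit-Remaining: requests left before a 429
    reset_at: int   # X-RateLimit-Reset: Unix timestamp in seconds


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return RateLimitState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )


def seconds_to_pause(state: RateLimitState, now: float) -> float:
    """Return 0 while budget remains, else the time until the window resets."""
    if state.remaining > 0:
        return 0.0
    return max(0.0, state.reset_at - now)
```

In a real client you would pass `time.time()` as `now`. Taking the clock as a parameter keeps the helper easy to test.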

Default limits

Plan                     Requests / min   Tokens / min (Think)   Concurrent streams (Voice)
Pilot                    60               100,000                5
Production               1,000            1,500,000              250
Sovereign / self-hosted  Bounded by your deployment's capacity. No control-plane cap.

429 responses

When a limit trips, the API returns 429 with a Retry-After header (seconds) and a rate_limit_error body. Wait at least Retry-After seconds before retrying.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "requests_per_minute_exceeded",
    "message": "Too many requests. Retry after 12 seconds.",
    "retry_after": 12,
    "request_id": "req_01HZ..."
  }
}
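
One way to honor Retry-After is sketched below in Python. It is transport-agnostic: `send` is a hypothetical callable you supply that performs one request and returns a `(status, headers, body)` tuple. `retry_delay` and `call_with_retry` are illustrative names, not Wordcab APIs. Falling back to the body's `retry_after` field, and then to a one-second default, are assumptions of this sketch rather than documented behavior.

```python
import json
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body text)


def retry_delay(headers: Mapping[str, str], body: str, default: float = 1.0) -> float:
    """Seconds to wait after a 429: Retry-After header, else body retry_after."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                break  # unparseable header value; try the body instead
    try:
        return float(json.loads(body)["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return default  # assumed fallback when neither source is usable


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on 429, wait the advertised delay and retry up to max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, body = response
        if status != 429:
            break
        sleep(retry_delay(headers, body))
        response = send()
    return response
```

If the final attempt still returns 429, the helper hands that response back so the caller can decide whether to surface the error or queue the work for later.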

Lifting limits

  • Production accounts start at the limits above; custom ceilings are set during the Pilot-to-Production transition.
  • If you need headroom for a planned event (launch day, seasonal volume), email support@wordcab.com a week ahead with the expected peak requests per second (RPS) and we will pre-scale.
  • Self-hosted deployments scale horizontally from the Helm chart; there is no request-level governor on the control plane itself.