Chat completions
Drop-in OpenAI-compatible chat API with full support for streaming, tools, JSON mode, and structured outputs.
Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.
Create a chat completion
OpenAI-compatible. Full reference below; for migration notes see the compatibility guide.
Body
model: Model id. See /v1/models for what is available on your key.
messages: Conversation history. Each message has a role (system | user | assistant | tool) and content (a string, or a content-parts array for multimodal models).
temperature: Sampling temperature, 0–2. Default 1. Lower values are more deterministic.
top_p: Nucleus sampling cutoff. Default 1.
max_tokens: Upper bound on completion length, in tokens.
stream: Return the response as server-sent events. Default false.
tools: Function schemas the model may call.
tool_choice: auto (default), none, required, or {"type":"function","function":{"name":"..."}}.
response_format: { type: text | json_object | json_schema }. Pass a schema to constrain output at decode time.
seed: Deterministic sampling seed when supported by the model.
stop: Up to 4 stop sequences.
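As a rough illustration of how the body fields fit together, the sketch below assembles a request payload and checks the two limits stated above (temperature between 0 and 2, at most 4 stop sequences). The helper name build_chat_request is hypothetical, and the field names (model, messages, temperature, top_p, max_tokens, stream, stop, seed) are assumed to follow the OpenAI chat completions convention that this endpoint is described as compatible with. Verify them against your deployment before relying on them.

```python
import json
from typing import Any, Dict, List, Optional

# Roles listed in the body description above.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_chat_request(
    model: str,
    messages: List[Dict[str, Any]],
    *,
    temperature: float = 1.0,
    top_p: float = 1.0,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    stop: Optional[List[str]] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a chat completion body (hypothetical helper, not part of any SDK)."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if stop is not None and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {message.get('role')!r}")

    body: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    # Optional fields are left out entirely rather than sent as null.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        body["stop"] = stop
    if seed is not None:
        body["seed"] = seed
    return body


if __name__ == "__main__":
    body = build_chat_request(
        "qwen3.5-4b",
        [{"role": "user", "content": "Say hello."}],
        temperature=0.2,
        stop=["\n\n"],
    )
    print(json.dumps(body, indent=2))
```

Validating on the client side is optional; the server remains the authority on which values it accepts, so treat these checks as an early warning rather than a guarantee.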
Response
{
"id": "chatcmpl_01HZ...",
"object": "chat.completion",
"created": 1712345678,
"model": "qwen3.5-4b",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "..."},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 182,
"completion_tokens": 56,
"total_tokens": 238
}
}
Streaming response
data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":"The"}}]}
data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":" quick"}}]}
data: [DONE]
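Both response shapes can be consumed with a few lines of client code. The sketch below is a minimal illustration, not an official client: message_text and accumulate_stream are hypothetical helper names, and the stream parser assumes that each data: line carries one well-formed JSON chunk and that the stream ends with the [DONE] sentinel shown above.

```python
import json
from typing import Any, Dict, Iterable


def message_text(response: Dict[str, Any]) -> str:
    """Return the assistant text from a non-streaming response body."""
    return response["choices"][0]["message"]["content"]


def accumulate_stream(lines: Iterable[str]) -> str:
    """Join the content deltas of a server-sent event stream into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    stream = [
        'data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":"The"}}]}',
        'data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":" quick"}}]}',
        "data: [DONE]",
    ]
    print(accumulate_stream(stream))  # prints "The quick"
```

In practice the lines would come from an HTTP response read incrementally; a plain list stands in for it here so the parsing logic can be checked offline.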