Chat completions
Drop-in replacement for the OpenAI chat API, with full support for streaming, tools, JSON mode, and structured outputs.
Create a chat completion
OpenAI-compatible. Full reference below; for migration notes see the compatibility guide.
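The request can be built with nothing but the Python standard library. This is a minimal sketch, not an official client: it assumes the OpenAI-style path /v1/chat/completions and bearer-token auth, neither of which this page states. BASE_URL and make_chat_request are hypothetical placeholders, so check the compatibility guide for the real base URL and auth scheme.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the base URL for your deployment.
BASE_URL = "https://api.example.com"


def make_chat_request(api_key: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for a chat completion.

    Assumes the OpenAI-style path /v1/chat/completions and bearer-token auth.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body of a non-streaming response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = make_chat_request(
        "YOUR_API_KEY",
        {"model": "qwen3.5-4b", "messages": [{"role": "user", "content": "Hello"}]},
    )
    print(req.get_method(), req.full_url)
    # completion = send(req)  # uncomment to make a real network call
```

Separating request construction from sending keeps the request inspectable and testable without network access.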
Body
Model ID. See /v1/models for the models available to your key.
Conversation history. Each message has a role (system | user | assistant | tool) and content (a string, or an array of content parts for multimodal models).
Range 0–2; default 1. Lower values make output more deterministic.
Nucleus sampling cutoff. Default 1.
Upper bound on completion length.
Stream the response as server-sent events. Default false.
Function schemas the model may call.
auto (default), none, required, or {"type":"function","function":{"name":"..."}} to force a call to that function.
Object of the form { type: text | json_object | json_schema }. With json_schema, pass a schema to constrain output at decode time.
Seed for deterministic sampling, honored when the model supports it.
Up to 4 stop sequences.
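The parameters above map onto a JSON request body. The sketch below assumes the OpenAI field names (model, messages, temperature, stop, tools, tool_choice, response_format, stream), since this page lists the parameters without naming them. build_chat_body and the get_weather tool are hypothetical, and only the limits documented here (temperature 0–2, at most 4 stop sequences, the tool_choice modes, the response_format types) are checked client-side.

```python
from typing import Any, Optional

# Field names are assumed to match the OpenAI chat schema.
TOOL_CHOICE_MODES = {"auto", "none", "required"}
RESPONSE_FORMAT_TYPES = {"text", "json_object", "json_schema"}
MAX_STOP_SEQUENCES = 4


def build_chat_body(
    model: str,
    messages: list[dict[str, Any]],
    *,
    temperature: Optional[float] = None,
    stop: Optional[list[str]] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    tool_choice: Any = None,
    response_format: Optional[dict[str, Any]] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a chat-completion request body, checking the documented limits."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if stop is not None and len(stop) > MAX_STOP_SEQUENCES:
        raise ValueError("at most 4 stop sequences are allowed")
    if isinstance(tool_choice, str) and tool_choice not in TOOL_CHOICE_MODES:
        raise ValueError(f"unknown tool_choice mode: {tool_choice!r}")
    if response_format is not None and response_format.get("type") not in RESPONSE_FORMAT_TYPES:
        raise ValueError("response_format type must be text, json_object, or json_schema")

    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    # Omit unset optional fields so the server-side defaults apply.
    optional = {
        "temperature": temperature,
        "stop": stop,
        "tools": tools,
        "tool_choice": tool_choice,
        "response_format": response_format,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Hypothetical tool definition, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

messages = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is the weather in Oslo?"},
]

# Force a call to get_weather using the object form of tool_choice.
tool_body = build_chat_body(
    "qwen3.5-4b", messages, temperature=0.2, tools=[weather_tool],
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)

# Structured output; the nested "json_schema" shape is assumed from OpenAI's API.
structured_body = build_chat_body(
    "qwen3.5-4b", messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
)
```

Leaving unset fields out of the body, rather than sending explicit nulls, lets the server apply the defaults listed above.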
Response
{
  "id": "chatcmpl_01HZ...",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "qwen3.5-4b",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 182,
    "completion_tokens": 56,
    "total_tokens": 238
  }
}
Streaming response
data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":"The"}]}

data: {"id":"chatcmpl_01HZ...","choices":[{"delta":{"content":" quick"}]}

data: [DONE]