Test suites
Scripted cases with assertions, runnable against any agent. Gate deployments on green runs.
Create a test suite
POST/v1/test-suites
namestringRequired
descriptionstringOptional
casesTestCase[]Required
Each case has input (audio_url or scripted turn list) and expected (assertions).
Case shape
json
{
"id": "case_refund_01",
"input": {"audio_url": "s3://internal/eval/refund_01.wav"},
"expected": {
"tool_calls": ["lookup_order", "start_refund"],
"assertions": [
{"type": "transcript_contains", "value": "I can help with that"},
{"type": "transcript_not_contains", "value": "I will personally"}
]
}
}List / retrieve
GET/v1/test-suites
GET/v1/test-suites/{suite_id}
Run a suite
POST/v1/test-suites/{suite_id}/runs
agent_idstringRequired
Agent to test.
overridesobjectOptional
Per-run overrides: llm_model, stt_model, tts_model, temperature, tools.
Returns a run object with status = queued. Poll or subscribe to testrun.completed.
Retrieve a run
GET/v1/test-suites/{suite_id}/runs/{run_id}
json
{
"id": "run_01HZ...",
"suite_id": "suite_abc",
"agent_id": "agent_abc",
"status": "completed",
"passed": 47,
"total": 50,
"cases": [
{"id": "case_refund_01", "passed": true, "latency_ms": 2310, "failures": []},
{"id": "case_refund_02", "passed": false, "failures": [
{"assertion": "tool_calls", "expected": ["lookup_order","start_refund"], "actual": ["lookup_order"]}
]}
]
}