Docs/API/Test suites

Test suites

Scripted cases with assertions, runnable against any agent. Gate deployments on green runs.

Create a test suite

POST/v1/test-suites
namestringRequired

descriptionstringOptional

casesTestCase[]Required

Each case has input (audio_url or scripted turn list) and expected (assertions).

Case shape

json
{
  "id": "case_refund_01",
  "input": {"audio_url": "s3://internal/eval/refund_01.wav"},
  "expected": {
    "tool_calls": ["lookup_order", "start_refund"],
    "assertions": [
      {"type": "transcript_contains",     "value": "I can help with that"},
      {"type": "transcript_not_contains", "value": "I will personally"}
    ]
  }
}

List / retrieve

GET/v1/test-suites
GET/v1/test-suites/{suite_id}

Run a suite

POST/v1/test-suites/{suite_id}/runs
agent_idstringRequired

Agent to test.

overridesobjectOptional

Per-run overrides: llm_model, stt_model, tts_model, temperature, tools.

Returns a run object with status = queued. Poll or subscribe to testrun.completed.

Retrieve a run

GET/v1/test-suites/{suite_id}/runs/{run_id}
json
{
  "id": "run_01HZ...",
  "suite_id": "suite_abc",
  "agent_id": "agent_abc",
  "status": "completed",
  "passed": 47,
  "total": 50,
  "cases": [
    {"id": "case_refund_01", "passed": true, "latency_ms": 2310, "failures": []},
    {"id": "case_refund_02", "passed": false, "failures": [
      {"assertion": "tool_calls", "expected": ["lookup_order","start_refund"], "actual": ["lookup_order"]}
    ]}
  ]
}