Experiments

Run variants of an agent against live traffic; Gym reports a winner once the stopping rule is satisfied.

Package availability

Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.

When the stopping rule is satisfied, Gym reports the collected metrics and emits an experiment.finished webhook with the results. Promotion is always a human decision.

Create an experiment

POST /v1/experiments

name string Required

control_agent_id string Required

The baseline agent. All non-allocated traffic stays on this agent.

variants Variant[] Required

Each variant has a name, an overrides object, and a traffic share (a fraction from 0 to 1).

metrics string[] Optional

One or more of resolution_rate, avg_handle_time, tool_error_rate, csat_estimate, or custom webhook-scored metrics.

stop_rule object Optional

An object of the form { min_samples, significance }. The harness stops when both conditions are satisfied, or when a safety ceiling (max_samples, default 10,000) is hit.
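
The stopping behaviour described for stop_rule can be sketched as a small predicate. This is only an illustration of the stated rule, not Gym's implementation. It assumes that significance is expressed as a p-value threshold (for example 0.05) and that a single sample count is compared against min_samples and max_samples; the docs do not specify either detail, and the function name should_stop is hypothetical.

```python
def should_stop(samples, p_value, min_samples, significance, max_samples=10_000):
    """Return True when the experiment should end under the documented stop rule.

    Assumptions (not confirmed by the docs): `significance` is a p-value
    threshold, and `samples` is whatever count the harness tracks.
    """
    # Safety ceiling: stop regardless of the statistical outcome.
    if samples >= max_samples:
        return True
    # Otherwise both conditions must hold: enough samples and a significant result.
    return samples >= min_samples and p_value <= significance


# Illustrative values only.
print(should_stop(samples=120, p_value=0.01, min_samples=500, significance=0.05))
print(should_stop(samples=10_000, p_value=0.40, min_samples=500, significance=0.05))
```

The first call fails the min_samples condition even though the result looks significant; the second stops only because the ceiling was reached.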

Variant shape

json

{
  "name": "deepseek-variant",
  "overrides": {"llm_model": "deepseek-v3.2"},
  "traffic": 0.20
}
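
As a rough sketch, the request body for POST /v1/experiments could be assembled and checked locally before sending. The helper name build_experiment_payload, the agent id, and the stop_rule values are hypothetical. The check that variant traffic sums to at most 1 is an assumption drawn from the statement that non-allocated traffic stays on the control agent; the API may validate differently.

```python
import json


def build_experiment_payload(name, control_agent_id, variants, metrics=None, stop_rule=None):
    """Assemble a body for POST /v1/experiments with basic local checks."""
    if not variants:
        raise ValueError("at least one variant is required")
    total = 0.0
    for variant in variants:
        missing = {"name", "overrides", "traffic"} - variant.keys()
        if missing:
            raise ValueError(f"variant is missing fields: {sorted(missing)}")
        if not 0 <= variant["traffic"] <= 1:
            raise ValueError(f"traffic for {variant['name']!r} must be between 0 and 1")
        total += variant["traffic"]
    # Assumption: variants cannot claim more than all traffic; the rest stays on control.
    if total > 1 + 1e-9:
        raise ValueError("variant traffic sums to more than 1")
    payload = {"name": name, "control_agent_id": control_agent_id, "variants": variants}
    # Optional fields are included only when provided.
    if metrics is not None:
        payload["metrics"] = metrics
    if stop_rule is not None:
        payload["stop_rule"] = stop_rule
    return payload


payload = build_experiment_payload(
    name="model-swap",
    control_agent_id="agent_123",  # hypothetical id
    variants=[
        {"name": "deepseek-variant", "overrides": {"llm_model": "deepseek-v3.2"}, "traffic": 0.20}
    ],
    metrics=["resolution_rate", "tool_error_rate"],
    stop_rule={"min_samples": 500, "significance": 0.05},  # illustrative values
)
print(json.dumps(payload, indent=2))
```

With this payload, 20% of traffic goes to deepseek-variant and the remaining 80% stays on the control agent.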

Lifecycle

GET /v1/experiments

GET /v1/experiments/{experiment_id}

POST /v1/experiments/{experiment_id}/stop

/stop ends the experiment early. The response carries whatever metrics are available at that point, even if the stop rule has not triggered.
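
The three lifecycle endpoints can be wrapped in one small helper that builds the request without sending it. This is a minimal sketch using only the Python standard library. BASE_URL and the helper name experiment_request are hypothetical, and authentication is left out because the docs above do not describe it; add whatever your deployment requires.

```python
import urllib.request

BASE_URL = "https://gym.example.internal"  # hypothetical; use your deployment's host


def experiment_request(action, experiment_id=None):
    """Build (but do not send) a request for one of the lifecycle endpoints."""
    routes = {
        "list": ("GET", "/v1/experiments"),
        "get": ("GET", f"/v1/experiments/{experiment_id}"),
        "stop": ("POST", f"/v1/experiments/{experiment_id}/stop"),
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action!r}")
    if action != "list" and not experiment_id:
        raise ValueError(f"{action!r} requires an experiment_id")
    method, path = routes[action]
    return urllib.request.Request(BASE_URL + path, method=method)


req = experiment_request("stop", "exp_42")  # hypothetical experiment id
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the response body for /stop would then carry the metrics available at that point.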
