Experiments
Run variants of an agent against live traffic; Gym reports a winner once the stopping rule is satisfied.
Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.
Experiments run variants of an agent against live traffic and report metrics once a stopping rule is satisfied. A finished experiment emits an experiment.finished webhook; promoting a variant is always a human decision.
Create an experiment
The baseline agent. All non-allocated traffic stays on this agent.
Each variant has name, overrides, and traffic (0–1).
One or more of resolution_rate, avg_handle_time, tool_error_rate, csat_estimate, or custom webhook-scored metrics.
An object of the form { min_samples, significance }. The harness stops when both conditions are satisfied, or when a safety ceiling (max_samples, default 10,000) is hit.
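The traffic split and the stopping rule described above can be sketched in a few lines of Python. This is an illustration only, not the harness implementation. The names StopRule, baseline_share, and should_stop are hypothetical. The sketch also assumes that significance is a p-value threshold (for example 0.05), that sample counts are totals across the experiment, and that variant traffic fractions may not sum to more than 1. None of these details are confirmed by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopRule:
    """Hypothetical container for the stopping rule fields."""
    min_samples: int
    significance: float          # ASSUMPTION: treated as a p-value threshold
    max_samples: int = 10_000    # safety ceiling; default taken from the text


def baseline_share(variants: list[dict]) -> float:
    """Return the traffic fraction left on the baseline agent.

    Each variant dict has "name", "overrides", and "traffic" (0 to 1).
    ASSUMPTION: the fractions must not sum to more than 1.
    """
    for v in variants:
        if not 0.0 <= v["traffic"] <= 1.0:
            raise ValueError(f"traffic out of range for {v['name']!r}")
    allocated = sum(v["traffic"] for v in variants)
    if allocated > 1.0 + 1e-9:
        raise ValueError("variant traffic sums to more than 1")
    # Everything not allocated to a variant stays on the baseline.
    return max(0.0, 1.0 - allocated)


def should_stop(rule: StopRule, samples: int, p_value: Optional[float]) -> bool:
    """Stop when the ceiling is hit, or when both conditions hold."""
    if samples >= rule.max_samples:
        return True  # safety ceiling overrides everything else
    enough_samples = samples >= rule.min_samples
    # ASSUMPTION: "significance satisfied" means p_value <= threshold.
    significant = p_value is not None and p_value <= rule.significance
    return enough_samples and significant
```

With a single variant at traffic 0.20, baseline_share returns roughly 0.80, so about four in five conversations would stay on the baseline agent. The ceiling check comes first in should_stop so that an experiment that never reaches significance still terminates.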
Variant shape
{
  "name": "deepseek-variant",
  "overrides": {"llm_model": "deepseek-v3.2"},
  "traffic": 0.20
}
Lifecycle
/stop ends the experiment early. The response carries whatever metrics are available, even if the stopping rule has not yet been satisfied.