Experiments

Experiments run variants of an agent against live traffic. Once the stopping rule is satisfied, Gym reports metrics and a winner, and emits an experiment.finished webhook carrying the results. Promotion is always a human decision.

Create an experiment

POST /v1/experiments

name (string, required)

control_agent_id (string, required)

The baseline agent. All non-allocated traffic stays on this agent.

variants (Variant[], required)

Each variant has a name, a set of overrides, and a traffic share between 0 and 1.

metrics (string[], optional)

One or more of resolution_rate, avg_handle_time, tool_error_rate, csat_estimate, or a custom webhook-scored metric.

stop_rule (object, optional)

An object of the form { min_samples, significance }. The harness stops when both conditions are satisfied, or when the safety ceiling (max_samples, default 10,000) is hit, whichever comes first.
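The stopping behavior described above can be sketched as a small predicate. This is an illustration only, not Gym's implementation: the function name should_stop, the p_value input, the reading of significance as a p-value threshold, and the use of a single overall sample count are all assumptions. Only min_samples, significance, max_samples, and the 10,000 default come from this page.

```python
def should_stop(samples, p_value, min_samples, significance, max_samples=10_000):
    """Hypothetical sketch of the stop rule.

    Assumes `significance` is a p-value threshold (e.g. 0.05) and that
    `samples` is one overall count; the real harness may define both differently.
    """
    # Safety ceiling: stop regardless of the statistical outcome.
    if samples >= max_samples:
        return True
    # Otherwise both conditions must hold: enough samples AND a significant result.
    return samples >= min_samples and p_value <= significance
```

Under these assumptions, an experiment with a significant result but too few samples keeps running, and an inconclusive experiment still ends once it reaches the ceiling.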

Variant shape

```json
{
  "name": "deepseek-variant",
  "overrides": {"llm_model": "deepseek-v3.2"},
  "traffic": 0.20
}
```
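As a rough sketch, a request body for POST /v1/experiments could be assembled and sanity-checked on the client before sending. The helper names build_experiment and control_share are hypothetical, as is the rule that variant traffic shares must sum to at most 1; that rule is inferred from the statement that non-allocated traffic stays on the control agent, and the API may validate differently.

```python
def build_experiment(name, control_agent_id, variants, metrics=None, stop_rule=None):
    """Build a request body for POST /v1/experiments (hypothetical helper)."""
    total = 0.0
    for variant in variants:
        share = variant["traffic"]
        # Each traffic share is documented as a value between 0 and 1.
        if not 0 <= share <= 1:
            raise ValueError(f"traffic for {variant['name']!r} must be in [0, 1]")
        total += share
    # Assumption: shares cannot exceed 1 in total, since the remainder
    # is what stays on the control agent.
    if total > 1 + 1e-9:
        raise ValueError("variant traffic shares sum to more than 1")

    body = {"name": name, "control_agent_id": control_agent_id, "variants": variants}
    # Optional fields are included only when provided.
    if metrics is not None:
        body["metrics"] = metrics
    if stop_rule is not None:
        body["stop_rule"] = stop_rule
    return body


def control_share(variants):
    """Fraction of traffic left on the control agent (non-allocated traffic)."""
    return 1.0 - sum(v["traffic"] for v in variants)
```

With the single variant shown above at a traffic share of 0.20, control_share leaves roughly 0.80 of traffic on the control agent. Serializing the returned dict with json.dumps gives a body suitable for the request.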

Lifecycle

GET /v1/experiments
GET /v1/experiments/{experiment_id}
POST /v1/experiments/{experiment_id}/stop

/stop ends the experiment early. The response carries whatever metrics are available at that point, even if the stop rule has not yet triggered.
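The three lifecycle endpoints can be wrapped in a small request builder, sketched here with Python's standard urllib. The base URL, the Bearer-token header, and the helper name lifecycle_request are placeholders and assumptions; this page does not specify the host or the authentication scheme.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from this page


def lifecycle_request(action, experiment_id=None, token="YOUR_TOKEN"):
    """Build (but do not send) a request for a lifecycle endpoint.

    `action` is one of "list", "get", or "stop". The Authorization
    header format is an assumption.
    """
    if action != "list" and not experiment_id:
        raise ValueError("experiment_id is required for 'get' and 'stop'")

    if action == "list":
        method, path = "GET", "/v1/experiments"
    elif action == "get":
        method, path = "GET", f"/v1/experiments/{experiment_id}"
    elif action == "stop":
        method, path = "POST", f"/v1/experiments/{experiment_id}/stop"
    else:
        raise ValueError(f"unknown action: {action!r}")

    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to urllib.request.urlopen would send it. The response format is not documented in this section, so any parsing of the returned metrics should be written against the actual response.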