Observability

Prometheus, OpenTelemetry, structured logs. Nothing proprietary. Six Grafana dashboards and an SLO alert pack ship with the chart.

Package availability

Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not published to public package registries, so treat install commands in these docs as placeholders until your Wordcab team provides your private package source or offline bundle.

Every Wordcab service emits telemetry in standard, open formats: Prometheus metrics, OpenTelemetry traces, and structured JSON logs. Route it into whatever you already run.

Prometheus metrics

Every pod exposes /metrics. A ServiceMonitor / PodMonitor ships with the chart for Prometheus Operator installs; scrape config is published for vanilla Prometheus.

Key metrics

| Metric | Labels | Meaning |
|---|---|---|
| wordcab_stt_streams_active | pool, language | Concurrent streaming STT sessions. |
| wordcab_stt_ttft_seconds | pool, model | Histogram. First-word latency on streaming STT. |
| wordcab_llm_ttft_seconds | pool, model | Histogram. First-token latency on chat completions. |
| wordcab_llm_tokens_total | pool, model, direction | Counter. Prompt + completion tokens. |
| wordcab_tts_first_audio_seconds | pool, voice | Histogram. First-audio latency. |
| wordcab_gpu_utilization | pool, gpu | Gauge. Rolling GPU utilization (from DCGM). |
| wordcab_requests_total | route, code, key_id | Counter. API request counts. |
| wordcab_deployment_ready | deployment | Gauge. 1 when every pool is healthy. |
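
The histogram metrics above are usually read as quantiles (p95, p99). As a rough illustration of how a quantile is estimated from cumulative bucket counts, here is a minimal Python sketch that interpolates linearly inside the matching bucket, similar in spirit to PromQL's histogram_quantile. The bucket boundaries and counts are invented for the example; the actual bucket layout of the Wordcab histograms is not documented here.

```python
import math

# Hypothetical cumulative buckets: (upper bound in seconds, cumulative count).
# The last bucket is the "+Inf" bucket and holds the total observation count.
EXAMPLE_BUCKETS = [(0.1, 50), (0.2, 90), (0.5, 100), (math.inf, 100)]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    The lowest bucket is assumed to start at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, no meaningful quantile
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate inside +Inf; fall back to the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```

With the example buckets, estimate_quantile(0.95, EXAMPLE_BUCKETS) gives roughly 0.35 seconds. The estimate is only as fine-grained as the bucket boundaries, which is worth keeping in mind when comparing it against a tight latency target.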

Grafana dashboards

Six dashboards ship with the chart, as ConfigMaps with the grafana_dashboard=1 label that the Grafana Operator auto-discovers. Import manually if you don't run the operator — the JSON is in dashboards/.

  • Wordcab · Overview — one pane of glass.
  • Wordcab · Voice — STT/TTS latency, streams, codecs.
  • Wordcab · Think — LLM TTFT, tokens, prefix cache hit rate.
  • Wordcab · Agents & Calls — call volumes, failure modes.
  • Wordcab · GPU pool health — DCGM-derived.
  • Wordcab · Ingress & API — request rate, error rate, latency.

OpenTelemetry traces

Every request carries a W3C trace context end to end — ingress → control plane → model pool → webhook callout. OTLP export is configured at install time:

yaml
observability:
  otelExporter:
    endpoint: otel-collector.observability:4317
    protocol: grpc       # or http/protobuf
    tls: { insecure: false, caFile: /etc/ssl/custom-ca.pem }
    sampling:
      type: parentbased_traceidratio
      ratio: 0.1         # sample 10% of root spans

Spans carry the attributes wordcab.model, wordcab.pool, wordcab.call_id, wordcab.agent_id, and wordcab.key_id. Filter on any of these in your tracing backend (Tempo, Jaeger, Honeycomb, Datadog).
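
To make the sampling setting concrete, the following Python sketch shows how a W3C traceparent header can be parsed and how a parent-based ratio sampler could decide whether to record a request. It is an illustration under stated assumptions, not Wordcab's implementation: the function names are hypothetical, only the four-field version 00 header layout is handled, and the ratio check (comparing the low 64 bits of the trace ID against a bound) is one common approach among several.

```python
import re

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return trace_id, parent_id and the sampled flag, or None if invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def should_sample(header, new_trace_id, ratio):
    """Parent-based decision: follow the parent if present, else apply the ratio.

    new_trace_id is the 32-hex-character ID generated for a new root span.
    """
    parent = parse_traceparent(header) if header else None
    if parent is not None:
        return parent["sampled"]
    # Root span: keep it when the low 64 bits fall below ratio * 2**64.
    bound = round(ratio * (1 << 64))
    return (int(new_trace_id, 16) & ((1 << 64) - 1)) < bound
```

Because the decision for child spans follows the parent's sampled flag, a trace is either recorded end to end or not at all, which is why the ratio only applies to root spans.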

Structured logs

Logs are written to stdout as one JSON object per line, with a stable schema. Fluent Bit, Vector, and Fluentd presets ship in the chart. Every line includes request_id, trace_id, span_id, pod, service, and level.

json
{"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt","pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...","msg":"stream finalized","utterances":14,"duration_ms":9342}
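
Because each line is a standalone JSON object, the logs can be processed with ordinary tooling. The Python sketch below parses lines read from stdout or a file and groups the records by trace_id, so log lines can be matched to a trace. The helper names are hypothetical, and the sample line is the one shown above, elided values included.

```python
import json
from collections import defaultdict

# The sample log line from this page, used here as test input.
SAMPLE_LINE = (
    '{"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt",'
    '"pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...",'
    '"msg":"stream finalized","utterances":14,"duration_ms":9342}'
)


def parse_log_lines(lines):
    """Parse JSON-per-line logs, skipping blank lines and non-JSON noise."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray plain-text line from a sidecar
        if isinstance(record, dict):
            records.append(record)
    return records


def group_by_trace(records):
    """Group records by trace_id; records without one are ignored."""
    grouped = defaultdict(list)
    for record in records:
        trace_id = record.get("trace_id")
        if trace_id is not None:
            grouped[trace_id].append(record)
    return dict(grouped)
```

For example, group_by_trace(parse_log_lines([SAMPLE_LINE])) returns a dictionary with a single key, the trace_id of the sample line, mapping to a list with that one record.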

SLOs and alerts

The chart ships Prometheus alerting rules for a conservative default SLO set. Tune observability.slos to match your targets.

yaml
observability:
  slos:
    stt_streaming_ttft_p99_ms: 300
    llm_ttft_p95_ms: 200
    tts_first_audio_p95_ms: 150
    api_availability_5m: 0.995

Breach events are emitted as deployment.degraded webhooks — wire these to your on-call. See Webhooks.
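
As a sketch of what the thresholds above express, the following Python snippet compares measured values against the default SLO set. It is illustrative only: the shipped alerting rules are Prometheus rules, not Python, and two assumptions are made here that the page does not state. First, keys ending in _ms are treated as upper bounds and the availability key as a lower bound. Second, availability is computed from wordcab_requests_total by counting only 5xx status codes as failures.

```python
# Default targets copied from the observability.slos example above.
DEFAULT_SLOS = {
    "stt_streaming_ttft_p99_ms": 300,
    "llm_ttft_p95_ms": 200,
    "tts_first_audio_p95_ms": 150,
    "api_availability_5m": 0.995,
}


def availability(requests_by_code):
    """Fraction of requests that did not end in a 5xx status (assumption).

    requests_by_code maps an HTTP status code (str or int) to a request count,
    e.g. the increase of wordcab_requests_total over a window, summed by code.
    """
    total = sum(requests_by_code.values())
    if total == 0:
        return 1.0  # no traffic: treat as fully available
    errors = sum(n for code, n in requests_by_code.items() if 500 <= int(code) <= 599)
    return 1.0 - errors / total


def breached(measured, slos=DEFAULT_SLOS):
    """Return the names of SLOs whose measured value misses its target."""
    names = []
    for name, target in slos.items():
        if name not in measured:
            continue  # nothing measured for this SLO
        value = measured[name]
        # Latency targets are ceilings; availability is a floor.
        missed = value > target if name.endswith("_ms") else value < target
        if missed:
            names.append(name)
    return names
```

With 10 failures in 1,000 requests, availability is 0.99, which falls below the 0.995 default and would count as a breach under these assumptions.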