Observability

Prometheus, OpenTelemetry, structured logs. Nothing proprietary. Six Grafana dashboards and an SLO alert pack ship with the chart.

Every Wordcab service emits telemetry in standard, open formats: Prometheus metrics, OpenTelemetry traces, and structured JSON logs. Route it into the monitoring stack you already run.

Prometheus metrics

Every pod exposes /metrics. A ServiceMonitor / PodMonitor ships with the chart for Prometheus Operator installs; scrape config is published for vanilla Prometheus.
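For a vanilla Prometheus install, a scrape job for these pods might look roughly like the sketch below. It is not the published scrape config. The job name, the wordcab namespace, and the app.kubernetes.io/part-of pod label are assumptions made for illustration; substitute whatever your install actually uses.

```yaml
# Illustrative sketch only: job name, namespace, and pod label are assumptions.
scrape_configs:
  - job_name: wordcab-pods              # hypothetical job name
    metrics_path: /metrics              # every pod exposes /metrics
    kubernetes_sd_configs:
      - role: pod                       # discover pods through the Kubernetes API
        namespaces:
          names: [wordcab]              # assumed install namespace
    relabel_configs:
      # Keep only pods carrying the assumed label app.kubernetes.io/part-of=wordcab.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_part_of]
        regex: wordcab
        action: keep
      # Copy the pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With role: pod, Prometheus creates one target per declared container port, so a pod that declares several ports may need an extra keep rule on the port name.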

Key metrics

Metric                           Labels                  Meaning
wordcab_stt_streams_active       pool, language          Concurrent streaming STT sessions.
wordcab_stt_ttft_seconds         pool, model             Histogram. First-word latency on streaming STT.
wordcab_llm_ttft_seconds         pool, model             Histogram. First-token latency on chat completions.
wordcab_llm_tokens_total         pool, model, direction  Counter. Prompt + completion tokens.
wordcab_tts_first_audio_seconds  pool, voice             Histogram. First-audio latency.
wordcab_gpu_utilization          pool, gpu               Gauge. Rolling GPU utilization (from DCGM).
wordcab_requests_total           route, code, key_id     Counter. API request counts.
wordcab_deployment_ready         deployment              Gauge. 1 when every pool is healthy.
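As an example of putting these metrics to work, the recording rules below precompute a p95 LLM first-token latency and a token throughput rate per pool. This is a sketch: the group and rule names are hypothetical, and it assumes the histograms are classic Prometheus histograms that expose a _bucket series with an le label.

```yaml
# Illustrative recording rules; group and record names are hypothetical.
groups:
  - name: wordcab-recording-examples
    rules:
      # p95 first-token latency per pool over a 5-minute window.
      # Assumes wordcab_llm_ttft_seconds is a classic histogram (_bucket series with le).
      - record: wordcab:llm_ttft_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, pool) (rate(wordcab_llm_ttft_seconds_bucket[5m]))
          )
      # Tokens per second, split by pool, model, and direction.
      - record: wordcab:llm_tokens:rate5m
        expr: sum by (pool, model, direction) (rate(wordcab_llm_tokens_total[5m]))
```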

Grafana dashboards

Six dashboards ship with the chart, as ConfigMaps with the grafana_dashboard=1 label that the Grafana Operator auto-discovers. Import manually if you don't run the operator — the JSON is in dashboards/.

  • Wordcab · Overview — one pane of glass.
  • Wordcab · Voice — STT/TTS latency, streams, codecs.
  • Wordcab · Think — LLM TTFT, tokens, prefix cache hit rate.
  • Wordcab · Agents & Calls — call volumes, failure modes.
  • Wordcab · GPU pool health — DCGM-derived.
  • Wordcab · Ingress & API — request rate, error rate, latency.
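For orientation, a dashboard ConfigMap of the kind described above has roughly the following shape. The resource name, namespace, and file name are hypothetical, and the dashboard body is a placeholder rather than the JSON shipped in dashboards/.

```yaml
# Shape of a labelled dashboard ConfigMap; names and JSON body are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: wordcab-dashboard-overview      # hypothetical name
  namespace: observability              # assumed namespace
  labels:
    grafana_dashboard: "1"              # the label used for auto-discovery
data:
  wordcab-overview.json: |
    {
      "title": "Wordcab · Overview",
      "panels": []
    }
```

Label values are always strings in Kubernetes, so the 1 has to be quoted in YAML.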

OpenTelemetry traces

Every request carries a W3C trace context end to end — ingress → control plane → model pool → webhook callout. OTLP export is configured at install time:

yaml
observability:
  otelExporter:
    endpoint: otel-collector.observability:4317
    protocol: grpc       # or http/protobuf
    tls: { insecure: false, caFile: /etc/ssl/custom-ca.pem }
    sampling:
      type: parentbased_traceidratio
      ratio: 0.1         # sample 10% of root spans

Traces include attributes for wordcab.model, wordcab.pool, wordcab.call_id, wordcab.agent_id, wordcab.key_id. Filter on any of these in your tracing backend (Tempo, Jaeger, Honeycomb, Datadog).
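On the receiving side, an OpenTelemetry Collector that accepts this OTLP/gRPC traffic and forwards it to a tracing backend could be configured along these lines. This is a minimal sketch: the certificate paths and the tempo.observability:4317 backend address are assumptions, not values from the chart.

```yaml
# Minimal Collector pipeline sketch; certificate paths and backend address are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317          # matches the port in otelExporter.endpoint
        tls:
          cert_file: /etc/otel/tls/tls.crt   # hypothetical path; must chain to the CA in caFile
          key_file: /etc/otel/tls/tls.key    # hypothetical path
processors:
  batch: {}                             # batch spans before export
exporters:
  otlp:
    endpoint: tempo.observability:4317  # hypothetical tracing backend
    tls:
      insecure: true                    # plaintext to the backend; acceptable only for in-cluster tests
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Because sampling is parent-based, the 10% decision is made on root spans and inherited by child spans, so the Collector does not need a second sampler to keep traces whole.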

Structured logs

Logs are JSON-per-line on stdout, with a stable schema. Fluent Bit, Vector, and Fluentd presets ship in the chart. Every line includes request_id, trace_id, span_id, pod, service, level.

json
{"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt","pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...","msg":"stream finalized","utterances":14,"duration_ms":9342}
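If you build your own pipeline instead of using the shipped presets, the following Vector configuration sketches one way to lift the JSON fields onto each event. It is not the preset from the chart; the component names are hypothetical and the console sink is only there for inspection.

```yaml
# Illustrative Vector pipeline; component names are hypothetical.
sources:
  k8s_logs:
    type: kubernetes_logs               # reads container stdout on each node
transforms:
  parse_wordcab:
    type: remap
    inputs: [k8s_logs]
    source: |
      # The raw log line arrives in .message; parse it as JSON.
      parsed, err = parse_json(.message)
      # Merge the parsed fields (request_id, trace_id, level, ...) into the event
      # only when parsing succeeded and the result is a JSON object.
      if err == null && is_object(parsed) {
        . = merge(., object!(parsed))
      }
sinks:
  debug_out:
    type: console                       # print parsed events for inspection
    inputs: [parse_wordcab]
    encoding:
      codec: json
```

Once trace_id is a top-level field, most log backends can link a log line to its trace.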

SLOs and alerts

The chart ships Prometheus alerting rules for a conservative default SLO set. Tune observability.slos to match your targets.

yaml
observability:
  slos:
    stt_streaming_ttft_p99_ms: 300
    llm_ttft_p95_ms: 200
    tts_first_audio_p95_ms: 150
    api_availability_5m: 0.995

Breach events are emitted as deployment.degraded webhooks — wire these to your on-call. See Webhooks.
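To give a sense of how such targets translate into Prometheus rules, the sketch below expresses two of the defaults above as alerts. These are not the rules shipped with the chart. The alert names and severity label are hypothetical, the latency rule assumes classic histograms with _bucket series, and the availability rule assumes the code label holds HTTP status codes and that availability means the share of non-5xx responses.

```yaml
# Illustrative alert rules; not the shipped alert pack. Names and labels are hypothetical.
groups:
  - name: wordcab-slo-examples
    rules:
      # stt_streaming_ttft_p99_ms: 300 -> p99 first-word latency above 0.3 s.
      - alert: WordcabSttTtftP99High
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(wordcab_stt_ttft_seconds_bucket[5m]))
          ) > 0.3
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Streaming STT p99 first-word latency is above 300 ms.
      # api_availability_5m: 0.995 -> non-5xx share of requests below 99.5%.
      # Assumes the code label carries HTTP status codes.
      - alert: WordcabApiAvailabilityLow
        expr: |
          1 - (
            sum(rate(wordcab_requests_total{code=~"5.."}[5m]))
            /
            sum(rate(wordcab_requests_total[5m]))
          ) < 0.995
        for: 5m
        labels:
          severity: page
        annotations:
          summary: API availability over 5 minutes is below 99.5%.
```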