Observability

Prometheus, OpenTelemetry, structured logs. Nothing proprietary. Six Grafana dashboards and an SLO alert pack ship with the chart.

Wordcab emits standard, open-format telemetry from every service: Prometheus metrics, OpenTelemetry traces, and structured JSON logs. Route it into whatever stack you already run.

Prometheus metrics

Every pod exposes /metrics. The chart ships a ServiceMonitor / PodMonitor for Prometheus Operator installs; for vanilla Prometheus, a scrape config is published.
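For orientation, here is a minimal sketch of what a pod-discovery scrape job for vanilla Prometheus can look like. It is not the published config: the job name, the `wordcab` namespace, and the `app.kubernetes.io/instance=wordcab` pod label are assumptions for illustration, so prefer the published scrape config where it differs.

```yaml
scrape_configs:
  - job_name: wordcab                 # assumed job name
    metrics_path: /metrics            # every pod exposes /metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [wordcab]            # assumed install namespace
    relabel_configs:
      # Keep only pods that carry the (assumed) Helm instance label.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance]
        regex: wordcab
        action: keep
      # Copy the pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With `role: pod`, Prometheus creates one target per declared container port, so a pod that exposes several ports may need an extra `keep` rule on `__meta_kubernetes_pod_container_port_name`.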

Key metrics

Metric	Labels	Meaning
`wordcab_stt_streams_active`	pool, language	Concurrent streaming STT sessions.
`wordcab_stt_ttft_seconds`	pool, model	Histogram. First-word latency on streaming STT.
`wordcab_llm_ttft_seconds`	pool, model	Histogram. First-token latency on chat completions.
`wordcab_llm_tokens_total`	pool, model, direction	Counter. Prompt + completion tokens.
`wordcab_tts_first_audio_seconds`	pool, voice	Histogram. First-audio latency.
`wordcab_gpu_utilization`	pool, gpu	Gauge. Rolling GPU utilization (from DCGM).
`wordcab_requests_total`	route, code, key_id	Counter. API request counts.
`wordcab_deployment_ready`	deployment	Gauge. 1 when every pool is healthy.
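As one way to put these metrics to work, the sketch below defines two Prometheus recording rules: a p95 for LLM first-token latency and a per-route 5xx error ratio. It assumes the histograms are classic Prometheus histograms that expose `_bucket` series with an `le` label, and that the `code` label holds the HTTP status code. The group and rule names are hypothetical.

```yaml
groups:
  - name: wordcab-recording-examples   # hypothetical group name
    rules:
      # p95 first-token latency per pool and model over 5 minutes.
      - record: wordcab:llm_ttft_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, pool, model) (rate(wordcab_llm_ttft_seconds_bucket[5m]))
          )
      # Share of requests per route that returned a 5xx status
      # (assumes `code` is the HTTP status code).
      - record: wordcab:requests:error_ratio_5m
        expr: |
          sum by (route) (rate(wordcab_requests_total{code=~"5.."}[5m]))
            /
          sum by (route) (rate(wordcab_requests_total[5m]))
```

The `le` label has to survive the aggregation, otherwise `histogram_quantile` has no bucket boundaries to interpolate over.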

Grafana dashboards

Six dashboards ship with the chart as ConfigMaps carrying the grafana_dashboard=1 label, which the Grafana Operator auto-discovers. If you don't run the operator, import them manually; the JSON is in dashboards/.

Wordcab · Overview: one pane of glass.
Wordcab · Voice: STT/TTS latency, streams, codecs.
Wordcab · Think: LLM TTFT, tokens, prefix cache hit rate.
Wordcab · Agents & Calls: call volumes, failure modes.
Wordcab · GPU: pool health, DCGM-derived.
Wordcab · Ingress & API: request rate, error rate, latency.

OpenTelemetry traces

Every request carries a W3C trace context end to end: ingress → control plane → model pool → webhook callout. OTLP export is configured at install time:

yaml

observability:
  otelExporter:
    endpoint: otel-collector.observability:4317
    protocol: grpc       # or http/protobuf
    tls: { insecure: false, caFile: /etc/ssl/custom-ca.pem }
    sampling:
      type: parentbased_traceidratio
      ratio: 0.1         # sample 10% of root spans

Traces include the attributes wordcab.model, wordcab.pool, wordcab.call_id, wordcab.agent_id, and wordcab.key_id. Filter on any of these in your tracing backend (Tempo, Jaeger, Honeycomb, Datadog).
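On the receiving side, an OpenTelemetry Collector that matches the exporter settings above could be configured roughly as follows. This is a sketch under assumptions: the certificate paths and the downstream `tempo.observability:4317` endpoint are hypothetical, and the receiver serves TLS because the exporter example sets `insecure: false`.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317          # port the Wordcab exporter targets
        tls:
          cert_file: /etc/otel/tls/tls.crt   # hypothetical path
          key_file: /etc/otel/tls/tls.key    # hypothetical path

processors:
  batch: {}                              # batch spans before export

exporters:
  otlp:
    endpoint: tempo.observability:4317   # hypothetical tracing backend
    tls:
      insecure: true                     # assumes plaintext inside the cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The certificate served by the receiver needs to chain to the CA referenced by `caFile` in the exporter config, otherwise the export fails the TLS handshake.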

Structured logs

Logs are written to stdout as one JSON object per line, with a stable schema. Fluent Bit, Vector, and Fluentd presets ship in the chart. Every line includes request_id, trace_id, span_id, pod, service, and level.

json

{"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt","pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...","msg":"stream finalized","utterances":14,"duration_ms":9342}

SLOs and alerts

The chart ships Prometheus alerting rules for a conservative default SLO set. Tune observability.slos to match your targets.

yaml

observability:
  slos:
    stt_streaming_ttft_p99_ms: 300
    llm_ttft_p95_ms: 200
    tts_first_audio_p95_ms: 150
    api_availability_5m: 0.995

Breach events are emitted as deployment.degraded webhooks; wire these to your on-call system. See Webhooks.
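To make the thresholds concrete, the sketch below shows how two of them could be written as Prometheus alerting rules. These are not the rules shipped in the chart: the alert names, `for` durations, and `severity` label are hypothetical, the histograms are assumed to expose classic `_bucket` series, and `code` is assumed to be the HTTP status code.

```yaml
groups:
  - name: wordcab-slo-examples           # hypothetical group name
    rules:
      # api_availability_5m: 0.995 -> non-5xx share of requests over 5 minutes.
      - alert: WordcabApiAvailabilityLow
        expr: |
          (
            sum(rate(wordcab_requests_total{code!~"5.."}[5m]))
              /
            sum(rate(wordcab_requests_total[5m]))
          ) < 0.995
        for: 5m
        labels:
          severity: page
        annotations:
          summary: API availability is below 99.5% over 5 minutes
      # stt_streaming_ttft_p99_ms: 300 -> 0.3 s, since the metric is in seconds.
      - alert: WordcabSttTtftP99High
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(wordcab_stt_ttft_seconds_bucket[5m]))
          ) > 0.3
        for: 10m
        labels:
          severity: page
        annotations:
          summary: Streaming STT first-word latency p99 is above 300 ms
```

The SLO keys are in milliseconds while the histograms are in seconds, so a hand-written rule has to convert the threshold (300 ms becomes 0.3).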
