Observability
Prometheus, OpenTelemetry, structured logs. Nothing proprietary. Six Grafana dashboards and an SLO alert pack ship with the chart.
Wordcab emits standard, open-format telemetry from every service: Prometheus metrics, OpenTelemetry traces, and structured JSON logs. Route it into whatever you already run.
Prometheus metrics
Every pod exposes /metrics. A ServiceMonitor / PodMonitor ships with the chart for Prometheus Operator installs; scrape config is published for vanilla Prometheus.
Key metrics
| Metric | Labels | Meaning |
|---|---|---|
| wordcab_stt_streams_active | pool, language | Concurrent streaming STT sessions. |
| wordcab_stt_ttft_seconds | pool, model | Histogram. First-word latency on streaming STT. |
| wordcab_llm_ttft_seconds | pool, model | Histogram. First-token latency on chat completions. |
| wordcab_llm_tokens_total | pool, model, direction | Counter. Prompt + completion tokens. |
| wordcab_tts_first_audio_seconds | pool, voice | Histogram. First-audio latency. |
| wordcab_gpu_utilization | pool, gpu | Gauge. Rolling GPU utilization (from DCGM). |
| wordcab_requests_total | route, code, key_id | Counter. API request counts. |
| wordcab_deployment_ready | deployment | Gauge. 1 when every pool is healthy. |
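The latency metrics above are histograms, so percentiles such as p95 have to be estimated from cumulative bucket counts. The sketch below shows the linear-interpolation approach that Prometheus's `histogram_quantile` function uses, written out in Python for illustration. The bucket boundaries and counts are made up; the boundaries Wordcab actually exports are not documented here.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0..1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by bound and ending with (math.inf, total_count), mirroring the
    `_bucket{le="..."}` series of a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations yet

    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical bucket data for wordcab_llm_ttft_seconds (illustration only).
EXAMPLE_BUCKETS = [(0.05, 10), (0.1, 60), (0.2, 90), (0.5, 99), (math.inf, 100)]
p95_seconds = histogram_quantile(0.95, EXAMPLE_BUCKETS)
```

The histograms are recorded in seconds, so multiply by 1000 before comparing the result with a millisecond threshold. The estimate is only as precise as the bucket layout: a wide bucket around the target latency gives a coarse percentile.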
Grafana dashboards
Six dashboards ship with the chart, as ConfigMaps with the grafana_dashboard=1 label that the Grafana Operator auto-discovers. Import manually if you don't run the operator — the JSON is in dashboards/.
- Wordcab · Overview: one pane of glass.
- Wordcab · Voice: STT/TTS latency, streams, codecs.
- Wordcab · Think: LLM TTFT, tokens, prefix cache hit rate.
- Wordcab · Agents & Calls: call volumes, failure modes.
- Wordcab · GPU pool health: DCGM-derived.
- Wordcab · Ingress & API: request rate, error rate, latency.
OpenTelemetry traces
Every request carries a W3C trace context end to end — ingress → control plane → model pool → webhook callout. OTLP export is configured at install time:
    observability:
      otelExporter:
        endpoint: otel-collector.observability:4317
        protocol: grpc # or http/protobuf
        tls: { insecure: false, caFile: /etc/ssl/custom-ca.pem }
        sampling:
          type: parentbased_traceidratio
          ratio: 0.1 # sample 10% of root spans

Traces include attributes for wordcab.model, wordcab.pool, wordcab.call_id, wordcab.agent_id, wordcab.key_id. Filter on any of these in your tracing backend (Tempo, Jaeger, Honeycomb, Datadog).
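To illustrate what the parentbased_traceidratio sampler configured above does, here is a minimal Python sketch of the decision logic: a span with a parent inherits the parent's sampled flag from the W3C `traceparent` header, and a root span is sampled when the low 64 bits of its trace ID fall below a threshold derived from the ratio. This mirrors how OpenTelemetry SDKs commonly implement the sampler; it is not Wordcab's code, and the function names are hypothetical.

```python
import re

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 is the sampled flag
    }

def should_sample(trace_id_hex, ratio, parent_sampled=None):
    """Parent-based decision with a trace-ID-ratio fallback for root spans."""
    if parent_sampled is not None:
        return parent_sampled  # child spans follow the parent's decision
    bound = round(ratio * (1 << 64))
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < bound

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
decision = should_sample(ctx["trace_id"], ratio=0.1, parent_sampled=ctx["sampled"])
```

Because the decision for a root span depends only on the trace ID, every service that sees the same trace reaches the same answer, so a sampled trace stays complete from ingress to webhook callout.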
Structured logs
Logs are JSON-per-line on stdout, with a stable schema. Fluent Bit, Vector, and Fluentd presets ship in the chart. Every line includes request_id, trace_id, span_id, pod, service, level.
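Because each line is a standalone JSON object, correlating logs with a trace takes only a few lines of code. The sketch below is a minimal Python example that checks for the fields listed above and filters records by trace_id. The helper names and the sample values are hypothetical and only illustrate the shape of the data.

```python
import json

# Fields the documentation says every log line carries.
REQUIRED_FIELDS = ("request_id", "trace_id", "span_id", "pod", "service", "level")

def parse_log_line(line):
    """Decode one log line; return None if it is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None

def missing_fields(record):
    """List the required fields that are absent from a record."""
    return [name for name in REQUIRED_FIELDS if name not in record]

def filter_by_trace(lines, trace_id):
    """Yield decoded records that belong to the given trace."""
    for line in lines:
        record = parse_log_line(line)
        if record is not None and record.get("trace_id") == trace_id:
            yield record

# Hypothetical sample lines (values invented for illustration).
SAMPLE_LINES = [
    '{"level":"info","service":"stt","pod":"stt-0","request_id":"req_a",'
    '"trace_id":"t1","span_id":"s1","msg":"stream finalized"}',
    '{"level":"warn","service":"llm","pod":"llm-0","request_id":"req_b",'
    '"trace_id":"t2","span_id":"s2","msg":"slow first token"}',
    "plain text that is not JSON",
]
matches = list(filter_by_trace(SAMPLE_LINES, "t1"))
```

In practice the same filter is usually expressed in the query language of the log backend; the point is that trace_id gives a direct join key between a trace and its log lines.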
    {"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt","pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...","msg":"stream finalized","utterances":14,"duration_ms":9342}

SLOs and alerts
The chart ships Prometheus alerting rules for a conservative default SLO set. Tune observability.slos to match your targets.
    observability:
      slos:
        stt_streaming_ttft_p99_ms: 300
        llm_ttft_p95_ms: 200
        tts_first_audio_p95_ms: 150
        api_availability_5m: 0.995

Breach events are emitted as deployment.degraded webhooks; wire these to your on-call. See Webhooks.
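To make the api_availability_5m target concrete, here is a minimal Python sketch of an availability check over one five-minute window. It assumes availability is the share of requests that did not return a 5xx status, computed from the per-code increase of wordcab_requests_total; the shipped alert rules may define it differently, and the function names are hypothetical.

```python
def availability(requests_by_code):
    """Fraction of requests in the window that did not fail with a 5xx status.

    `requests_by_code` maps an HTTP status code (the `code` label) to the
    number of requests observed in the window.
    """
    total = sum(requests_by_code.values())
    if total == 0:
        return 1.0  # assumption: an idle window counts as fully available
    failed = sum(
        count for code, count in requests_by_code.items()
        if str(code).startswith("5")
    )
    return 1.0 - failed / total

def breaches_slo(requests_by_code, target=0.995):
    """True when the window's availability is below the SLO target."""
    return availability(requests_by_code) < target

# Hypothetical five-minute windows (counts invented for illustration).
healthy_window = {"200": 9960, "503": 40}    # 99.6% available
degraded_window = {"200": 9900, "500": 100}  # 99.0% available
```

At a 0.995 target the error budget is 5 failed requests per 1,000, so the second window above would breach the SLO and the first would not.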