Observability
Prometheus, OpenTelemetry, structured logs. Nothing proprietary. Six Grafana dashboards and an SLO alert pack ship with the chart.
Wordcab SDKs, CLI tools, Helm charts, model weights, and deployment packages are delivered directly to each customer for self-hosted installation. They are not publicly published package-manager artifacts, so install commands in these docs are placeholders until your Wordcab team provides your private package source or offline bundle.
Every Wordcab service emits standard, open-format telemetry: Prometheus metrics, OpenTelemetry traces, and structured JSON logs. Route it into whatever you already run.
Prometheus metrics
Every pod exposes /metrics. A ServiceMonitor / PodMonitor ships with the chart for Prometheus Operator installs; scrape config is published for vanilla Prometheus.
Key metrics
| Metric | Labels | Meaning |
|---|---|---|
| wordcab_stt_streams_active | pool, language | Concurrent streaming STT sessions. |
| wordcab_stt_ttft_seconds | pool, model | Histogram. First-word latency on streaming STT. |
| wordcab_llm_ttft_seconds | pool, model | Histogram. First-token latency on chat completions. |
| wordcab_llm_tokens_total | pool, model, direction | Counter. Prompt + completion tokens. |
| wordcab_tts_first_audio_seconds | pool, voice | Histogram. First-audio latency. |
| wordcab_gpu_utilization | pool, gpu | Gauge. Rolling GPU utilization (from DCGM). |
| wordcab_requests_total | route, code, key_id | Counter. API request counts. |
| wordcab_deployment_ready | deployment | Gauge. 1 when every pool is healthy. |
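The latency metrics in the table are histograms, so percentiles such as p95 have to be estimated from bucket counts rather than read directly. The sketch below shows the usual linear-interpolation estimate, similar in spirit to PromQL's histogram_quantile, assuming classic Prometheus histograms with cumulative `le` buckets. The bucket boundaries and counts in the usage note are invented for illustration and are not Wordcab's actual bucket layout.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, where the last upper bound is math.inf (the "+Inf" bucket).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical wordcab_llm_ttft_seconds buckets `[(0.05, 10), (0.1, 60), (0.2, 90), (0.5, 99), (math.inf, 100)]`, `estimate_quantile(0.5, ...)` lands inside the 0.05 to 0.1 bucket. The estimate is only as precise as the bucket widths, which is worth keeping in mind when comparing it against a tight latency target.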
Grafana dashboards
Six dashboards ship with the chart, as ConfigMaps with the grafana_dashboard=1 label that the Grafana Operator auto-discovers. Import manually if you don't run the operator — the JSON is in dashboards/.
- Wordcab · Overview: one pane of glass.
- Wordcab · Voice: STT/TTS latency, streams, codecs.
- Wordcab · Think: LLM TTFT, tokens, prefix cache hit rate.
- Wordcab · Agents & Calls: call volumes, failure modes.
- Wordcab · GPU pool health: DCGM-derived.
- Wordcab · Ingress & API: request rate, error rate, latency.
OpenTelemetry traces
Every request carries a W3C trace context end to end — ingress → control plane → model pool → webhook callout. OTLP export is configured at install time:
    observability:
      otelExporter:
        endpoint: otel-collector.observability:4317
        protocol: grpc # or http/protobuf
        tls: { insecure: false, caFile: /etc/ssl/custom-ca.pem }
      sampling:
        type: parentbased_traceidratio
        ratio: 0.1 # sample 10% of root spans

Traces include attributes for wordcab.model, wordcab.pool, wordcab.call_id, wordcab.agent_id, wordcab.key_id. Filter on any of these in your tracing backend (Tempo, Jaeger, Honeycomb, Datadog).
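To make the propagation and sampling settings concrete, here is a minimal Python sketch of how a W3C traceparent header (version 00 layout) can be parsed and how a parent-based, trace-ID-ratio sampling decision is commonly made. How Wordcab's services implement the sampler is not stated here. The threshold comparison on the low 64 bits of the trace ID mirrors an approach used in OpenTelemetry SDKs and is an assumption, and the function names are hypothetical.

```python
import re

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its fields, or return None if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def should_sample(trace_id_hex, ratio, parent_sampled=None):
    """Parent-based trace-ID-ratio decision (illustrative, not Wordcab's code).

    parent_sampled is None for a root span; otherwise the parent's decision wins.
    """
    if parent_sampled is not None:
        return parent_sampled
    # Root span: keep the trace if the low 64 bits fall under ratio * 2^64.
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))
```

Because the decision for a root span depends only on the trace ID, every service that sees the same ID reaches the same answer, and child spans simply inherit the parent's sampled flag. That is why a ratio of 0.1 applies to root spans rather than to individual spans.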
Structured logs
Logs are JSON-per-line on stdout, with a stable schema. Fluent Bit, Vector, and Fluentd presets ship in the chart. Every line includes request_id, trace_id, span_id, pod, service, level.
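Since each line is a standalone JSON object carrying trace_id, logs can be joined to traces with very little code. The following Python sketch checks a line for the fields listed above and groups records by trace_id. The helper names are hypothetical, and treating a missing field as something to report rather than reject is an assumption.

```python
import json

# Fields the text says every log line includes.
REQUIRED_FIELDS = ("request_id", "trace_id", "span_id", "pod", "service", "level")


def parse_log_line(line):
    """Parse one JSON log line and return (record, missing_fields)."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    return record, missing


def group_by_trace(lines):
    """Group parsed records by trace_id, skipping blank lines."""
    by_trace = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record, _missing = parse_log_line(line)
        by_trace.setdefault(record.get("trace_id"), []).append(record)
    return by_trace
```

Feeding this the stdout of a pod, or the output of a log shipper, gives a dictionary keyed by trace ID, which is enough to pull up all log lines for a span found in the tracing backend.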
    {"ts":"2026-04-16T12:34:56Z","level":"info","service":"stt","pool":"qwen3-asr","request_id":"req_01HZ...","trace_id":"af12...","msg":"stream finalized","utterances":14,"duration_ms":9342}

SLOs and alerts
The chart ships Prometheus alerting rules for a conservative default SLO set. Tune observability.slos to match your targets.
    observability:
      slos:
        stt_streaming_ttft_p99_ms: 300
        llm_ttft_p95_ms: 200
        tts_first_audio_p95_ms: 150
        api_availability_5m: 0.995

Breach events are emitted as deployment.degraded webhooks; wire these to your on-call. See Webhooks.
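As a rough illustration of what the default targets mean, the Python sketch below compares measured values against the thresholds from observability.slos. It is not the shipped alerting logic. It assumes that keys ending in _ms are upper bounds on latency, that api_availability_5m is a lower bound, and that only 5xx responses count against availability. None of these is confirmed by the chart's rules, and the function names are hypothetical.

```python
# Default targets copied from the observability.slos example.
SLO_TARGETS = {
    "stt_streaming_ttft_p99_ms": 300,
    "llm_ttft_p95_ms": 200,
    "tts_first_audio_p95_ms": 150,
    "api_availability_5m": 0.995,
}


def availability(request_counts):
    """Fraction of requests in the window that did not return a 5xx status.

    request_counts maps a status-code label (string) to a request count,
    for example per-code increases of wordcab_requests_total over 5 minutes.
    """
    total = sum(request_counts.values())
    if total == 0:
        return 1.0  # assumption: a window with no traffic counts as healthy
    failed = sum(n for code, n in request_counts.items() if code.startswith("5"))
    return (total - failed) / total


def find_breaches(measured, targets=SLO_TARGETS):
    """Return the names of SLOs whose measured value misses its target."""
    breached = []
    for name, target in targets.items():
        if name not in measured:
            continue  # nothing measured for this SLO in the window
        value = measured[name]
        # Latency targets are ceilings; availability is a floor.
        ok = value <= target if name.endswith("_ms") else value >= target
        if not ok:
            breached.append(name)
    return breached
```

With 9,960 successful requests and 40 server errors in a window, availability is 0.996 and stays within the 0.995 target, while a measured llm_ttft_p95_ms of 240 would be reported as a breach.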