Telephony & SIP

Three ways to get voice in and out of Wordcab: (1) programmable-voice providers over WebSocket, (2) native SIP, (3) CCaaS connectors. The Helm chart's SIP gateway sub-chart handles on-prem PBXs; the media layer is the same regardless of transport.

Twilio / Telnyx / Plivo media streams

Bridge the provider's media stream into Wordcab's WebSocket endpoint. Typical Twilio setup:

xml
<!-- Twilio voice webhook returns this TwiML -->
<Response>
  <Connect>
    <Stream url="wss://wordcab.apps.example.com/v1/media/twilio?agent_id=agent_abc" />
  </Connect>
</Response>

Wordcab receives μ-law / 8 kHz frames over WebSocket, runs STT → LLM → TTS, and writes audio frames back on the same stream. Reference apps for Twilio, Telnyx, Plivo, and Zoom Phone ship in the chart under examples/telephony/.
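
To make the frame format concrete, here is a minimal sketch of decoding one inbound audio message. It assumes the JSON shape of Twilio's Media Streams `media` event, where `media.payload` is base64-encoded 8 kHz μ-law audio. The function names are illustrative and not part of Wordcab's API, and Wordcab's own decoder may work differently.

```python
import base64
import json


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                  # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80                   # top bit: sign
    exponent = (u >> 4) & 0x07        # next 3 bits: segment number
    mantissa = u & 0x0F               # low 4 bits: step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw: str) -> list[int]:
    """Return PCM samples for a 'media' event, or an empty list for other events."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return []
    ulaw = base64.b64decode(msg["media"]["payload"])
    return [ulaw_to_pcm16(b) for b in ulaw]
```

Each μ-law byte is one sample, so one second of audio at 8 kHz is 8000 bytes before base64 encoding.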

Native SIP (on-prem PBX)

For Avaya, Genesys, Cisco UCM, FreeSWITCH, or any SIP trunk, deploy the SIP gateway sub-chart. It speaks standard SIP + RTP / SRTP and sits inside the customer network, registered with the PBX.

bash
helm install wordcab-sip wordcab/sip-gateway \
  --namespace wordcab \
  --set trunk.host=pbx.internal.example.com \
  --set trunk.transport=tls \
  --set trunk.codec=g711a \
  --set trunk.srtp.enabled=true \
  --set agents.defaultId=agent_abc

Codec support

  • G.711 μ-law / A-law (8 kHz) — default telephony.
  • G.722 (16 kHz) — wideband where available.
  • Opus (variable) — for WebRTC / SIP-over-WebSocket.
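
As a rough sizing aid for the fixed-rate codecs above, the sketch below computes RTP payload bytes per packet from the codec bit rate and the packetization time. The encoding names, static payload type numbers, and 64 kbit/s rates follow RFC 3551. The 20 ms packet interval is a common default assumed here, not a documented Wordcab setting. Opus is left out because its bit rate varies and its payload type is negotiated dynamically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str            # RTP encoding name
    payload_type: int    # static RTP payload type (RFC 3551)
    sample_rate_hz: int  # audio sampling rate
    bitrate_bps: int     # constant encoded bit rate


CODECS = {
    "PCMU": Codec("PCMU", 0, 8000, 64_000),   # G.711 mu-law
    "PCMA": Codec("PCMA", 8, 8000, 64_000),   # G.711 A-law
    "G722": Codec("G722", 9, 16000, 64_000),  # wideband, same bit rate as G.711
}


def payload_bytes(codec: Codec, ptime_ms: int = 20) -> int:
    """Bytes of encoded audio carried in one RTP packet of ptime_ms milliseconds."""
    return codec.bitrate_bps * ptime_ms // 8000  # bits/s * ms / (8 bits * 1000 ms)
```

With these numbers, G.711 and G.722 both produce 160-byte payloads per 20 ms packet, so moving to wideband G.722 does not increase bandwidth.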

DTMF and transfer

DTMF sent as RFC 2833 telephone-events (the format now specified by RFC 4733) is captured and available to agent tools. Blind and attended transfer (SIP REFER) are supported: expose them as agent tools and the LLM can route calls to a human or a different queue.
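
For reference, RFC 2833 (and its successor RFC 4733) carries each key press as a 4-byte telephone-event payload inside RTP: an event code, an end flag, a volume, and a duration in timestamp units. The sketch below decodes that payload. `DtmfEvent` and `parse_telephone_event` are illustrative names, and how Wordcab hands digits to agent tools is not shown here.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the 16 DTMF keys, in this order.
EVENT_TO_KEY = "0123456789*#ABCD"


@dataclass
class DtmfEvent:
    key: str       # the key that was pressed
    end: bool      # True on the packet(s) marking the end of the press
    volume: int    # tone power, 0-63, expressed as -dBm0
    duration: int  # duration so far, in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the 4-byte telephone-event payload of an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError(f"not a DTMF event code: {event}")
    return DtmfEvent(
        key=EVENT_TO_KEY[event],
        end=bool(flags & 0x80),   # E bit
        volume=flags & 0x3F,      # low 6 bits; bit 0x40 is reserved
        duration=duration,
    )
```

Senders repeat the end packet several times for reliability, so a consumer would normally deduplicate on the RTP timestamp before reporting a digit.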

CCaaS connectors

Platform      | Protocol                  | Status
Genesys Cloud | AudioHook                 | Supported: real-time STT, QA, and redaction on live calls.
Five9         | VoiceStream               | Supported: transcripts and QA signals delivered via webhook or Kafka back to Five9 reporting.
NICE CXone    | Real-time Audio Streaming | On the roadmap (Q3 2026).
Zoom Phone    | RTMS media gateway        | Supported: business-communications workflows.

Recordings and retention

When record: true is set on an agent, full-call audio is written to the configured object store. On self-hosted deployments, this is your own bucket, MinIO instance, or NFS mount. Retention is set per deployment: the default is 30 days on cloud, and it is configurable on self-hosted.
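
A minimal sketch of the retention arithmetic, assuming a recording expires once its age exceeds the configured number of days. The 30-day value mirrors the cloud default mentioned above. `is_expired` is a hypothetical helper, and this page does not describe how Wordcab enforces retention internally.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30  # cloud default stated above; set your own on self-hosted


def is_expired(
    recorded_at: datetime,
    now: datetime,
    retention_days: int = DEFAULT_RETENTION_DAYS,
) -> bool:
    """True once a recording is strictly older than the retention window."""
    return now - recorded_at > timedelta(days=retention_days)


if __name__ == "__main__":
    recorded = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(is_expired(recorded, datetime.now(timezone.utc)))
```

Using timezone-aware timestamps throughout avoids off-by-hours errors when the object store and the cleanup job run in different zones.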

Consent

Recording-consent rules vary by jurisdiction. Two-party-consent states, HIPAA contexts, and regulated industries each have different requirements. Wordcab does not decide this for you — configure the agent's opening line and your retention policy to match your compliance posture.