Telephony & SIP


There are three ways to get voice in and out of Wordcab: (1) programmable-voice providers over WebSocket, (2) native SIP, and (3) CCaaS connectors. The Helm chart's SIP gateway sub-chart handles on-prem PBXs; the media pipeline is the same regardless of transport.

Twilio / Telnyx / Plivo media streams

Bridge the provider's media stream into Wordcab's WebSocket endpoint. Typical Twilio setup:

xml
<!-- Twilio voice webhook returns this TwiML -->
<Response>
  <Connect>
    <Stream url="wss://wordcab.apps.example.com/v1/media/twilio?agent_id=agent_abc" />
  </Connect>
</Response>

Wordcab receives 8 kHz G.711 μ-law frames over the WebSocket, runs STT → LLM → TTS, and writes the synthesized audio frames back on the same stream. Reference apps for Twilio, Telnyx, Plivo, and Zoom Phone ship in the chart under examples/telephony/.
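To make the inbound frame format concrete, here is a minimal Python sketch of decoding one Twilio-style "media" message into linear PCM. It assumes the JSON shape Twilio Media Streams uses (an event field plus a base64 media.payload of μ-law bytes); the stream SID and helper names are hypothetical, and this is an illustration of the wire format, not Wordcab's own implementation.

```python
import base64
import json
from typing import List, Optional


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw: str) -> Optional[List[int]]:
    """Return PCM16 samples for a 'media' event, or None for other events."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None
    ulaw = base64.b64decode(msg["media"]["payload"])
    return [ulaw_to_pcm16(b) for b in ulaw]


# Example frame: 20 ms of mu-law silence (0xFF) at 8 kHz is 160 bytes.
frame = json.dumps({
    "event": "media",
    "streamSid": "MZ_hypothetical",  # placeholder value, not a real stream SID
    "media": {"payload": base64.b64encode(b"\xff" * 160).decode("ascii")},
})
samples = decode_media_message(frame)
print(len(samples), samples[0])      # prints: 160 0
```

Audio going back to the provider takes the reverse path: encode to μ-law, base64 the bytes, and wrap them in a "media" message on the same socket.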

Native SIP (on-prem PBX)

For Avaya, Genesys, Cisco UCM, FreeSWITCH, or any SIP trunk, deploy the SIP gateway sub-chart. It speaks standard SIP + RTP / SRTP and sits inside the customer network, registered with the PBX.

bash
helm install wordcab-sip wordcab/sip-gateway \
  --namespace wordcab \
  --set trunk.host=pbx.internal.example.com \
  --set trunk.transport=tls \
  --set trunk.codec=g711a \
  --set trunk.srtp.enabled=true \
  --set agents.defaultId=agent_abc

Codec support

  • G.711 μ-law / A-law (8 kHz): the telephony default.
  • G.722 (16 kHz): wideband, where the trunk supports it.
  • Opus (variable rate): for WebRTC / SIP-over-WebSocket.
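For sizing buffers and bandwidth, the fixed-rate codecs above reduce to simple arithmetic. The sketch below assumes a 20 ms packetization time (a common default, not something this page specifies) and uses the standard 64 kbit/s rate for both G.711 and G.722; Opus is left out because its bitrate is variable.

```python
def payload_bytes(bitrate_bps: int, ptime_ms: int = 20) -> int:
    """Audio payload size in bytes for one packet of ptime_ms milliseconds."""
    return bitrate_bps * ptime_ms // 8000   # bits per packet, divided by 8


def samples_per_packet(sample_rate_hz: int, ptime_ms: int = 20) -> int:
    """Number of audio samples covered by one packet."""
    return sample_rate_hz * ptime_ms // 1000


# (sample rate in Hz, bitrate in bit/s)
CODECS = {
    "G.711 mu-law": (8000, 64000),
    "G.711 A-law": (8000, 64000),
    "G.722": (16000, 64000),
}

for name, (rate, bps) in CODECS.items():
    print(name, samples_per_packet(rate), "samples,", payload_bytes(bps), "bytes")
```

G.722 carries twice the samples of G.711 in the same 160-byte payload, which is why wideband costs no extra trunk bandwidth.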

DTMF and transfer

RFC 2833 DTMF events are captured and made available to agent tools. Blind and attended transfer (SIP REFER) are supported; expose them as agent tools and the LLM can route calls to a human or a different queue.
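For readers unfamiliar with the wire format, an RFC 2833 telephone-event payload (the format is carried forward in RFC 4733) is four bytes: event code, an end bit plus volume, and a 16-bit duration. The following is a minimal parsing sketch; the DtmfEvent type and function name are hypothetical and do not reflect how Wordcab exposes digits to agent tools.

```python
import struct
from typing import NamedTuple

# Event codes 0-15 map to these DTMF symbols.
EVENT_TO_DIGIT = "0123456789*#ABCD"


class DtmfEvent(NamedTuple):
    digit: str
    end: bool        # True on the packet(s) marking the end of the key press
    volume: int      # power level expressed as 0-63 (in -dBm0)
    duration: int    # in RTP timestamp units (8000 per second for 8 kHz audio)


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Parse the 4-byte telephone-event payload of an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_DIGIT):
        raise ValueError(f"event {event} is not a DTMF digit")
    return DtmfEvent(
        digit=EVENT_TO_DIGIT[event],
        end=bool(flags & 0x80),
        volume=flags & 0x3F,
        duration=duration,
    )


# Digit "5", end bit set, volume 10, duration 800 units (100 ms at 8 kHz).
print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

Senders typically repeat the end packet, so a consumer would deduplicate on the RTP timestamp before handing a digit to a tool.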

CCaaS connectors

Platform      | Protocol                  | Status
Genesys Cloud | AudioHook                 | Supported: real-time STT, QA, and redaction on live calls.
Five9         | VoiceStream               | Supported: transcripts and QA signals delivered via webhook or Kafka back to Five9 reporting.
NICE CXone    | Real-time Audio Streaming | On the roadmap (Q3 2026).
Zoom Phone    | RTMS media gateway        | Supported: business-communications workflows.

Recordings and retention

When record: true is set on an agent, full-call audio is written to the configured object store. On self-hosted deployments, this is your own bucket, MinIO, or NFS. Retention is set per deployment: the default is 30 days on cloud, and it is configurable on self-hosted.
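On self-hosted deployments you may want to audit which recordings fall outside the retention window. The sketch below shows only the date arithmetic, assuming a 30-day window to mirror the cloud default; the object keys and the idea of running this as a separate audit are hypothetical, and it does not describe how Wordcab itself enforces retention.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30  # mirrors the cloud default; self-hosted is configurable


def is_expired(recorded_at: datetime, now: datetime,
               retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True when a recording is older than the retention window."""
    return now - recorded_at > timedelta(days=retention_days)


# Hypothetical object keys mapped to their recording timestamps (UTC).
now = datetime(2025, 3, 1, tzinfo=timezone.utc)
recordings = {
    "calls/2025-01-15/call_a.wav": datetime(2025, 1, 15, tzinfo=timezone.utc),
    "calls/2025-02-20/call_b.wav": datetime(2025, 2, 20, tzinfo=timezone.utc),
}

to_delete = [key for key, ts in recordings.items() if is_expired(ts, now)]
print(to_delete)
```

Using timezone-aware timestamps throughout avoids off-by-a-day errors when the store and the auditor run in different zones.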

Consent

Recording-consent rules vary by jurisdiction. Two-party-consent states, HIPAA contexts, and regulated industries each have different requirements. Wordcab does not decide this for you: configure the agent's opening line and your retention policy to match your compliance posture.