Telephony & SIP
Three ways to get voice in and out: programmable-voice over WebSocket, native SIP, or CCaaS connectors. Same media pipeline underneath.
Three ways to get voice in and out of Wordcab: (1) programmable-voice providers over WebSocket, (2) native SIP, and (3) CCaaS connectors. The Helm chart's SIP gateway sub-chart handles on-prem PBX integration, and the media pipeline is the same regardless of transport.
Twilio / Telnyx / Plivo media streams
Bridge the provider's media stream into Wordcab's WebSocket endpoint. Typical Twilio setup:
```xml
<!-- Twilio voice webhook returns this TwiML -->
<Response>
  <Connect>
    <Stream url="wss://wordcab.apps.example.com/v1/media/twilio?agent_id=agent_abc" />
  </Connect>
</Response>
```

Wordcab receives μ-law / 8 kHz frames over the WebSocket, runs STT → LLM → TTS, and writes audio frames back on the same stream. Reference apps for Twilio, Telnyx, Plivo, and Zoom Phone ship in the chart under examples/telephony/.
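Wordcab terminates this WebSocket itself, so you do not need to write the framing code. When debugging a bridge or building your own relay, though, it helps to know what travels over the socket. The sketch below follows Twilio's published Media Streams message format: JSON text frames with an `event` field (`start`, `media`, `stop`, among others) and base64-encoded μ-law audio in `media.payload`. The function names and the `state` dict are hypothetical illustrations, not Wordcab code; the μ-law expansion is the standard G.711 decoding.

```python
import base64
import json
from typing import Optional


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit-range PCM sample."""
    code = ~code & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # 0x84 (132) is the mu-law bias added at encode time and removed here.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def handle_twilio_message(raw: str, state: dict) -> Optional[list[int]]:
    """Process one Twilio Media Streams text frame (hypothetical helper).

    Returns decoded PCM samples for "media" events, otherwise None.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        start = msg["start"]
        state["stream_sid"] = start["streamSid"]
        # <Parameter> elements nested inside <Stream> arrive here as a dict.
        state["custom_parameters"] = start.get("customParameters", {})
    elif event == "media":
        ulaw_bytes = base64.b64decode(msg["media"]["payload"])
        return [ulaw_to_linear(b) for b in ulaw_bytes]
    elif event == "stop":
        state["stopped"] = True
    return None


def outbound_media_frame(stream_sid: str, ulaw_audio: bytes) -> str:
    """Build a "media" message that plays mu-law audio back to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw_audio).decode("ascii")},
    })
```

In a real relay, the decoded audio would be forwarded to the speech pipeline, and synthesized μ-law audio would go back through `outbound_media_frame`, using the `streamSid` captured from the `start` event.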
Native SIP (on-prem PBX)
For Avaya, Genesys, Cisco UCM, FreeSWITCH, or any SIP trunk, deploy the SIP gateway sub-chart. It speaks standard SIP + RTP / SRTP and sits inside the customer network, registered with the PBX.
```bash
helm install wordcab-sip wordcab/sip-gateway \
  --namespace wordcab \
  --set trunk.host=pbx.internal.example.com \
  --set trunk.transport=tls \
  --set trunk.codec=g711a \
  --set trunk.srtp.enabled=true \
  --set agents.defaultId=agent_abc
```

Codec support
- G.711 μ-law / A-law (8 kHz), the telephony default.
- G.722 (16 kHz), wideband where available.
- Opus (variable), for WebRTC / SIP-over-WebSocket.
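The codec choice also determines how much audio each RTP packet carries and how much bandwidth each call needs on the trunk. The sketch below works this out for the fixed-rate codecs in the list, assuming 20 ms packetization (a common default, not a documented Wordcab setting), IPv4, and no link-layer (Ethernet) overhead. G.711 and G.722 both run at 64 kbit/s, i.e. 8 payload bytes per millisecond; Opus is left out because its bitrate varies. The SRTP tag sizes are those of the standard AES_CM_128_HMAC_SHA1_80 and AES_CM_128_HMAC_SHA1_32 crypto suites. The codec keys are plain labels for this sketch, not values accepted by the chart.

```python
from typing import Optional

# Payload bytes per millisecond for fixed-rate codecs (64 kbit/s = 8 bytes/ms).
CODEC_BYTES_PER_MS = {"G.711": 8, "G.722": 8}

# Fixed per-packet overhead: IPv4 (20) + UDP (8) + RTP fixed header (12).
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12

# SRTP authentication tag appended to each packet, by crypto suite.
SRTP_TAG_BYTES = {"AES_CM_128_HMAC_SHA1_80": 10, "AES_CM_128_HMAC_SHA1_32": 4}


def payload_bytes(codec: str, ptime_ms: int = 20) -> int:
    """Audio payload carried in one RTP packet."""
    return CODEC_BYTES_PER_MS[codec] * ptime_ms


def per_call_kbps(codec: str, ptime_ms: int = 20, srtp_suite: Optional[str] = None) -> float:
    """One-direction IP bandwidth for a single call, excluding link-layer overhead."""
    packet = payload_bytes(codec, ptime_ms) + IP_UDP_RTP_OVERHEAD
    if srtp_suite is not None:
        packet += SRTP_TAG_BYTES[srtp_suite]
    packets_per_second = 1000 / ptime_ms
    return packet * 8 * packets_per_second / 1000


if __name__ == "__main__":
    print(payload_bytes("G.711"))                                        # 160
    print(per_call_kbps("G.711"))                                        # 80.0
    print(per_call_kbps("G.711", srtp_suite="AES_CM_128_HMAC_SHA1_80"))  # 84.0
```

The result is per direction; multiply it by the expected number of concurrent calls when sizing the trunk. A longer ptime lowers the header overhead relative to the payload at the cost of slightly higher latency.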
DTMF and transfer
RFC 2833 DTMF is captured and made available to agent tools. Blind and attended transfers (SIP REFER) are supported; expose them as agent tools so the LLM can route calls to a human or to a different queue.
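RFC 2833 (since obsoleted by RFC 4733) carries each key press as RTP "telephone-event" packets instead of in-band tones. The gateway decodes these for you; the sketch below only illustrates the 4-byte payload layout (event code, end bit, reserved bit, 6-bit volume, 16-bit duration) and the usual practice of emitting each digit once, even though the final "end" packet is typically sent three times. The function names are hypothetical, and the input is assumed to be (RTP timestamp, payload) pairs already extracted from the RTP stream.

```python
import struct
from dataclasses import dataclass
from typing import Iterable, Tuple

# Telephone-event codes 0-15 map to these DTMF symbols.
DTMF_SYMBOLS = "0123456789*#ABCD"


@dataclass(frozen=True)
class TelephoneEvent:
    digit: str
    end: bool       # E bit: set on the final packet(s) of the event
    volume: int     # power level as -dBm0, 0 to 63
    duration: int   # in RTP timestamp units


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode the 4-byte telephone-event payload: event | E R volume | duration."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_SYMBOLS):
        raise ValueError(f"event code {event} is not a DTMF digit")
    return TelephoneEvent(
        digit=DTMF_SYMBOLS[event],
        end=bool(flags & 0x80),
        volume=flags & 0x3F,
        duration=duration,
    )


def collect_digits(packets: Iterable[Tuple[int, bytes]]) -> str:
    """Turn (rtp_timestamp, payload) pairs into a digit string.

    All packets for one key press share an RTP timestamp, and the end packet
    is usually retransmitted, so each digit is emitted once, on its first end packet.
    """
    seen = set()
    digits = []
    for rtp_timestamp, payload in packets:
        event = parse_telephone_event(payload)
        if event.end and rtp_timestamp not in seen:
            seen.add(rtp_timestamp)
            digits.append(event.digit)
    return "".join(digits)
```

Keying on the end packet means a digit is reported only once the caller releases the key, which also gives its final duration.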
CCaaS connectors
| Platform | Protocol | Status | Notes |
|---|---|---|---|
| Genesys Cloud | AudioHook | Supported | Real-time STT, QA, and redaction on live calls. |
| Five9 | VoiceStream | Supported | Transcripts and QA signals delivered back to Five9 reporting via webhook or Kafka. |
| NICE CXone | Real-time Audio Streaming | On the roadmap (Q3 2026) | |
| Zoom Phone | RTMS media gateway | Supported | Business-communications workflows. |
Recordings and retention
When record: true is set on an agent, full-call audio is written to the configured object store; on self-hosted deployments, that is your own bucket, MinIO, or NFS share. Retention is set per deployment: 30 days by default on cloud, configurable on self-hosted.
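On self-hosted deployments backed by S3 or MinIO, you may also want the bucket itself to expire recordings as a backstop to Wordcab's own retention setting. Both support S3 lifecycle rules. The sketch below builds a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The `recordings/` prefix, the rule ID, the endpoint, and the bucket name are hypothetical placeholders, since the object layout Wordcab uses in your bucket is not described here. Keep the bucket rule at least as long as the deployment's retention window so it never deletes audio earlier than intended.

```python
def recording_lifecycle(retention_days: int = 30, prefix: str = "recordings/") -> dict:
    """Build an S3/MinIO lifecycle configuration that expires call recordings.

    `prefix` is a hypothetical placeholder; point it at wherever your
    deployment actually writes call audio.
    """
    if retention_days < 1:
        raise ValueError("lifecycle expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": f"expire-call-recordings-{retention_days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": retention_days},
            }
        ]
    }


# Applying it (requires boto3 and credentials; endpoint and bucket are placeholders):
# import boto3
# s3 = boto3.client("s3", endpoint_url="https://minio.internal.example.com")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="call-recordings",
#     LifecycleConfiguration=recording_lifecycle(30),
# )
```

NFS has no equivalent mechanism, so on NFS-backed deployments the deployment's retention setting (or your own cleanup job) is the only control.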
Recording-consent rules vary by jurisdiction: two-party-consent states, HIPAA contexts, and regulated industries each have different requirements. Wordcab does not make this decision for you. Configure the agent's opening line and your retention policy to match your compliance posture.