Telephony & SIP
Wordcab offers three ways to get voice in and out: (1) programmable-voice providers over WebSocket, (2) native SIP, and (3) CCaaS connectors. The Helm chart's SIP gateway sub-chart handles on-prem PBX connectivity; the media pipeline is the same regardless of transport.
Twilio / Telnyx / Plivo media streams
Bridge the provider's media stream into Wordcab's WebSocket endpoint. Typical Twilio setup:
```xml
<!-- Twilio voice webhook returns this TwiML -->
<Response>
  <Connect>
    <Stream url="wss://wordcab.apps.example.com/v1/media/twilio?agent_id=agent_abc" />
  </Connect>
</Response>
```

Wordcab receives 8 kHz μ-law frames over the WebSocket, runs STT → LLM → TTS, and writes audio frames back on the same stream. Reference apps for Twilio, Telnyx, Plivo, and Zoom Phone ship in the chart under `examples/telephony/`.
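Wordcab's endpoint handles this protocol for you, but it helps to know what travels over the socket when you debug a stream. The sketch below follows Twilio's Media Streams JSON framing as commonly documented: inbound `media` events carry base64-encoded μ-law in `media.payload`, and audio sent back on a bidirectional (`<Connect>`) stream is a `media` message tagged with the stream's `streamSid`. The helper names are hypothetical, and no WebSocket library is involved; it only shows the message shapes.

```python
import base64
import json
from typing import Optional, Tuple

# G.711 mu-law at 8 kHz is one byte per sample, so 8 bytes = 1 ms of audio.
ULAW_BYTES_PER_MS = 8


def parse_twilio_message(raw: str) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (event_name, audio) for one Twilio Media Streams text frame.

    audio is the decoded mu-law payload for "media" events and None for
    everything else (e.g. "connected", "start", "stop", "mark").
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "media":
        return event, base64.b64decode(msg["media"]["payload"])
    return event, None


def build_outbound_media(stream_sid: str, ulaw_audio: bytes) -> str:
    """Wrap raw 8 kHz mu-law bytes in the JSON message Twilio plays back."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw_audio).decode("ascii")},
    })


def frame_duration_ms(ulaw_audio: bytes) -> float:
    """Duration of a mu-law buffer in milliseconds."""
    return len(ulaw_audio) / ULAW_BYTES_PER_MS
```

Because μ-law at 8 kHz is one byte per sample, a 160-byte payload is 20 ms of audio.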
Native SIP (on-prem PBX)
For Avaya, Genesys, Cisco UCM, FreeSWITCH, or any SIP trunk, deploy the SIP gateway sub-chart. It speaks standard SIP + RTP / SRTP and sits inside the customer network, registered with the PBX.
```bash
helm install wordcab-sip wordcab/sip-gateway \
  --namespace wordcab \
  --set trunk.host=pbx.internal.example.com \
  --set trunk.transport=tls \
  --set trunk.codec=g711a \
  --set trunk.srtp.enabled=true \
  --set agents.defaultId=agent_abc
```

Codec support
- G.711 μ-law / A-law (8 kHz): the telephony default.
- G.722 (16 kHz): wideband, where the trunk or endpoint supports it.
- Opus (variable rate): for WebRTC and SIP-over-WebSocket.
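When you need to inspect raw call audio (for example, a captured Twilio payload or an RTP dump from the SIP gateway), G.711 bytes expand to 16-bit linear PCM with the standard ITU-T G.711 segment arithmetic. The sketch below is a generic pure-Python port of the classic reference routines, not Wordcab code; `decode_g711` is a hypothetical helper. Hand-rolling it is convenient because Python's `audioop` module was removed in Python 3.13.

```python
from typing import List


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a 16-bit linear PCM sample."""
    code = ~code & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the biased magnitude, then remove the bias of 0x84 (132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def alaw_to_linear(code: int) -> int:
    """Expand one G.711 A-law byte to a 16-bit linear PCM sample."""
    code ^= 0x55  # A-law inverts the even bits on the wire
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    if exponent == 0:
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    # In A-law, a set sign bit means a positive sample.
    return magnitude if sign else -magnitude


def decode_g711(data: bytes, law: str = "ulaw") -> List[int]:
    """Decode a whole G.711 buffer; law is "ulaw" or "alaw"."""
    expand = ulaw_to_linear if law == "ulaw" else alaw_to_linear
    return [expand(b) for b in data]
```

Each byte maps to exactly one sample, so an 8 kHz G.711 buffer of N bytes decodes to N PCM samples covering N / 8 ms.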
DTMF and transfer
RFC 2833 DTMF events are captured and made available to agent tools. Blind and attended transfers (SIP REFER) are supported: expose them as agent tools, and the LLM can route the call to a human or to a different queue.
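RFC 2833 DTMF travels as its own RTP payload (`telephone-event`, a dynamic payload type negotiated in SDP) rather than as in-band tones. Wordcab captures these events for you; the sketch below only illustrates the 4-byte payload layout the RFC defines (event code, end bit, reserved bit, 6-bit volume, 16-bit duration). `parse_telephone_event` and `TelephoneEvent` are hypothetical names.

```python
from dataclasses import dataclass

# RFC 2833 event codes 0-15 are the DTMF keys, in this order.
DTMF_KEYS = "0123456789*#ABCD"


@dataclass
class TelephoneEvent:
    digit: str
    end: bool      # E bit: set on the final packet(s) of the event
    volume: int    # 0..63, tone power in -dBm0 (0 is loudest)
    duration: int  # in RTP timestamp units


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode one RFC 2833 telephone-event RTP payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event = payload[0]
    if event >= len(DTMF_KEYS):
        raise ValueError(f"event code {event} is not a DTMF key")
    end = bool(payload[1] & 0x80)   # top bit of byte 1
    volume = payload[1] & 0x3F      # low 6 bits of byte 1 (bit 6 is reserved)
    duration = int.from_bytes(payload[2:4], "big")
    return TelephoneEvent(DTMF_KEYS[event], end, volume, duration)
```

A payload of `05 8A 03 20` decodes to key `5`, end bit set, volume 10 (that is, -10 dBm0), and a duration of 800 timestamp units, which is 100 ms at an 8 kHz RTP clock.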
CCaaS connectors
| Platform | Protocol | Status |
|---|---|---|
| Genesys Cloud | AudioHook | Supported — real-time STT, QA, and redaction on live calls. |
| Five9 | VoiceStream | Supported — transcripts and QA signals delivered via webhook or Kafka back to Five9 reporting. |
| NICE CXone | Real-time Audio Streaming | On the roadmap (Q3 2026). |
| Zoom Phone | RTMS media gateway | Supported — business-communications workflows. |
Recordings and retention
When `record: true` is set on an agent, full-call audio is written to the configured storage backend. On self-hosted deployments, that is your own bucket, MinIO, or NFS share. Retention is set per deployment: the default is 30 days on cloud, and it is configurable on self-hosted.
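If you also want to audit a self-hosted bucket against your retention window, the cutoff check is plain date arithmetic. This is a hypothetical helper, not part of Wordcab: it assumes you already have each recording's object key and creation time (for example, from your object store's listing), and the 30-day default is only a placeholder for your deployment's setting.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def expired_recordings(
    recordings: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention_days: int = 30,  # placeholder; use your deployment's setting
) -> List[str]:
    """Return keys of recordings created strictly before now - retention_days.

    All datetimes should be timezone-aware so comparisons are unambiguous.
    """
    cutoff = now - timedelta(days=retention_days)
    return [key for key, created_at in recordings if created_at < cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [  # hypothetical object keys
        ("calls/a.wav", now - timedelta(days=45)),
        ("calls/b.wav", now - timedelta(days=3)),
    ]
    print(expired_recordings(sample, now))  # ['calls/a.wav']
```

A recording created exactly at the cutoff is kept; only strictly older objects are reported.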
Recording-consent rules vary by jurisdiction. Two-party-consent states, HIPAA contexts, and regulated industries each impose different requirements. Wordcab does not make this decision for you: configure the agent's opening line and your retention policy to match your compliance posture.