As conversational AI shifts from chat-first to voice-first interfaces, platforms like Pipecat need robust, low-latency, and telco-grade infrastructure to carry audio between customers and bots in real time. This is where Exotel’s AgentStream Infrastructure comes in, delivering real-time audio from PSTN/SIP to your bot with guaranteed reliability and enterprise-grade uptime.

The exotel.py serializer in Pipecat is a production-grade module that enables developers to consume and send Exotel WebSocket-based audio in real time, handle DTMF events, and orchestrate voice sessions for scalable AI conversations.

What is This File?

This file, exotel.py, implements a custom FrameSerializer that bridges Exotel’s real-time media stream events with Pipecat’s internal audio frame architecture. It supports:

  • Media Event Deserialization (Exotel → Pipecat)
  • Audio Resampling to match sample rate between bot and Exotel (default: 8000 Hz; supports 16kHz/24kHz upsampling/downsampling)
  • Clear Event Handling (StartInterruptionFrame → {event: clear})
  • Media Event Serialization (Pipecat → Exotel)
  • DTMF Deserialization

Supported Event Flow (Exotel ↔ Pipecat)

| Exotel Event | Pipecat Frame | Direction | Notes |
| start | StartFrame | Exotel → Bot | Implicitly handled via setup |
| media | InputAudioRawFrame | Exotel → Bot | Uses PCM payload in base64 |
| dtmf | InputDTMFFrame | Exotel → Bot | Decodes digit to KeypadEntry |
| clear event JSON | StartInterruption | Bot → Exotel | Tells Exotel to clear context |
| media event JSON | AudioRawFrame | Bot → Exotel | Audio stream to customer |
| JSON message passthrough | TransportMessage* | Bot → Exotel | For metadata/custom routing |
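
As a rough illustration of the Exotel → Bot rows, the sketch below turns one incoming JSON message into a plain dict describing the frame it would become. The field layout (media.payload, dtmf.digit) and the helper name decode_exotel_message are assumptions made for illustration, not taken from exotel.py; the real serializer returns Pipecat frame objects.

```python
import base64
import json
from typing import Optional


def decode_exotel_message(raw: str) -> Optional[dict]:
    """Map one Exotel WebSocket message to a simplified frame description.

    Hypothetical helper for illustration only.
    """
    message = json.loads(raw)
    event = message.get("event")

    if event == "media":
        # Assumed layout: base64-encoded PCM under media.payload.
        pcm = base64.b64decode(message["media"]["payload"])
        return {"frame": "InputAudioRawFrame", "audio": pcm}

    if event == "dtmf":
        # Assumed layout: the pressed key under dtmf.digit.
        digit = message.get("dtmf", {}).get("digit")
        return {"frame": "InputDTMFFrame", "digit": digit}

    # start and any other events produce no frame in this sketch.
    return None
```

Returning None for unhandled events keeps the caller's loop simple: it can skip anything that does not map to a frame.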

How This Works

When integrated into your voicebot runtime:

  • Incoming Events: The WebSocket handler receives JSON packets from Exotel, such as {event: media, …}. These are deserialized to Pipecat-native frames.
  • Outgoing Frames: When the bot responds with AudioRawFrame, the serializer resamples audio and wraps it into an Exotel-compatible media event.
  • Interruption Handling: StartInterruptionFrame (e.g., triggered when the caller speaks over the bot) is translated into a clear event, which tells Exotel to discard audio it has buffered for playback.
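
The outgoing path described above can be sketched as follows. The JSON shape (a streamSid key and a media.payload field) and both function names are assumptions for illustration; resampling is left out here and would happen before the bytes are encoded.

```python
import base64
import json


def encode_media_event(pcm: bytes, stream_sid: str) -> str:
    """Wrap raw PCM bytes in a media event (assumed JSON shape)."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )


def encode_clear_event(stream_sid: str) -> str:
    """Build the clear event sent when a StartInterruptionFrame arrives."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

Both helpers return text, so the result can be written straight to the WebSocket as a text message.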

Inbound AgentStream Setup (Customer → Bot)

→ Customer → Exophone (Exotel Number)
→ SIP/PSTN Infra
→ VoiceBot Applet with WSS endpoint
→ Your Bot / LLM (via Pipecat)

📘 Reference: Working with Stream and Voicebot Applet

Outbound AgentStream Setup (Bot → Customer)

→ Exotel Campaigns / API
→ Initiates Leg 1 to Customer
→ VoiceBot Applet initiates Leg 2 to Bot (WSS)
→ Bidirectional Audio Flow over WSS
→ Bot streams responses

📘 Reference: Connect API and AgentStream Services

Best Practices for Real-Time Bots

1. Clear Event Handling

Ensure that your bot emits StartInterruptionFrame (mapped to {event: clear}) when it needs to reset playback, e.g., on barge-in or fallback. The clear event resets audio that is queued on the stream; it does not end the call by itself.

2. Audio Buffering

Implement frame-level buffering before responding with AudioRawFrame to avoid partial audio or glitches. Suggested buffer duration: 200–300ms.
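
A minimal sketch of such a buffer, assuming 16-bit mono PCM at 8000 Hz (so 200 ms is 3200 bytes). The class name AudioChunkBuffer and its parameters are hypothetical and not part of Pipecat.

```python
from typing import List


class AudioChunkBuffer:
    """Accumulate PCM bytes and release them in fixed-duration chunks."""

    def __init__(self, sample_rate: int = 8000, chunk_ms: int = 200,
                 bytes_per_sample: int = 2) -> None:
        # Byte count that corresponds to chunk_ms of mono audio.
        self.chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
        self._pending = bytearray()

    def push(self, pcm: bytes) -> List[bytes]:
        """Add audio and return every complete chunk now available."""
        self._pending.extend(pcm)
        chunks = []
        while len(self._pending) >= self.chunk_size:
            chunks.append(bytes(self._pending[: self.chunk_size]))
            del self._pending[: self.chunk_size]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left, e.g. at the end of an utterance."""
        rest = bytes(self._pending)
        self._pending.clear()
        return rest
```

Calling flush() at the end of each bot utterance avoids holding back the final fraction of a chunk.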

3. DTMF Support

The deserializer maps digits into InputDTMFFrame. Ensure you map these to the correct bot intents or context-switching flows.
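
One simple way to do this is a lookup table from key to action. The menu entries and the route_dtmf name below are invented for illustration; a real bot would substitute its own intents.

```python
from typing import Dict

# Hypothetical menu: which bot action each key triggers.
DTMF_ACTIONS: Dict[str, str] = {
    "1": "check_balance",
    "2": "talk_to_agent",
    "#": "repeat_menu",
}


def route_dtmf(digit: str, default: str = "unrecognized_key") -> str:
    """Return the action name for a pressed key, or a fallback action."""
    return DTMF_ACTIONS.get(digit, default)
```

Keeping an explicit fallback means an unexpected key leads to a re-prompt rather than a silent failure.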

4. Resampling Optimization

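Exotel streams PCM audio at the sample rate configured for the serializer (8000 Hz by default). You can resample up to 16kHz or 24kHz, or back down to 8kHz, depending on your ASR/TTS backend, using Pipecat’s create_stream_resampler() helper. Matching the rate each side expects avoids audio that plays back at the wrong speed or pitch.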
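
To show what a sample-rate change means at the byte level, here is a deliberately naive linear-interpolation resampler for 16-bit little-endian mono PCM. It is not Pipecat's resampler and applies no anti-aliasing filter, so treat it purely as an illustration; resample_linear is a hypothetical name.

```python
import array
import sys


def resample_linear(pcm: bytes, in_rate: int, out_rate: int) -> bytes:
    """Resample 16-bit mono PCM by linear interpolation (illustration only)."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed to be little-endian
    if in_rate == out_rate or len(samples) == 0:
        return pcm

    out_len = len(samples) * out_rate // in_rate
    out = array.array("h")
    for i in range(out_len):
        # Position of output sample i on the input timeline.
        pos = i * in_rate / out_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))

    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian bytes again
    return out.tobytes()
```

Going from 8000 Hz to 16000 Hz doubles the number of samples; going the other way halves it.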

5. Event Logging & Diagnostics

Log each WebSocket event, audio payload sizes, round-trip latency, and stream SIDs. Use structured logs to trace real-time performance and reliability.
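
A minimal sketch of one structured log line per event, using only the standard library. The field names, the logger name, and the log_stream_event helper are assumptions for illustration.

```python
import json
import logging
import time

logger = logging.getLogger("exotel.stream")


def log_stream_event(stream_sid: str, event: str, payload_bytes: int,
                     started_at: float) -> str:
    """Emit one JSON log line per WebSocket event and return it."""
    record = {
        "stream_sid": stream_sid,
        "event": event,
        "payload_bytes": payload_bytes,
        # Time elapsed since the caller-supplied time.monotonic() timestamp.
        "elapsed_ms": round((time.monotonic() - started_at) * 1000, 1),
    }
    line = json.dumps(record)
    logger.info(line)
    return line
```

Passing in a time.monotonic() timestamp taken when the request was sent gives a simple round-trip figure that is not affected by wall-clock adjustments.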

6. Mark Event Handling

Though not always used, your bot can implement logic to handle mark events (if supported), which act as checkpoints for actions like interruptions, confirmations, or analytics tagging.

7. Backpressure and Timeout Handling

Ensure your bot server handles flow control (backpressure) using asyncio.Queue or non-blocking buffers to avoid socket timeouts or audio lag.
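
One possible shape for this, sketched with a bounded asyncio.Queue. The drop-oldest policy, the None sentinel, and the names enqueue_audio and drain are illustrative choices, not something prescribed by Pipecat or Exotel.

```python
import asyncio


def enqueue_audio(queue: asyncio.Queue, chunk: bytes) -> bool:
    """Queue a chunk for sending; return True if an old chunk was dropped."""
    dropped = False
    if queue.full():
        queue.get_nowait()  # discard the stalest audio instead of blocking
        dropped = True
    queue.put_nowait(chunk)
    return dropped


async def drain(queue: asyncio.Queue, send) -> None:
    """Forward queued chunks to the async `send` callable until None arrives."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        await send(chunk)
```

Dropping the oldest chunk favours low latency over completeness; a bot that must never lose audio could instead await queue.put() and let the producer slow down.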

TL;DR: Why This Matters

Building production-grade voicebots means going beyond basic transcription. You need:

  • Reliable audio ingress from Telco infra
  • Real-time streaming to/from your bot
  • Precise control over when to listen, speak, or reset
  • Seamless fallback/escalation

This Pipecat Exotel serializer helps bridge that gap—letting you plug into India’s most enterprise-grade voice infra while using your own AI stack (LLMs, ASRs, or NLU engines).

Start Using This Today

Supported Use Cases

  • Lead Qualification Bots (click-to-call + bot driven)
  • Inbound IVR Automation (customer dials your number, bot handles intent)
  • Outbound Campaign Automation (Exotel Campaigns + VoiceBot Applet)
  • Collections & Reminders Bots
  • Support Deflection with Agent Escalation

Next Steps

  1. Clone Pipecat
  2. Add your WebSocket URL in Exotel’s Voicebot Applet
  3. Implement custom frame handlers for Start → Media → DTMF → Stop
  4. Use the exotel.py serializer in your bot runtime
  5. Monitor session logs & test with both inbound and outbound flows

Saurabh Sharma

Saurabh Sharma is a seasoned professional currently serving in the Solution Strategy team at Exotel, where he specializes in sales automation and service fulfilment use cases across diverse industries such as BFSI, internet, logistics, e-commerce, retail, and healthcare. With over a decade of experience, Saurabh has honed his expertise in optimizing business processes and leveraging emerging technologies to drive efficiency and growth. He is particularly passionate about harnessing the potential of generative AI to revolutionize traditional business models and enhance customer experiences. Saurabh is committed to staying at the forefront of industry innovation, delivering strategic insights and solutions that propel organizations towards success.