A developer recently open-sourced a self-hosted telephony stack on GitHub, built on Asterisk and AWS Chime, that replicates Twilio’s WebSocket Media Streams interface at zero monthly cost. The post gained immediate traction because it addressed a real frustration: Twilio’s HIPAA compliance package starts at ₹1.68 lakh per month before a single call is made. For startups building AI voice agents in healthcare or fintech, that price tag eliminates voice AI as an option entirely.
But the project’s creator acknowledged something important: “Twilio remains superior for users avoiding infrastructure management.” That single sentence captures the central tension this post examines: when you factor in every cost of running your own telephony infrastructure, does self-hosting actually save money?
The three paths to AI voice telephony in 2026
The AI voice agent market hit ₹20,160 crore in 2024 and is projected to reach ₹3.99 lakh crore by 2034, growing at 34.8% CAGR (Market.us, 2024).
Developers building voice AI applications now face three distinct deployment paths, each with a different cost structure.
- Full self-hosting: You run Asterisk or FreeSWITCH on your own servers, connect SIP trunks to the PSTN, build WebSocket bridges for AI audio streaming, and manage every layer from media routing to security. The Vectorly open-telephony-stack on GitHub is a reference implementation of this approach.
- Managed CPaaS: You use a platform that handles telephony infrastructure, carrier connectivity, compliance, and scaling. You interact through APIs and WebSocket interfaces, focusing your engineering effort on the AI agent itself rather than the phone system underneath it.
- Hybrid model: You self-host certain components (the AI agent logic, perhaps a custom LLM) while relying on managed telephony for PSTN connectivity, number management, and regulatory compliance layers. Most production deployments land here when they mature past the prototype stage.
What self-hosting actually requires
A self-hosted voice AI telephony stack isn’t a single piece of software. It’s a collection of interdependent systems, each demanding ongoing attention.
| Component | What it does | Self-hosted responsibility |
|---|---|---|
| SIP server (Asterisk/FreeSWITCH) | Call signalling and routing | Install, configure, patch, scale |
| SIP trunking | PSTN connectivity | Negotiate carrier contracts, manage failover |
| Media server | RTP audio processing | Handle codec transcoding, packet loss, jitter buffers |
| WebSocket bridge | Stream audio to AI models | Build and maintain real-time audio pipeline |
| TLS/SRTP | Call encryption | Certificate management, security audits |
| Number management | DIDs, porting, CLI compliance | Carrier relationships per geography |
| Monitoring | Uptime, call quality, fraud detection | Build or integrate observability stack |
Each row above is ongoing work that someone on your team has to own. Asterisk's monolithic architecture handles about 4,500 concurrent calls under optimal conditions on multi-core hardware (VitalPBX, 2025). FreeSWITCH handles 2,000+ concurrent calls, but both figures assume expert tuning. Production voice AI systems frequently fail at just 10 to 30 concurrent calls because of issues like GPU memory fragmentation, connection pool starvation, or WebSocket backpressure (Sachin Keshav, Medium, 2025).
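WebSocket backpressure happens when audio arrives faster than the AI side consumes it, so buffers grow and every turn gets slower. One common mitigation is a bounded buffer that drops the oldest frames instead of queueing indefinitely. The sketch below assumes 20 ms audio frames; the `FrameBuffer` class and its 25-frame limit are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import deque


class FrameBuffer:
    """Bounded audio-frame buffer that drops the oldest frames on overflow.

    Hypothetical helper: with 20 ms frames, max_frames=25 caps queued audio
    at roughly 500 ms, trading a short gap in audio for bounded latency.
    """

    def __init__(self, max_frames: int = 25) -> None:
        # deque(maxlen=...) evicts from the left automatically when full
        self._frames: deque[bytes] = deque(maxlen=max_frames)
        self.dropped = 0

    def push(self, frame: bytes) -> None:
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # count evictions so monitoring can alert on them
        self._frames.append(frame)

    def pop(self) -> bytes | None:
        # Oldest buffered frame, or None when the consumer has caught up
        return self._frames.popleft() if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)


# Simulate a consumer that stalls while 40 frames (800 ms of audio) arrive
buf = FrameBuffer(max_frames=25)
for i in range(40):
    buf.push(bytes([i]))
print(len(buf), buf.dropped, buf.pop())  # 25 15 b'\x0f'
```

Dropping old audio is a deliberate choice: for a live conversation, stale speech is worth less than keeping the agent's response time bounded.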
SIP ALG (Application Layer Gateway), enabled by default on an estimated 70% of routers, rewrites SIP headers and frequently breaks Asterisk deployments. Version upgrades carry significant risk: Asterisk 21 removed app_macro entirely, so any call that reaches a Macro() in the dialplan will silently hang up after migration (AsteriskService, 2025).
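Before upgrading, it helps to find every dialplan line that still depends on app_macro so it can be rewritten with GoSub(), the documented replacement. A minimal sketch, assuming a plain-text `extensions.conf` style dialplan; it does not follow `#include` directives, handle escaped semicolons, or parse AEL/Lua dialplans. The function name `find_macro_usage` is hypothetical.

```python
import re

# Calls to Macro(), MacroIf(), MacroExclusive() or MacroExit(), matched
# case-insensitively to be safe. \b avoids matching names like "MyMacro(".
MACRO_CALL = re.compile(r"\bmacro(?:exclusive|if|exit)?\s*\(", re.IGNORECASE)
# Contexts named [macro-...] exist only to be invoked through Macro().
MACRO_CONTEXT = re.compile(r"^\s*\[(macro-[^\]]+)\]", re.IGNORECASE)


def find_macro_usage(dialplan_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that depend on app_macro."""
    hits = []
    for lineno, line in enumerate(dialplan_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # ignore ';' comments
        if MACRO_CALL.search(code) or MACRO_CONTEXT.match(code):
            hits.append((lineno, line.strip()))
    return hits


sample = """\
[from-internal]
exten => 100,1,Macro(dial-user,${EXTEN})  ; legacy
 same => n,Hangup()

[macro-dial-user]
exten => s,1,Dial(PJSIP/${ARG1},20)
"""
for lineno, line in find_macro_usage(sample):
    print(lineno, line)
```

Running a check like this in CI against the dialplan repository turns a silent post-upgrade failure into a pre-upgrade build error.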
The real cost breakdown: self-hosted vs managed
The appeal of self-hosting is the infrastructure line item. An AWS EC2 m5.large instance for Asterisk costs approximately ₹5,900 per month. SIP trunks run ₹0.42 to ₹1.68 per minute. Local DID numbers cost ₹84 to ₹252 per month. At first glance, this looks dramatically cheaper than managed CPaaS pricing.
But infrastructure is a fraction of total cost of ownership. Here’s a realistic three-year TCO comparison for an organisation processing 50,000 calls per month:
| Cost category | Self-hosted (3-year) | Managed CPaaS (3-year) |
|---|---|---|
| Infrastructure (servers, storage) | ₹6.3 lakh | Included in per-minute pricing |
| SIP trunking and DIDs | ₹15.1 lakh | Included |
| DevOps engineer (1 FTE, India) | ₹54 lakh | Not required |
| Security and patching | ₹4.2 lakh | Included |
| SOC 2 Type II audits (annual, 3 years) | ₹50.4 lakh | Provider-certified |
| Compliance management (TRAI DLT, DPDP) | ₹12.6 lakh | Platform-managed |
| Monitoring and incident response | ₹8.4 lakh | Included (SLA-backed) |
| Downtime cost (est. 2 incidents/year) | ₹25.2 lakh | SLA credits apply |
| Total | ₹1.76 crore | Typically ₹45–90 lakh |
The self-hosted figures above are estimates based on publicly available salary data for India-based DevOps engineers, SOC 2 audit cost ranges from Scrut Automation (2025) and Secureframe (2025), and AWS EC2 pricing. Actual costs vary by organisation size, call volume, and infrastructure choices.
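The table's arithmetic is easy to check and to normalise per call. This sketch uses only the figures in the table and assumes the 50,000 calls per month holds for all 36 months; the variable names are illustrative.

```python
LAKH = 100_000
CALLS_PER_MONTH = 50_000
MONTHS = 36

# Three-year self-hosted rows from the TCO table, in lakh rupees.
SELF_HOSTED_3Y_LAKH = {
    "Infrastructure (servers, storage)": 6.3,
    "SIP trunking and DIDs": 15.1,
    "DevOps engineer (1 FTE, India)": 54.0,
    "Security and patching": 4.2,
    "SOC 2 Type II audits": 50.4,
    "Compliance management (TRAI DLT, DPDP)": 12.6,
    "Monitoring and incident response": 8.4,
    "Downtime cost (est. 2 incidents/year)": 25.2,
}
MANAGED_3Y_LAKH = (45.0, 90.0)  # the table's "typically" range

total = sum(SELF_HOSTED_3Y_LAKH.values())
calls = CALLS_PER_MONTH * MONTHS
people_and_audit = (
    SELF_HOSTED_3Y_LAKH["DevOps engineer (1 FTE, India)"]
    + SELF_HOSTED_3Y_LAKH["SOC 2 Type II audits"]
)

print(f"Self-hosted 3-year total: ₹{total:.1f} lakh (₹{total / 100:.2f} crore)")
print(f"Self-hosted cost per call: ₹{total * LAKH / calls:.2f}")
low, high = (m * LAKH / calls for m in MANAGED_3Y_LAKH)
print(f"Managed cost per call: ₹{low:.2f} to ₹{high:.2f}")
print(f"Staffing + SOC 2 share of self-hosted TCO: {people_and_audit / total:.0%}")
```

On these numbers, self-hosting works out to roughly ₹9.79 per call against ₹2.50 to ₹5.00 for the managed range. Staffing and audits alone account for about 59% of the self-hosted total.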
The largest hidden cost is staffing. A production telephony system requires 24/7 availability. When Asterisk crashes at 2 AM on a Sunday, someone needs to diagnose whether it is a SIP registration failure, a codec mismatch, or a carrier-side issue. That operational burden does not appear on any AWS bill.
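Much of that 2 AM triage starts with one question: is the SIP server answering at all? A common liveness check is to send a SIP OPTIONS request (RFC 3261) and inspect the final response. A minimal sketch over UDP, assuming the target answers unauthenticated OPTIONS. The function names are hypothetical and the addresses are documentation placeholders. A production monitor would also record response time, retry, and alert.

```python
from __future__ import annotations

import socket
import uuid


def build_options(target_host: str, target_port: int, local_ip: str, local_port: int) -> bytes:
    """Build a minimal RFC 3261 OPTIONS request for use as a liveness probe."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie prefix
    uri = f"sip:{target_host}:{target_port}"
    lines = [
        f"OPTIONS {uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{uri}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Accept: application/sdp",
        "Content-Length: 0",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status(response: bytes) -> int:
    """Return the status code from a response line such as 'SIP/2.0 200 OK'."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("SIP/2.0"):
        raise ValueError(f"not a SIP response: {status_line!r}")
    return int(parts[1])


def probe(host: str, port: int = 5060, timeout: float = 2.0) -> int | None:
    """Send one OPTIONS request; return the final status code, or None if silent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # fixes the peer so getsockname() reports our IP
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_ip, local_port))
        try:
            while True:
                code = parse_status(sock.recv(65535))
                if code >= 200:  # skip provisional 1xx responses
                    return code
        except (socket.timeout, ConnectionRefusedError):
            return None  # no answer, or ICMP port unreachable


print(build_options("pbx.example.com", 5060, "192.0.2.10", 5062).decode())
```

Returning the raw status code leaves the health policy to the caller. Some teams treat a 401 or 404 as "alive" because the server answered, while others require a 200.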
India’s regulatory layer adds a compliance multiplier
For organisations operating voice AI in India, self-hosting introduces regulatory complexity that compounds the cost gap.
- TRAI’s DLT (Distributed Ledger Technology) compliance is mandatory for all commercial voice calls. Every promotional call must be registered and tracked. As of August 2024, TRAI directed telecom operators to immediately disconnect unregistered telemarketing operations (2factor.in, 2025). Self-hosting means you manage your own DLT registration, DND list checking, and call timing restrictions. Managed providers handle this as part of the platform.
- The Digital Personal Data Protection (DPDP) Act (partial enforcement as of November 13, 2025, full effect by May 2027) makes organisations directly responsible as Data Fiduciaries for all personal data in their possession (EY India, 2025). Self-hosted voice AI systems that process customer conversations fall under mandatory breach reporting, data minimisation requirements, and security protocols. A managed provider acting as a Data Processor can take on much of the operational work, although legal accountability as Data Fiduciary stays with your organisation.
- UL-VNO licensing adds another dimension. Providing voice telephony services in India requires a Virtual Network Operator licence with a minimum net worth of ₹10 crore (Department of Telecom). This is not a concern for organisations using a licensed CPaaS provider, but it is relevant for those considering deep self-hosted deployments.
- PCI DSS scope extends to VoIP traffic containing payment card data. PCI DSS prohibits storing Sensitive Authentication Data, such as card verification codes, after authorisation, so call recordings that capture it create a compliance problem. Self-hosted systems require an independent PCI audit; managed platforms carry their own certification.
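Two of these obligations become code that a self-hosted stack has to own. The first is a pre-dial gate that checks a locally synced DND set and a calling window. The second is a post-call filter that masks card numbers in transcripts before storage, treating Luhn-valid runs of 13 to 19 digits as card numbers. All names here are hypothetical. The 09:00 to 21:00 IST window is a placeholder assumption, not a statement of current TRAI rules, so confirm permitted hours and your DND data source with your operator. Masking card numbers does not by itself make a system PCI DSS compliant.

```python
import re
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
# Placeholder calling window (assumption); confirm the hours that apply to you.
CALL_WINDOW = (time(9, 0), time(21, 0))


def may_dial_promotional(number: str, now: datetime, dnd_numbers: set[str]) -> bool:
    """Pre-dial gate: refuse DND-registered numbers and calls outside the window."""
    if number in dnd_numbers:
        return False
    local = now.astimezone(IST).time()  # expects a timezone-aware datetime
    start, end = CALL_WINDOW
    return start <= local < end


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def mask_pans(transcript: str) -> str:
    """Replace Luhn-valid card numbers with asterisks, keeping the last 4 digits."""
    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # e.g. an order number that fails the checksum
        return "*" * (len(digits) - 4) + digits[-4:]

    return PAN_CANDIDATE.sub(_mask, transcript)


print(mask_pans("my card is 4111 1111 1111 1111, order 1234567890123"))
```

The Luhn check keeps the filter from mangling order or reference numbers of similar length, which would otherwise make transcripts far less useful.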
Latency: where self-hosting appears to win but often loses
Self-hosting promises lower latency because you control the network path. In theory, a well-tuned FreeSWITCH instance sitting close to your AI inference server eliminates extra hops. In practice, the latency advantage disappears for most organisations.
| Metric | Self-hosted (optimised) | Managed CPaaS (India-native) |
|---|---|---|
| SIP INVITE to ring | 1 to 3 seconds | 1 to 2 seconds |
| Media server round-trip | Variable (depends on deployment) | 10 to 25 ms (Exotel India PoP) |
| End-to-end voice AI turn | 800 to 1,400 ms (typical) | 400 to 650 ms (with AgentStream) |
The counter-intuitive result: a managed platform with data centres in the same geography as your callers often delivers lower latency than a self-hosted stack running on a general-purpose cloud region. Exotel’s AgentStream architecture achieves sub-20ms media streaming latency by eliminating the SIP/SBC middleware layer entirely, replacing it with direct WebSocket integration to the media pipeline.
Codec transcoding is the hidden latency tax in self-hosted deployments. When your SIP trunk delivers G.711 audio but your AI model expects PCM or Opus, Asterisk must transcode every audio frame. Under load, this becomes a CPU bottleneck that degrades voice quality and adds processing delay.
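Here is what that per-frame work looks like. G.711 μ-law carries 8-bit samples at 8 kHz, so each 20 ms frame is 160 bytes that must be expanded to 16-bit linear PCM before most speech models can use it. The frames often need resampling as well, which this sketch omits. This is a minimal pure-Python decoder following the standard G.711 μ-law expansion; the function names are illustrative. Python's `audioop` module, which used to provide this conversion, was removed in Python 3.13.

```python
import struct

BIAS = 0x84  # 132, the bias added during G.711 mu-law encoding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit signed linear sample."""
    code = ~code & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# Precompute all 256 values once; table lookup is the usual approach.
ULAW_TABLE = [ulaw_to_linear(c) for c in range(256)]


def decode_frame(ulaw_frame: bytes) -> bytes:
    """Convert a mu-law frame to little-endian 16-bit PCM (2 bytes per sample)."""
    samples = [ULAW_TABLE[b] for b in ulaw_frame]
    return struct.pack(f"<{len(samples)}h", *samples)


frame = bytes([0xFF, 0x00, 0x80] + [0xFF] * 157)  # one 20 ms frame: 160 samples
pcm = decode_frame(frame)
print(len(frame), len(pcm))               # 160 320
print(struct.unpack("<3h", pcm[:6]))      # (0, -32124, 32124)
```

Each frame doubles in size and must be processed every 20 ms for every active call. Resampling to 16 or 24 kHz, which many speech models expect, multiplies that work again, which is why transcoding shows up as CPU pressure under load.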
Scaling: the moment self-hosting breaks
Self-hosted telephony works at small scale. Economics change dramatically as call volumes grow.
- Asterisk's thread-per-channel design means each call consumes dedicated resources. When the database and PBX sit on the same server, both compete for CPU and memory during traffic spikes.
- Cloud-native managed platforms scale from 100 to 10,000+ concurrent calls through isolated microservices with automated failover, maintaining 99.99% uptime under extreme loads (Deepgram, 2026).
- Failover: A self-hosted Asterisk instance is a single point of failure unless you invest in active-passive clustering, geographic redundancy, and automated health checks. Each layer of resilience adds cost and operational complexity. A managed platform provides this as a default.
- India-specific scaling: Carrier relationship management, number provisioning, and carrier-specific routing rules are operational burdens that scale with geographic coverage.
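Capacity planning for the self-hosted path usually starts with the Erlang B formula. It gives the probability that a new call is blocked when A erlangs of traffic are offered to N channels, assuming blocked calls are cleared rather than queued. A minimal sketch using the standard recursive form; the busy-hour share, working days, and average call length in the example are hypothetical inputs, not figures from this article.

```python
def erlang_b(traffic_erlangs: float, channels: int) -> float:
    """Blocking probability via the recurrence B(n) = A*B(n-1) / (n + A*B(n-1))."""
    b = 1.0  # B(0) = 1
    for n in range(1, channels + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b


def channels_needed(traffic_erlangs: float, max_blocking: float = 0.01) -> int:
    """Smallest channel count whose blocking probability is at or below the target."""
    n, b = 0, 1.0
    while b > max_blocking:
        n += 1
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return n


# Hypothetical inputs: 50,000 calls/month over 22 working days, 15% of a day's
# calls in the busiest hour, 3-minute average call length.
calls_per_day = 50_000 / 22
busy_hour_calls = calls_per_day * 0.15
traffic = busy_hour_calls * 3 / 60  # erlangs = calls per hour x hours per call
print(f"{traffic:.1f} erlangs -> {channels_needed(traffic)} channels at 1% blocking")
```

The channel count sizes SIP trunk capacity and the PBX. Every one of those concurrent calls also needs a WebSocket stream and model capacity behind it, so the same number feeds AI-side capacity planning.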
When self-hosting makes sense
Self-hosting isn't universally wrong. It fits specific circumstances where the trade-offs align with organisational capabilities:
- You have dedicated VoIP engineering talent (not just general backend developers, but specialists in SIP, RTP, and media processing).
- You need deep customisation of the telephony layer itself, not just the AI agent logic.
- Compliance requirements demand you fully control and audit your infrastructure.
- Call volumes are predictable and do not require dynamic scaling.
The Vectorly open-telephony-stack on GitHub is a well-engineered reference implementation for this path: Asterisk PBX with AWS Chime SIP connectivity, a FastAPI WebSocket shim mimicking Twilio's Media Streams, and a sample voice agent server for OpenAI Realtime API integration. For a skilled team, it significantly reduces time-to-prototype.
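The core job of such a shim is translating call audio into the JSON messages a Twilio-compatible agent expects, and back again. A minimal sketch of that translation, assuming the message shape in Twilio's public Media Streams documentation: an `event` field, a `streamSid`, and base64-encoded 8 kHz μ-law audio in `media.payload`. It is not taken from the Vectorly repository, the stream SID is a placeholder, and field names should be checked against Twilio's current docs.

```python
from __future__ import annotations

import base64
import json


def media_message(stream_sid: str, seq: int, chunk: int, timestamp_ms: int, ulaw_frame: bytes) -> str:
    """Wrap one mu-law frame in a Twilio-style inbound 'media' event."""
    return json.dumps({
        "event": "media",
        "sequenceNumber": str(seq),
        "media": {
            "track": "inbound",
            "chunk": str(chunk),
            "timestamp": str(timestamp_ms),
            "payload": base64.b64encode(ulaw_frame).decode("ascii"),
        },
        "streamSid": stream_sid,
    })


def frame_from_agent(message: str) -> bytes | None:
    """Extract mu-law audio from an agent's 'media' event; None for other events."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # control events such as 'mark' carry no audio
    return base64.b64decode(data["media"]["payload"])


frame = bytes([0xFF]) * 160  # 20 ms of mu-law silence
msg = media_message("MZ-placeholder", seq=3, chunk=1, timestamp_ms=20, ulaw_frame=frame)
print(msg[:60])
```

Because the agent only ever sees this JSON, the same agent code can run against Twilio or a self-hosted shim. That interchangeability is what makes a Twilio-compatible interface useful for prototyping.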
But “time-to-prototype” and “time-to-production” are different. Production means handling SIP registration failures at scale, managing certificate renewals, patching security vulnerabilities in Asterisk within hours, and maintaining uptime SLAs.
The managed alternative: pay for outcomes, not infrastructure
The managed CPaaS model shifts your cost structure from capital expenditure and headcount to per-minute operational expenditure. You pay for calls made, not servers maintained.
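That shift also means the comparison depends on volume: self-hosted costs are mostly fixed, while managed costs scale with minutes. A rough break-even sketch using the TCO table's three-year figures, treating every row except SIP trunking as fixed. The managed per-minute rate and trunk rate in the example are hypothetical placeholders, not quoted prices.

```python
LAKH = 100_000
MONTHS = 36

# From the TCO table: all rows except "SIP trunking and DIDs" treated as fixed.
fixed_3y_lakh = 176.2 - 15.1
fixed_per_month = fixed_3y_lakh * LAKH / MONTHS  # about ₹4.475 lakh per month


def break_even_minutes(managed_rate: float, trunk_rate: float) -> float:
    """Monthly minutes at which managed spend equals self-hosted spend.

    Self-hosted: fixed_per_month + trunk_rate * minutes
    Managed:     managed_rate * minutes
    """
    if managed_rate <= trunk_rate:
        return float("inf")  # managed is cheaper at every volume
    return fixed_per_month / (managed_rate - trunk_rate)


# Hypothetical rates in ₹ per minute
print(f"{break_even_minutes(managed_rate=2.0, trunk_rate=0.84):,.0f} minutes/month")
```

Below the break-even volume, the fixed staffing and compliance costs dominate and the managed model is cheaper. The result is very sensitive to the managed rate, so it is worth rerunning with actual quoted prices.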
Exotel’s approach to AI voice telephony illustrates this model: the platform handles PSTN connectivity across 22 telecom circles in India, DLT compliance, number provisioning, media processing, and WebSocket-based audio streaming to AI agents. AgentStream delivers sub-20ms media latency with direct WebSocket integration, avoiding the SIP/SBC middleware that adds latency in self-hosted architectures.
An AI-powered contact centre consolidates infrastructure and regulatory management under one interface, which simplifies deploying virtual agents. For an organisation processing 50,000+ monthly calls, the managed model typically reduces voice AI total cost of ownership by 30 to 60% compared to self-hosting when all cost categories are included (Exotel, 2026).
The savings come primarily from eliminated headcount (no dedicated VoIP engineer), reduced compliance overhead (platform-managed DLT, DPDP, and PCI DSS controls), and eliminated infrastructure management (no server patching, no capacity planning, no incident response at 2 AM).
The integration timeline difference is equally significant. CPaaS API integration takes hours to days. Self-hosted Asterisk deployment with SIP trunk configuration, security hardening, and production readiness takes weeks to months (VoiceInfra, 2025; Deepgram, 2026).
Making the decision: a framework
The choice between self-hosted and managed isn’t about which is cheaper in isolation. It’s about which deployment model aligns with where your engineering team should spend its time.
| Decision factor | Favours self-hosted | Favours managed |
|---|---|---|
| VoIP engineering talent | In-house specialists available | General software engineers |
| Call volume pattern | Predictable, steady state | Variable, growing |
| Compliance scope | Single jurisdiction, internal audit | Multi-jurisdiction, enterprise customers |
| Time to production | Months acceptable | Weeks or less required |
| Customisation need | The telephony layer itself | AI agent logic on top of telephony |
| Budget structure | CapEx-friendly, headcount available | OpEx-preferred, lean team |
For most organisations building AI voice agents in India and APAC, the managed path delivers faster time to production, lower total cost of ownership, and better operational resilience. Self-hosting is a valid choice for teams with deep VoIP expertise and specific infrastructure control requirements, but it is rarely the economical choice once all costs are counted.
Sources
- Market.us, “Voice AI Agents Market,” 2024
- Scrut Automation, “SOC 2 Compliance Cost,” 2025
- Secureframe, “SOC 2 Audit Cost,” 2025
- 2factor.in, “TRAI Mandatory DLT Registration Guide,” 2025
- EY India, “Decoding the DPDP Act 2023,” 2025
- Department of Telecom India, “UL (VNO) NLD Service”
- VitalPBX, “Asterisk PBX Multi-Core Test,” 2025
- AsteriskService, “Why Asterisk Version Upgrades Fail,” 2025
- KingAsterik, “Asterisk Internal Call Failures,” 2025
- Sachin Keshav, “Real-World Challenges of Voice AI,” Medium, 2025
- VoiceInfra, “CPaaS Voice AI Integration Guide,” 2025
- Deepgram, “Scalable Voice AI Platforms,” 2026
- Mordor Intelligence, “CPaaS Market Report,” 2025
- GitHub VectorlyApp/open-telephony-stack, 2026




