Self-hosted vs managed: total cost of ownership for AI voice telephony

Shiva Tripathi

March 26, 2026

A developer recently open-sourced a self-hosted telephony stack on GitHub, built on Asterisk and AWS Chime, that replicates Twilio’s WebSocket Media Streams interface at zero monthly cost. The post gained immediate traction because it addressed a real frustration: Twilio’s HIPAA compliance package starts at ₹1.68 lakh per month before a single call is made. For startups building AI voice agents in healthcare or fintech, that price tag eliminates voice AI as an option entirely.

But the project’s creator acknowledged something important: “Twilio remains superior for users avoiding infrastructure management.” That single sentence captures the central tension this post examines: when you factor in every cost of running your own telephony infrastructure, does self-hosting actually save money?

The three paths to AI voice telephony in 2026

The AI voice agent market hit ₹20,160 crore in 2024 and is projected to reach ₹3.99 lakh crore by 2034, growing at 34.8% CAGR (Market.us, 2024).

Developers building voice AI applications now face three distinct deployment paths, each with a different cost structure.
Full self-hosting: You run Asterisk or FreeSWITCH on your own servers, connect SIP trunks to the PSTN, build WebSocket bridges for AI audio streaming, and manage every layer from media routing to security. The Vectorly open-telephony-stack on GitHub is a reference implementation of this approach.
Managed CPaaS: You use a platform that handles telephony infrastructure, carrier connectivity, compliance, and scaling. You interact through APIs and WebSocket interfaces, focusing your engineering effort on the AI agent itself rather than the phone system underneath it.
Hybrid model: You self-host certain components (the AI agent logic, perhaps a custom LLM) while relying on managed telephony for PSTN connectivity, number management, and regulatory compliance layers. Most production deployments land here when they mature past the prototype stage.

What self-hosting actually requires

A self-hosted voice AI telephony stack isn’t a single piece of software. It’s a collection of interdependent systems, each demanding ongoing attention.

Component	What it does	Self-hosted responsibility
SIP server (Asterisk/FreeSWITCH)	Call signalling and routing	Install, configure, patch, scale
SIP trunking	PSTN connectivity	Negotiate carrier contracts, manage failover
Media server	RTP audio processing	Handle codec transcoding, packet loss, jitter buffers
WebSocket bridge	Stream audio to AI models	Build and maintain real-time audio pipeline
TLS/SRTP	Call encryption	Certificate management, security audits
Number management	DIDs, porting, CLI compliance	Carrier relationships per geography
Monitoring	Uptime, call quality, fraud detection	Build or integrate observability stack

Each row above needs a clear owner, whether that is a team or a dedicated engineer. Asterisk’s monolithic architecture handles about 4,500 concurrent calls under optimal conditions on multi-core hardware (VitalPBX, 2025), and FreeSWITCH handles 2,000+ concurrent calls, but both figures assume expert tuning. Voice AI systems in production frequently fail at just 10 to 30 concurrent calls because of issues like GPU memory fragmentation, connection pool starvation, or WebSocket backpressure (Sachin Keshav, Medium, 2025).

SIP ALG (Application Layer Gateway), present on 70% of routers by default, rewrites SIP headers and can break Asterisk deployments. Version upgrades carry significant risk: Asterisk 21 removed app_macro entirely, so any dialplan using Macro() calls will silently hang up after migration (AsteriskService, 2025).
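A pre-upgrade check for this kind of breakage can be scripted. The sketch below scans dialplan text for calls to the applications that app_macro provided, so they can be rewritten (Gosub() is the usual replacement) before moving to Asterisk 21. The function name, the sample dialplan, and the simplified comment handling are illustrative assumptions, not part of any Asterisk tooling.

```python
import re

# Applications provided by the removed app_macro module.
MACRO_CALL = re.compile(r"\bMacro(Exclusive|If|Exit)?\s*\(", re.IGNORECASE)


def find_macro_calls(dialplan_text):
    """Return (line_number, code) pairs for lines that call a Macro application."""
    hits = []
    for lineno, line in enumerate(dialplan_text.splitlines(), start=1):
        # Drop everything after the first unescaped ';' (dialplan comment).
        code = re.split(r"(?<!\\);", line, maxsplit=1)[0]
        if MACRO_CALL.search(code):
            hits.append((lineno, code.strip()))
    return hits


# Hypothetical extensions.conf fragment used only for illustration.
SAMPLE = "\n".join([
    "[from-internal]",
    "exten => 100,1,Macro(stdexten,100,PJSIP/100)",
    "; exten => 101,1,Macro(old,101)",
    "exten => 102,1,Gosub(stdexten,s,1(102,PJSIP/102))",
    "same => n,Hangup()",
])

for lineno, code in find_macro_calls(SAMPLE):
    print(f"line {lineno}: {code}")
```

Only the live Macro() call on line 2 of the sample is reported; the commented-out line and the Gosub() call are ignored. A real audit would also need to follow #include directives and any dialplan held in a realtime database, which this sketch does not attempt.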

The real cost breakdown: self-hosted vs managed

The appeal of self-hosting is the infrastructure line item. An AWS EC2 m5.large instance for Asterisk costs approximately ₹5,900 per month. SIP trunks run ₹0.42 to ₹1.68 per minute. Local DID numbers cost ₹84 to ₹252 per month. At first glance, this looks dramatically cheaper than managed CPaaS pricing.

But infrastructure is a fraction of total cost of ownership. Here’s a realistic three-year TCO comparison for an organisation processing 50,000 calls per month:

Cost category	Self-hosted (3-year)	Managed CPaaS (3-year)
Infrastructure (servers, storage)	₹6.3 lakh	Included in per-minute pricing
SIP trunking and DIDs	₹15.1 lakh	Included
DevOps engineer (1 FTE, India)	₹54 lakh	Not required
Security and patching	₹4.2 lakh	Included
SOC 2 Type II audit (annual)	₹50.4 lakh	Provider-certified
Compliance management (TRAI DLT, DPDP)	₹12.6 lakh	Platform-managed
Monitoring and incident response	₹8.4 lakh	Included (SLA-backed)
Downtime cost (est. 2 incidents/year)	₹25.2 lakh	SLA credits apply
Total	₹1.76 crore	Typically ₹45–90 lakh

The self-hosted figures above are estimated based on publicly available salary data for India-based DevOps engineers, SOC 2 audit cost ranges from Scrut Automation (2025) and Secureframe (2025), and AWS EC2 pricing. Actual costs vary by organisation size, call volume, and infrastructure choices.
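The arithmetic behind the table can be checked in a few lines. The sketch below sums the self-hosted rows and derives the saving implied by the ₹45–90 lakh managed range; the figures are copied from the table, while the variable and function names are illustrative.

```python
# Three-year self-hosted costs from the table above, in lakh rupees.
SELF_HOSTED_LAKH = {
    "Infrastructure (servers, storage)": 6.3,
    "SIP trunking and DIDs": 15.1,
    "DevOps engineer (1 FTE, India)": 54.0,
    "Security and patching": 4.2,
    "SOC 2 Type II audit (annual)": 50.4,
    "Compliance management (TRAI DLT, DPDP)": 12.6,
    "Monitoring and incident response": 8.4,
    "Downtime cost (est. 2 incidents/year)": 25.2,
}

# "Typically" range for managed CPaaS from the same table, in lakh rupees.
MANAGED_RANGE_LAKH = (45.0, 90.0)


def saving_pct(managed, self_hosted):
    """Percentage saved by paying `managed` instead of `self_hosted`."""
    return 100.0 * (1.0 - managed / self_hosted)


total_lakh = sum(SELF_HOSTED_LAKH.values())
best_case = saving_pct(MANAGED_RANGE_LAKH[0], total_lakh)
worst_case = saving_pct(MANAGED_RANGE_LAKH[1], total_lakh)

print(f"Self-hosted total: {total_lakh:.1f} lakh ({total_lakh / 100:.2f} crore)")
print(f"Implied saving: {worst_case:.1f}% to {best_case:.1f}%")
```

The rows sum to ₹176.2 lakh, which matches the ₹1.76 crore total, and the managed range implies a saving of roughly 49% to 74% on these particular estimates. Staffing and the SOC 2 audit together account for more than half of the self-hosted total, so the result is most sensitive to those two assumptions.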

The largest hidden cost is staffing. A production telephony system requires 24/7 availability. When Asterisk crashes at 2 AM on a Sunday, someone needs to diagnose whether it is a SIP registration failure, a codec mismatch, or a carrier-side issue. That operational burden does not appear on any AWS bill.

India’s regulatory layer adds a compliance multiplier

For organisations operating voice AI in India, self-hosting introduces regulatory complexity that compounds the cost gap.

TRAI’s DLT (Distributed Ledger Technology) compliance is mandatory for all commercial voice calls. Every promotional call must be registered and tracked. As of August 2024, TRAI directed telecom operators to immediately disconnect unregistered telemarketing operations (2factor.in, 2025). Self-hosting means you manage your own DLT registration, DND list checking, and call timing restrictions. Managed providers handle this as part of the platform.
The Digital Personal Data Protection (DPDP) Act (partial enforcement as of November 13, 2025, full effect by May 2027) makes organisations directly responsible as Data Fiduciaries for all personal data in their possession (EY India, 2025). Self-hosted voice AI systems that process customer conversations fall under mandatory breach reporting, data minimisation requirements, and security protocols. Managed providers absorb this compliance burden.
The UL-VNO licence requirement adds another dimension. Providing voice telephony services in India requires a Virtual Network Operator licence with a minimum net worth of ₹10 crore (Department of Telecom). This is not a concern for organisations using a licensed CPaaS provider, but it is relevant for those considering deep self-hosted deployments.
PCI DSS scope extends to VoIP traffic containing payment card data: call recordings must comply with the PCI DSS prohibition on storing Sensitive Authentication Data (set out in v3.2.1 and carried forward in v4.x). Self-hosted systems require an independent PCI audit; managed platforms carry their own certification.

Latency: where self-hosting appears to win but often loses

Self-hosting promises lower latency because you control the network path. In theory, a well-tuned FreeSWITCH instance sitting close to your AI inference server eliminates extra hops. In practice, the latency advantage disappears for most organisations.

Metric	Self-hosted (optimised)	Managed CPaaS (India-native)
SIP INVITE to ring	1 to 3 seconds	1 to 2 seconds
Media server round-trip	Variable (depends on deployment)	10 to 25 ms (Exotel India PoP)
End-to-end voice AI turn	800 to 1,400 ms (typical)	400 to 650 ms (with AgentStream)

The counter-intuitive result: a managed platform with data centres in the same geography as your callers often delivers lower latency than a self-hosted stack running on a general-purpose cloud region. Exotel’s AgentStream architecture achieves sub-20ms media streaming latency by eliminating the SIP/SBC middleware layer entirely, replacing it with direct WebSocket integration to the media pipeline.

Codec transcoding is the hidden latency tax in self-hosted deployments. When your SIP trunk delivers G.711 audio but your AI model expects PCM or Opus, Asterisk must transcode every audio frame. Under load, this becomes a CPU bottleneck that degrades voice quality and adds processing delay.
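To make the per-frame work concrete, here is a minimal sketch of one such conversion: decoding G.711 audio to 16-bit linear PCM in pure Python. It assumes the trunk delivers the μ-law variant at 8 kHz; A-law, the other G.711 variant, uses a different mapping. The function names are illustrative, and a production media server would do this in optimised native code.

```python
import struct


def ulaw_byte_to_pcm16(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 possible values once so per-frame work is a table lookup.
ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16le(frame):
    """Convert a mu-law frame (bytes) to little-endian 16-bit PCM bytes."""
    samples = [ULAW_TABLE[b] for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)


# A 20 ms frame at 8 kHz is 160 mu-law bytes and becomes 320 PCM bytes.
silence = bytes([0xFF]) * 160
pcm = ulaw_to_pcm16le(silence)
print(len(silence), "->", len(pcm), "bytes")
```

Even with a lookup table, this runs 50 times per second for every call leg, and converting to Opus or resampling to the rate an AI model expects costs considerably more per frame. That is why transcoding load grows linearly with concurrent calls.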

Scaling: the moment self-hosting breaks

Self-hosted telephony works at small scale. The economics change as call volumes grow.

Asterisk’s thread-per-channel architecture means each call consumes dedicated resources. When the database and the PBX sit on the same server, they compete for CPU, memory, and I/O during traffic spikes.
Cloud-native managed platforms scale from 100 to 10,000+ concurrent calls through isolated microservices with automated failover, maintaining 99.99% uptime under extreme loads (Deepgram, 2026).
Failover visibility: A self-hosted Asterisk instance is a single point of failure unless you invest in active-passive clustering, geographic redundancy, and automated health checks. Each layer of resilience adds cost and operational complexity. A managed platform provides this as a default.
India-specific scaling: Carrier relationship management, number provisioning, and carrier-specific routing rules are operational burdens that scale with geographic coverage.

When self-hosting makes sense

Self-hosting isn’t universally wrong. It fits specific circumstances where the trade-offs align with organisational capabilities:

You have dedicated VoIP engineering talent (not just general backend developers, but specialists in SIP, RTP, and media processing).
You need deep customisation of the telephony layer itself, not just the AI agent logic.
Compliance requirements demand you fully control and audit your infrastructure.
Call volumes are predictable and do not require dynamic scaling.

The Vectorly open-telephony-stack on GitHub is a well-engineered reference implementation for this path: Asterisk PBX with AWS Chime SIP connectivity, a FastAPI WebSocket shim mimicking Twilio’s Media Streams, and a sample voice agent server for OpenAI Realtime API integration. For a skilled team, it reduces time-to-prototype significantly.
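For readers unfamiliar with the interface such a shim imitates, the sketch below shows how a voice agent might pull audio out of one Twilio-style Media Streams message. In Twilio's format each WebSocket text frame is a JSON object, and frames with event "media" carry base64-encoded μ-law audio in media.payload. Whether the Vectorly shim reproduces every field is not confirmed here, and the function name and sample values are illustrative.

```python
import base64
import json


def extract_audio(message_text):
    """Return raw audio bytes from a 'media' message, or None for other events."""
    msg = json.loads(message_text)
    if msg.get("event") != "media":
        # 'connected', 'start', 'stop' and similar events carry no audio.
        return None
    return base64.b64decode(msg["media"]["payload"])


# Hypothetical messages in the Media Streams shape, used only for illustration.
start_msg = json.dumps({"event": "start", "streamSid": "MZ-example"})
media_msg = json.dumps({
    "event": "media",
    "streamSid": "MZ-example",
    "media": {
        "track": "inbound",
        "payload": base64.b64encode(b"\xff\x7f\xff").decode("ascii"),
    },
})

print(extract_audio(start_msg))   # non-audio event
print(extract_audio(media_msg))   # raw mu-law bytes
```

Because the agent only depends on this message shape, the same agent code can in principle sit behind either a self-hosted shim or a managed provider's stream, which is what makes the hybrid model practical.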

But “time-to-prototype” and “time-to-production” are different. Production means handling SIP registration failures at scale, managing certificate renewals, patching security vulnerabilities in Asterisk within hours, and maintaining uptime SLAs.

The managed alternative: pay for outcomes, not infrastructure

The managed CPaaS model shifts your cost structure from capital expenditure and headcount to per-minute operational expenditure. You pay for calls made, not servers maintained.

Exotel’s approach to AI voice telephony illustrates this model: the platform handles PSTN connectivity across 22 telecom circles in India, DLT compliance, number provisioning, media processing, and WebSocket-based audio streaming to AI agents. AgentStream delivers sub-20ms media latency with direct WebSocket integration, avoiding the SIP/SBC middleware that adds latency in self-hosted architectures.

For businesses that want a simpler path to deploying virtual agents, an AI-powered contact center consolidates infrastructure and regulatory management under one interface. For an organisation processing 50,000+ monthly calls, the managed model typically reduces voice AI total cost of ownership by 30 to 60% compared to self-hosting when all cost categories are included (Exotel, 2026).

The savings come primarily from eliminated headcount (no dedicated VoIP engineer), eliminated compliance overhead (platform-managed DLT, DPDP, PCI DSS), and eliminated infrastructure management (no server patching, no capacity planning, no incident response at 2 AM).

The integration timeline difference is equally significant. CPaaS API integration takes hours to days. Self-hosted Asterisk deployment with SIP trunk configuration, security hardening, and production readiness takes weeks to months (VoiceInfra, 2025; Deepgram, 2026).

Making the decision: a framework

The choice between self-hosted and managed isn’t about which is cheaper in isolation. It’s about which deployment model aligns with where your engineering team should spend its time.

Decision factor	Favours self-hosted	Favours managed
VoIP engineering talent	In-house specialists available	General software engineers
Call volume pattern	Predictable, steady state	Variable, growing
Compliance scope	Single jurisdiction, internal audit	Multi-jurisdiction, enterprise customers
Time to production	Months acceptable	Weeks or less required
Customisation need	The telephony layer itself	AI agent logic on top of telephony
Budget structure	CapEx-friendly, headcount available	OpEx-preferred, lean team

For most organisations building AI voice agents in India and APAC, the managed path delivers faster time to production, lower total cost of ownership, and better operational resilience. Self-hosting is a valid choice for teams with deep VoIP expertise and specific infrastructure control requirements, but it is rarely the economical choice once all costs are counted.

Sources

Market.us, “Voice AI Agents Market,” 2024
Scrut Automation, “SOC 2 Compliance Cost,” 2025
Secureframe, “SOC 2 Audit Cost,” 2025
2factor.in, “TRAI Mandatory DLT Registration Guide,” 2025
EY India, “Decoding the DPDP Act 2023,” 2025
Department of Telecom India, “UL (VNO) NLD Service”
VitalPBX, “Asterisk PBX Multi-Core Test,” 2025
AsteriskService, “Why Asterisk Version Upgrades Fail,” 2025
KingAsterik, “Asterisk Internal Call Failures,” 2025
Sachin Keshav, “Real-World Challenges of Voice AI,” Medium, 2025
VoiceInfra, “CPaaS Voice AI Integration Guide,” 2025
Deepgram, “Scalable Voice AI Platforms,” 2026
Mordor Intelligence, “CPaaS Market Report,” 2025
GitHub VectorlyApp/open-telephony-stack, 2026


Shiva Tripathi

Shiva is Head of Digital Marketing & Developer Network at Exotel, a growing community of builders working with voice, messaging, and AI-powered communication APIs. He has spent 13+ years helping B2B SaaS companies grow through data-driven marketing, and today he's equally focused on helping developers discover, adopt, and get more out of Exotel's platform. He writes about developer ecosystems, voice AI trends, and what it takes to build great CX infrastructure.
