In Part 1, we made the business case: infrastructure reliability is what wins enterprise deals.
In Part 2, we went under the hood—WebSocket/VSIP streaming, stereo audio, active-active failover, and CPS controls.

This final installment is the one you’ll bookmark. It’s the operational playbook for scaling a Voice AI deployment from a successful pilot to a production system handling thousands of concurrent calls across India.

Because here’s what nobody tells you in the pitch deck: the hardest part of Voice AI isn’t building the bot. It’s scaling it.

1. The Scaling Cliff: Why Most Voice AI Deployments Stall After the Pilot

The pattern is almost universal. A Voice AI company runs a pilot with an enterprise client—50 concurrent calls, limited geography, controlled conditions. The results are impressive. The client is excited. They say the words every founder wants to hear: “Let’s roll this out nationally.”

And then reality hits.

Going from 50 concurrent calls to 5,000 isn’t a 100x increase in the same thing. It’s a fundamentally different engineering and operational challenge. The bottlenecks multiply across three distinct layers:

  • Telephony capacity: Your pilot ran on a handful of DIDs with modest trunk capacity. A national rollout needs hundreds of numbers across multiple telecom circles, with enough trunk capacity to handle peak-hour traffic without queuing or dropping calls. Traditional telco provisioning for this takes weeks to months.
  • GPU inference throughput: Your pilot’s GPU cluster handled 50 concurrent inference sessions comfortably. At 5,000 concurrent calls, you need 100x the compute—and more importantly, you need your telephony layer to match your GPU capacity precisely. Too much telephony throughput overwhelms inference. Too little wastes GPU capacity you’re paying for.
  • Network transport: At pilot scale, network latency variability is manageable. At national scale, you’re dealing with calls from every Indian telecom circle, each with different latency profiles, congestion patterns, and failure characteristics. A solution that worked perfectly on Jio in Mumbai might behave differently on BSNL in a Tier-3 city.

This is the scaling cliff. And the companies that have successfully navigated it—SquadStack scaling outbound collections nationally, Sarvam deploying across major financial institutions, Skit expanding into multiple regulated verticals—all share one thing in common: they planned for scale from the infrastructure layer up, not from the model layer down.

The bot that works perfectly at 50 concurrent calls and the bot that works perfectly at 5,000 concurrent calls are the same bot. The difference is entirely in the infrastructure beneath it.

2. The Throughput Planning Framework

Before you scale anything, you need to know your numbers. Throughput planning for Voice AI requires aligning three independent capacity constraints: telephony CPS, GPU inference concurrency, and network bandwidth. Get any one of these wrong, and your system either wastes resources or drops calls.

Step 1: Calculate Your Target Concurrency

Start with the end state your enterprise client expects:

CONCURRENCY CALCULATION
Target concurrent calls = Peak hour call volume ÷ 3,600 × Average call duration (seconds)

  • Example (SquadStack-style outbound collections):
    Peak hour volume: 180,000 calls/hour
    Average call duration: 55 seconds
    Target concurrency: (180,000 ÷ 3,600) × 55 = 2,750 concurrent calls
  • Example (Sarvam-style inbound support):
    Peak hour volume: 8,000 calls/hour
    Average call duration: 180 seconds (3 minutes)
    Target concurrency: (8,000 ÷ 3,600) × 180 = 400 concurrent calls
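Both examples can be reproduced in a few lines of code. A minimal sketch (the function name is ours, for illustration only):

```python
def target_concurrency(peak_hour_calls: float, avg_call_duration_s: float) -> float:
    """Concurrent calls needed to sustain a given peak-hour volume.

    Multiplying before dividing keeps the arithmetic exact for these inputs.
    """
    return peak_hour_calls * avg_call_duration_s / 3600

# Outbound collections: 180,000 calls/hour at a 55s average duration
print(target_concurrency(180_000, 55))   # 2750.0

# Inbound support: 8,000 calls/hour at a 180s average duration
print(target_concurrency(8_000, 180))    # 400.0
```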

Step 2: Derive Your CPS Requirement

CPS (Calls Per Second) determines how quickly your system can ramp up to target concurrency and sustain it:

CPS CALCULATION
Sustained CPS = Target concurrent calls ÷ Average call duration (seconds)

  • Outbound collections example:
    Sustained CPS: 2,750 ÷ 55 = 50 CPS
    Burst CPS (for campaign ramp-up): 1.5–2x sustained = 75–100 CPS
  • Inbound support example:
    Sustained CPS: 400 ÷ 180 = ~2.2 CPS
    Burst CPS (for traffic spikes): 3–5x sustained = ~7–11 CPS
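The same arithmetic, scripted (names are illustrative, not part of any Exotel API):

```python
def sustained_cps(target_concurrent_calls: float, avg_call_duration_s: float) -> float:
    """Steady-state call arrival rate (calls/second) that holds target concurrency."""
    return target_concurrent_calls / avg_call_duration_s

# Outbound collections: 2,750 concurrent calls at a 55s average duration
cps = sustained_cps(2_750, 55)
print(cps)                  # 50.0
print(1.5 * cps, 2 * cps)   # burst range for campaign ramp-up

# Inbound support: 400 concurrent calls at a 180s average duration
print(round(sustained_cps(400, 180), 1))   # 2.2
```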

Notice the dramatic difference: outbound collections campaigns need 50 CPS sustained, while inbound support might need only 2–3 CPS. This is why Exotel’s VN-level CPS controls matter—you configure each virtual number’s rate limit independently based on the use case behind it.

Step 3: Match Telephony CPS to GPU Capacity

This is where most scaling failures originate. Your telephony layer and your inference layer must be capacity-matched:

Parameter | How to Size | Safety Margin
GPU concurrent sessions | Benchmark your full inference chain (ASR + LLM + TTS) under load | +10–15% above target concurrency
Exotel CPS limit | Match to GPU capacity, not campaign ambition | Set at 90% of GPU max concurrency ÷ avg duration
Trunk capacity | Total DID lines provisioned with Exotel | +20% above peak concurrency for failover headroom
Network bandwidth | Per-stream audio (stereo ~128 kbps) × concurrent calls | +25% for protocol overhead and jitter buffers

The critical principle: your Exotel CPS limit should be your safety valve, not your bottleneck. Set it slightly below your GPU capacity so that if call volume surges, Exotel throttles new call initiations before your inference cluster gets overwhelmed—protecting in-flight calls from latency degradation.
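That safety-valve rule translates directly into a formula. A minimal sketch (the 90% factor mirrors the sizing table; the function name and example numbers are ours):

```python
def cps_safety_valve(gpu_max_concurrency: float, avg_call_duration_s: float,
                     safety_factor: float = 0.9) -> float:
    """CPS limit that lets telephony throttle before the GPU cluster saturates."""
    return safety_factor * gpu_max_concurrency / avg_call_duration_s

# A cluster benchmarked at 3,000 concurrent inference sessions, 55s average calls:
# cap new call initiations just below what would fill the cluster.
print(round(cps_safety_valve(3_000, 55), 1))   # 49.1
```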

This is exactly how SquadStack structures their scaling: Exotel’s CPS controls act as a governor that keeps their GPU utilization in the optimal range, with dynamic adjustment during campaign ramp-ups and wind-downs.

3. The Compliance & Number Strategy Playbook

Indian telecom regulation is a maze, and getting compliance wrong doesn’t just create legal exposure—it can physically shut down your voice operations. Numbers get blacklisted, trunks get suspended, and enterprise clients lose confidence overnight.

Here’s the operational playbook for getting compliance right from day one.

Understanding ULVNO Compliance

Exotel’s ULVNO (Unified License Virtual Network Operator) status is your regulatory foundation. Operationally, it means:

  • Legal call origination and termination: All calls made through Exotel’s infrastructure are compliant with DoT (Department of Telecom) and TRAI regulations by default. Your Voice AI company doesn’t need to obtain its own telecom license.
  • DND registry compliance: Exotel’s platform automatically checks against the NDNC (National Do Not Call) registry for outbound calls, preventing regulatory violations that can result in heavy fines.
  • Call recording mandates: For regulated industries (banking, insurance, financial services), certain calls must be recorded and retained. Exotel’s stereo recording capabilities (covered in Part 2) provide compliant, dual-channel recordings with clear caller/bot separation.

For Skit, whose enterprise clients span heavily regulated sectors like banking and insurance, being able to say “Our infrastructure is ULVNO-compliant with built-in DND checking and compliant call recording” has become a measurable advantage in procurement evaluations. It eliminates an entire category of objections from enterprise compliance teams.

DID Number Strategy: More Than Just Picking Numbers

Your choice of DID (Direct Inward Dialing) numbers has a direct, measurable impact on operational metrics—particularly pickup rates for outbound campaigns. Here’s the framework:

Number Series | Best For | Pickup Rate Impact | Considerations
Local geographic (e.g., 080, 044) | Inbound support, regional campaigns | Highest (callers trust local numbers) | Need numbers in each target circle
Toll-free (1800) | National inbound, brand helplines | High (associated with established brands) | Higher per-minute costs; callers expect free calls
Non-geographic mobile series | Outbound campaigns, national coverage | Moderate to high | Flexible deployment; no regional provisioning needed
Vanity / branded series | Brand identity, marketing campaigns | Variable (recognition-dependent) | Requires advance planning with Exotel for availability

Exotel’s ULVNO status provides access to any-series DIDs across Indian telecom operators—a flexibility that wrapper-based CPaaS platforms cannot offer. This means your number strategy can be optimized for your specific use case rather than constrained by what’s available through a reseller.

Operational recommendation: work with Exotel’s team to plan your number strategy before scaling, not after. For SquadStack’s outbound collections campaigns, number series selection and rotation strategy is a continuously optimized parameter that directly impacts contact rates and recovery amounts.

4. Failover Validation: Testing Your Safety Net Before You Need It

As we covered in Part 2, Exotel’s active-active architecture and multi-operator redundancy provide automatic failover. But “automatic” doesn’t mean “untested.” Every production Voice AI deployment should validate its failover paths before going live—and periodically afterward.

The Failover Testing Checklist

PRE-PRODUCTION FAILOVER VALIDATION

  • ✓ Operator failover: Simulate primary operator congestion. Verify that calls automatically reroute to secondary operator with < 100ms switchover and no audible interruption.
  • ✓ Region failover: Simulate primary DC unavailability. Confirm that call routing shifts to the DR site with maintained call quality and no dropped in-flight calls.
  • ✓ VN-level overflow: Push traffic above CPS limits on a specific VN. Verify that overflow handling (queue, redirect, or reject) behaves as configured.
  • ✓ GPU saturation response: Simulate GPU cluster at 95% utilization. Confirm that CPS throttling engages correctly and that in-flight calls maintain acceptable latency.
  • ✓ End-to-end failover chain: Trigger multiple simultaneous failures (operator + region). Verify that the system degrades gracefully rather than failing catastrophically.

Monitoring & Alerting: What to Track in Production

Once you’re live, continuous monitoring prevents surprises. Here are the metrics every Voice AI operations team should track on their Exotel infrastructure:

Metric | What It Tells You | Alert Threshold
CPS utilization (%) | How close you are to telephony capacity limits | > 80% sustained for 5+ minutes
ASR (Answer Seizure Ratio; the telephony KPI, not speech recognition) | Percentage of call attempts that connect successfully | < 95% over 15-minute window
Failover events / hour | Frequency of automatic path switches | > 5 events/hour (investigate carrier health)
Average call setup time | Time from trigger to caller hearing audio | > 3 seconds (check trunk capacity)
Concurrent call count | Real-time active sessions vs. capacity | > 85% of GPU max concurrency
Trunk utilization by operator | Load distribution across telco partners | > 90% on any single operator
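The thresholds in this table can be wired into a simple alert check. A sketch with illustrative metric names (Exotel's actual monitoring interface is not shown here):

```python
# Alert conditions mirroring the table above; names and units are illustrative.
THRESHOLDS = {
    "cps_utilization_pct":      lambda v: v > 80,   # telephony capacity limit
    "answer_seizure_ratio_pct": lambda v: v < 95,   # call connect success
    "failover_events_per_hour": lambda v: v > 5,    # carrier health
    "call_setup_time_s":        lambda v: v > 3,    # trunk capacity
    "gpu_concurrency_pct":      lambda v: v > 85,   # inference headroom
    "trunk_utilization_pct":    lambda v: v > 90,   # single-operator load
}

def breached(metrics: dict) -> list:
    """Names of reported metrics currently outside their alert thresholds."""
    return [name for name, check in THRESHOLDS.items()
            if name in metrics and check(metrics[name])]

sample = {"cps_utilization_pct": 83, "answer_seizure_ratio_pct": 97,
          "call_setup_time_s": 2.1}
print(breached(sample))   # ['cps_utilization_pct']
```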

Sarvam’s operations team monitors these metrics in real-time during national deployments, with automated CPS adjustment triggers tied to GPU utilization—a pattern that Exotel’s team helped them architect during the embedded engagement phase described in Part 1.

5. The Scaling Roadmap: From Pilot to National Deployment

Based on our experience scaling infrastructure for India’s leading Voice AI companies, here’s the phased approach that consistently delivers successful national-scale deployments.

Phase 1: Foundation (Weeks 1–2)

  • Architecture review: Exotel’s engineering team reviews your current stack—inference pipeline, ASR/TTS providers, hosting setup—and identifies integration points and potential bottlenecks.
  • Capacity planning: Using the throughput framework above, define target concurrency, CPS requirements, and trunk capacity for your first production deployment.
  • Number strategy: Select DID series based on use case, geography, and pickup rate optimization goals.
  • Streaming setup: Configure WebSocket/VSIP endpoints, stereo audio channels, and barge-in handling parameters.

Phase 2: Controlled Production (Weeks 3–4)

  • Limited geography rollout: Deploy to 2–3 telecom circles with moderate traffic volume (10–15% of target concurrency).
  • Baseline metrics: Establish performance baselines for latency, speech-recognition (ASR) accuracy, call completion rates, and failover behavior.
  • Failover testing: Execute the full failover validation checklist above.
  • CPS tuning: Fine-tune VN-level CPS limits based on observed GPU utilization patterns under real traffic.

Phase 3: Accelerated Scale (Weeks 5–8)

  • Geography expansion: Scale to all target telecom circles. Provision additional DID numbers and trunk capacity as needed.
  • Traffic ramp: Incrementally increase from 15% to 100% of target concurrency, monitoring all metrics at each step.
  • Operator redundancy validation: Confirm multi-operator failover is functioning correctly across all deployed circles.
  • Performance optimization: Work with Exotel’s team to optimize routing logic for regional latency characteristics.

Phase 4: Steady State + Continuous Optimization (Ongoing)

  • Capacity headroom management: Maintain 15–20% headroom above observed peak concurrency for traffic spikes and organic growth.
  • Number rotation: For outbound campaigns, implement DID rotation strategies to maintain pickup rates over time.
  • Quarterly scaling reviews: Regular capacity planning sessions with Exotel’s team to align infrastructure with your growth trajectory.
  • Disaster recovery drills: Scheduled failover tests to ensure backup paths remain validated and operational.

This is the roadmap that took SquadStack from a successful collections pilot to a national-scale operation handling thousands of concurrent calls. The infrastructure scaled with them—not ahead of them, not behind them, but alongside them.

6. Beyond the API Key: The Partnership Model That Scales With You

There’s a reason we keep coming back to the word “partnership.” The scaling playbook above isn’t a self-serve workflow you execute from a dashboard. It’s a joint operation between your team and Exotel’s.

Here’s what the partnership model looks like operationally:

  • Dedicated Customer Success Managers: Not a shared support queue. A named CSM who knows your architecture, your enterprise clients, and your scaling timeline. When Sarvam was preparing for their national rollout, Exotel’s CSMs spent weeks embedded in their office—not because of a crisis, but because that’s the standard engagement model for strategic partnerships.
  • Custom commercial models: Pricing structures designed for million-minute volumes. When you’re processing the kind of call volumes that SquadStack and Skit handle, per-minute pricing models that work at startup scale become prohibitively expensive. Exotel designs commercial models that make unit economics viable at enterprise scale.
  • Joint architecture planning: Exotel’s engineering team works directly with yours to design throughput pipelines for your specific use case. Outbound collections at 5,000 concurrent calls has fundamentally different architectural requirements than inbound support with complex IVR-to-AI handoffs—and the infrastructure should reflect that.
  • Faster provisioning TATs: When an enterprise client says “go national” and gives you a 2-week timeline, you can’t wait 6 weeks for traditional telco number provisioning. Exotel’s turnaround times for DID provisioning, trunk scaling, and capacity increases are built for the speed that AI companies’ enterprise clients expect.

This is what separates a vendor from a partner. Exotel is India’s largest voice platform outside the major telcos themselves, with 15 years of deep telecom grid integration. That infrastructure, combined with a partnership model built around your scaling journey, is why over 50% of India’s Voice AI streaming traffic runs on our platform.


Your Turn: From Playbook to Production

You’ve now read the full arc. The business case (Part 1). The technical architecture (Part 2). And the operational playbook right here in Part 3.

If you’re a Voice AI company in India that’s nailed the pilot and is staring at the scaling cliff, here’s what we’d suggest as a next step: bring your growth projections and stack diagram, and let’s build the infrastructure plan together.

Not a sales call. An engineering conversation. The same kind we had with Sarvam, Skit, SquadStack, Vipatra, and Fundamento before they scaled—and the same kind that helped them clear the cliff.

READY TO SCALE?
Exotel’s Voice AI solutions team runs dedicated scaling workshops for AI companies preparing for national-scale deployment. Bring your concurrency targets, your stack diagram, and your timeline—we’ll build the throughput plan, number strategy, and failover architecture together.

→ Schedule a Scaling Workshop with Exotel’s Voice AI Team


THIS IS PART 3 OF A 3-PART SERIES
← Part 1: Why 50% of India’s Voice AI Runs on One Infrastructure Partner
← Part 2: Inside the Stack: How Exotel Architects Zero-Latency Voice AI Pipelines
