A voicebot that performs in a 100-seat pilot will fail differently at 50,000 concurrent calls during a credit card campaign peak. The failure modes are predictable: latency spikes above 300ms cause 5% of customers to hang up, intent models that handle 80–90% of requests in testing create handoff friction with the remaining 10–20% at scale, and queue depth thresholds that seemed comfortable at 500 calls per second become bottlenecks at 5,000.
This checklist gives BFSI technology leaders 45 structured questions to stress-test any voice AI vendor’s concurrency architecture, geographic failover design, and outbound campaign A/B testing capabilities before signing a contract. Each question includes a scoring rubric so that your evaluation team produces a quantified comparison rather than a subjective impression.
How to use this checklist
Score each question on a 0–3 scale:
- 0 = No capability. The vendor does not offer this or cannot demonstrate it.
- 1 = Partial. The vendor claims the capability but cannot provide documentation, benchmarks, or a live demo.
- 2 = Demonstrated. The vendor provides third-party benchmarks, architecture documentation, or a live demo.
- 3 = Proven in production. The vendor provides references from Indian banks or NBFCs operating at comparable scale.
A vendor scoring below 90 out of 135 (67%) across all 45 questions presents material risk for national-scale deployment. Below 60 (44%) is a disqualifier.
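The scoring arithmetic above can be sketched as a small helper. The banding thresholds come from this checklist; the function itself is illustrative, not part of any vendor tooling.

```python
def evaluate(question_scores: list[int]) -> tuple[int, str]:
    """Total a 45-question scorecard (each question scored 0-3)
    and map the total to the verdict bands used in this checklist."""
    assert len(question_scores) == 45
    assert all(0 <= s <= 3 for s in question_scores)
    total = sum(question_scores)
    if total >= 110:
        band = "production-ready"
    elif total >= 90:
        band = "viable with conditions"
    elif total >= 60:
        band = "material gaps"
    else:
        band = "disqualifier"
    return total, band
```

Note that a vendor scoring 2 ("demonstrated") on every question lands at exactly 90, the floor of the "viable with conditions" band.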
Section 1: Concurrency architecture (questions 1–8)
These eight questions determine whether the platform will survive your busiest day, not your average day.
- What is your documented peak concurrent call capacity, and who tested it?
Ask for third-party load test results, not internal claims. Production-grade platforms handle 30,000+ concurrent calls. Exotel’s AgentStream architecture processes 20+ million conversations monthly across pan-India telecom infrastructure.
- What happens to call quality when you hit 80% of peak capacity?
Measure P50, P95, and P99 latency at 80% load. A sub-200 ms P50 is the baseline for natural conversation. A P99 above 300 ms means 1% of your customers experience awkward pauses during peak campaigns.
- How does your platform scale from 1,000 to 50,000 concurrent calls?
Ask whether scaling is automatic or requires manual provisioning. Ask for proof of linear scaling (not theoretical architecture diagrams).
- What is your queue depth behaviour under sustained load?
At a sustained 15 calls per second (CPS), peak queuing can exceed 80 seconds; at 25 CPS, queue time for the same load drops below one second. Ask for queue depth curves at your expected peak volume.
- How do you handle burst traffic during campaign launches?
Credit card activation campaigns generate predictable bursts. A platform that pre-provisions capacity based on campaign schedules handles bursts differently than one that relies on reactive auto-scaling.
- What is your calls-per-second (CPS) rate limit, and is it configurable?
Some platforms cap CPS at the account level. If your campaign targets 100,000 customers in a two-hour window, you need at least 14 CPS sustained. Confirm that this rate is achievable on your specific number allocation.
- Does concurrency scale per number or per account?
Exotel offers unlimited concurrent calls on a single number without channel capacity limits. Some vendors cap concurrency per number, forcing you to provision multiple numbers and manage routing across them.
- How do you isolate campaign traffic from inbound support traffic?
A national bank running outbound loan campaigns and inbound customer support on the same platform needs traffic isolation. Ask whether the platform provides separate capacity pools or whether a campaign spike degrades inbound response times.
Section 1 maximum score: 24 points. Target: 18+
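The back-of-the-envelope CPS sizing in question 6 can be checked with a few lines. The retry multiplier below is an assumption to model redial attempts, not a vendor parameter.

```python
import math

def required_cps(customers: int, window_hours: float,
                 attempts_per_customer: float = 1.0) -> int:
    """Sustained calls-per-second needed to dial every customer in the window.
    attempts_per_customer > 1 models retries for unanswered calls (illustrative)."""
    total_attempts = customers * attempts_per_customer
    return math.ceil(total_attempts / (window_hours * 3600))

# 100,000 customers in a two-hour window -> 14 CPS sustained, as in question 6
cps = required_cps(100_000, 2)
```

Adding a modest retry allowance (say 1.5 attempts per customer) pushes the requirement past 20 CPS, which is why the account-level CPS cap matters.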
Section 2: Geographic failover and disaster recovery (questions 9–16)
RBI’s Master Direction on IT Governance (effective April 2024) requires board-approved BCP and DR policies with half-yearly DR drills for critical systems. These questions verify that your vendor meets those requirements.
- How many geographically separated data centres host your voice infrastructure in India?
RBI requires that the configuration at primary and DR sites be identical. Ask for data centre locations, not just “multi-region” claims.
- What is your automatic failover time when a primary site goes down?
Target: under 30 seconds with zero dropped calls. Ask whether failover is transparent to the customer (no re-dial required) or requires call reconnection.
- What is your Recovery Time Objective (RTO) and Recovery Point Objective (RPO)?
RBI mandates near-zero RPO for critical systems. Target RTO under one hour for complete site failure. Ask for documented RTO/RPO from the last DR drill, not just policy targets.
- When was your last DR drill, and what were the results?
RBI requires half-yearly DR drills for critical systems. Ask for the drill report: actual RTO achieved, data loss measured, calls dropped during switchover.
- Do you operate as a licensed telecom operator or rely on third-party carriers?
A Unified License Virtual Network Operator (UL-VNO) such as Exotel operates across 11 telecom circles with pan-India outbound infrastructure and direct carrier-grade control over voice routing. Platforms that rely entirely on third-party SIP trunking introduce an additional failure point outside their control.
- How do you handle carrier-level failures (not just data centre failures)?
Multi-carrier diversity means the platform routes calls through alternative carriers when one fails. Ask how many carriers the platform uses and whether failover is automatic.
- What happens to in-progress calls during a failover event?
Some platforms drop active calls during failover and require the customer to call back. Others maintain session state and resume the conversation on the backup site. Ask which approach your vendor uses.
- Do you provide customers with a real-time status page showing infrastructure health?
A status page with historical uptime data and incident post-mortems demonstrates operational maturity. Ask for the URL and review the last six months of incidents.
Section 2 maximum score: 24 points. Target: 18+
Section 3: Campaign A/B testing framework (questions 17–25)
Outbound credit card and loan campaigns live or die on script performance. These questions determine whether the platform lets you test, measure, and iterate on scripts systematically.
- Does your platform support simultaneous A/B testing of voicebot scripts?
At minimum, the platform should route a defined percentage of calls to each script variant and track outcomes separately. Ask whether you control the split ratio and whether it adjusts dynamically.
- What metrics does the A/B testing framework track per variant?
The minimum set: conversion rate, drop/abandonment rate, average talk time, first contact resolution, sentiment score, and escalation rate. Ask whether these are available in real time or only post-campaign.
- Can you A/B test tone and conversation flow independently of content?
Testing “what you say” (content) separately from “how you say it” (tone, pacing, formality) produces more actionable insights than testing entire script variants. Ask whether the platform supports this granularity.
- What is the minimum statistically significant sample size the platform recommends per variant?
A two-week test duration with at least 1,000 calls per variant produces reliable results. Ask how the platform determines statistical significance and whether it flags tests that have not reached significance.
- Can you run A/B tests across different customer segments simultaneously?
Testing the same two scripts across premium, retail, and small-business segments reveals whether script performance varies by audience. Ask how segmentation works within the A/B framework.
- How does the platform handle script variants across languages?
A Hindi script variant that outperforms in North India may underperform in Hinglish-speaking metros. Ask whether the platform tracks A/B results by language independently.
- Does the platform support multivariate testing (more than two variants)?
For credit card pitch optimisation, you may want to test four opening lines, three objection-handling approaches, and two closing CTAs simultaneously. Ask whether the platform supports multivariate designs beyond simple A/B.
- How quickly can you deploy a new script variant to a live campaign?
If deploying a new variant takes 48 hours of engineering work, your testing velocity drops. Ask whether operations teams deploy variants through a visual interface or whether it requires code changes.
- Can you automatically promote the winning variant and retire the loser?
Platforms that automatically shift traffic to the higher-performing variant after reaching statistical significance reduce manual intervention and accelerate optimisation cycles.
Section 3 maximum score: 27 points. Target: 20+
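The significance check behind questions 20 and 25 can be sketched with a pooled two-proportion z-test. The variant names and counts below are hypothetical; a real platform would apply this (or a similar test) automatically per variant pair.

```python
import math

def two_proportion_p_value(conversions_a: int, calls_a: int,
                           conversions_b: int, calls_b: int) -> float:
    """Two-sided p-value for the difference in conversion rate between
    two script variants, using a pooled two-proportion z-test."""
    p_a = conversions_a / calls_a
    p_b = conversions_b / calls_b
    pooled = (conversions_a + conversions_b) / (calls_a + calls_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / calls_a + 1 / calls_b))
    z = abs(p_a - p_b) / se
    # two-sided tail probability from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical: variant A converts 120/1000 calls, variant B 90/1000
p = two_proportion_p_value(120, 1000, 90, 1000)
promote_winner = p < 0.05
```

This is also why the 1,000-calls-per-variant floor matters: with small samples, even a three-point conversion gap fails to reach significance.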
Section 4: Real-time monitoring and alerting (questions 26–33)
During a 100,000-call campaign, problems surface in minutes, not hours. These questions determine whether the platform gives you visibility before a problem becomes a crisis.
- Does your platform provide a real-time campaign dashboard with sub-minute refresh?
Dashboards that update every 15 minutes are post-mortems, not monitoring tools. Ask for sub-minute or real-time refresh on: active calls, connect rate, conversion rate, drop rate, average talk time, and queue depth.
- Does the platform detect anomalies automatically, or do you rely on manual threshold alerts?
Adaptive anomaly detection learns historical patterns and flags deviations. Static threshold alerts (e.g., “alert when drop rate exceeds 5%”) miss gradual degradation. Ask which approach the platform uses.
- What latency metrics are visible in real time during a campaign?
P50, P95, and P99 latency should be visible in real time per campaign, per number, and per region. Ask whether latency monitoring includes the full round trip through core banking integrations or only the voice processing layer.
- Does the platform monitor speech recognition accuracy in real time?
Target: above 95% for English, above 90% for Hindi and regional languages. If accuracy drops during a campaign (network quality, background noise, accent variation), the platform should alert before conversion rates drop.
- Can supervisors listen to live calls and intervene during a campaign?
Real-time supervisor monitoring with barge-in capability is standard for human agents. Ask whether the same capability exists for voicebot calls.
- Does the platform provide real-time sentiment analysis during calls?
Sentiment-triggered escalation (automatically routing a frustrated customer to a human agent) reduces complaint rates. Ask whether sentiment analysis runs in real time or post-call.
- How does the platform alert you to compliance exceptions during a campaign?
Missed consent disclosures, DND violations, or calls outside permitted hours should trigger immediate alerts, not appear in a weekly compliance report.
- Can you set up custom alerts based on campaign-specific KPIs?
Beyond standard metrics, you may need alerts for: same customer called twice in one day, conversion rate dropping below target by region, or specific script variant underperforming.
Section 4 maximum score: 24 points. Target: 18+
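The difference between static thresholds and adaptive detection (question 27) can be sketched with a rolling-baseline monitor. The window size, warm-up length, and z-limit below are illustrative choices, not a vendor's actual algorithm.

```python
from collections import deque
import statistics

class DropRateMonitor:
    """Flags anomalous drop rates against a rolling baseline rather than
    a fixed threshold, so gradual degradation is caught early."""

    def __init__(self, window: int = 30, z_limit: float = 3.0):
        self.history: deque[float] = deque(maxlen=window)
        self.z_limit = z_limit

    def observe(self, drop_rate: float) -> bool:
        """Record one sample; return True if it deviates sharply
        from the recent baseline (after a 10-sample warm-up)."""
        anomalous = False
        if len(self.history) >= 10:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            anomalous = (drop_rate - mean) / stdev > self.z_limit
        self.history.append(drop_rate)
        return anomalous
```

A static 5% threshold would never fire on a campaign whose drop rate creeps from 2% to 4.9%; a baseline-relative check flags the drift as soon as it leaves the normal band.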
Section 5: SLA and escalation accountability (questions 34–40)
An SLA without penalty clauses is a marketing document. These questions separate genuine commitments from vendor positioning.
- What is your published uptime SLA, and what financial penalties apply for breaches?
99.99% uptime (4 minutes 19 seconds of downtime per month) is the minimum for BFSI. Ask for the service credit structure: what percentage of monthly fees do you recover at 99.9%, 99.0%, and 95.0% uptime?
- How do you define “downtime” in your SLA?
Some vendors exclude scheduled maintenance, customer-caused issues, and third-party carrier failures from their SLA calculation. These exclusions can reduce a 99.99% SLA to an effective 99.5%. Ask for the full list of exclusions.
- What is your P1 (critical incident) response time commitment?
Target: 15 minutes or less for critical incidents affecting live campaigns. Ask for the escalation matrix: who gets notified at 15 minutes, 30 minutes, and one hour?
- Do you provide a dedicated account team for national bank deployments?
A shared support queue is not appropriate for a bank running 50,000 concurrent calls. Ask whether you get a named Technical Account Manager with direct escalation authority.
- What is your change management process for platform updates?
Ask how much advance notice you receive for platform changes, whether you can opt out of specific updates, and whether changes are tested in a staging environment that mirrors your production configuration.
- Do you provide post-incident reports (PIRs) for every service disruption?
A vendor that publishes PIRs with root cause analysis, timeline, and preventive actions demonstrates operational maturity. Ask for three recent PIRs.
- Is your SLA backed by an independent third-party audit?
SOC 2 Type II reports validate that the vendor’s controls operate effectively over time. Ask for the most recent report (within six months). ISO 27001 certification validates information security management.
Section 5 maximum score: 21 points. Target: 15+
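The downtime arithmetic behind the 99.99% figure in question 34 is easy to verify, and worth running for each uptime tier in a vendor's service credit table.

```python
def downtime_budget_minutes(uptime_pct: float, days: int = 30) -> float:
    """Monthly downtime allowed by an uptime SLA, in minutes,
    over a month of the given number of days."""
    return days * 24 * 60 * (1 - uptime_pct / 100)

# 99.99% over a 30-day month allows ~4.32 minutes (4 min 19 s);
# 99.9% allows ~43.2 minutes, an order of magnitude more downtime
```

The same arithmetic makes SLA exclusions concrete: a 99.99% SLA that excludes maintenance windows and carrier failures can quietly permit hours of effective downtime per month.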
Section 6: Regulatory audit logging (questions 41–45)
RBI and TRAI compliance is not optional. These questions verify that the platform produces audit-ready records, not just call logs.
- Does the platform maintain a complete DLT audit trail for every outbound call?
Every call on 140-series and 1600-series numbers must be logged on the DLT (Distributed Ledger Technology) platform with: timestamp, caller ID, recipient number, DND verification status, consent record, and call outcome. Ask whether this logging is automatic or requires your team to configure it.
- How long does the platform retain call recordings, and where are they stored?
RBI mandates a minimum two-year retention for complaint-related calls. Ask whether recordings are stored in India (data residency), whether storage is encrypted at rest, and who has access (with access logging).
- Does the platform capture and store consent records per DPDP Act requirements?
The Digital Personal Data Protection Act requires explicit, informed consent for each processing activity. Ask whether the platform records voice consent inline during the call, stores it against the DLT ledger, and supports real-time revocation.
- Can the platform generate audit-ready compliance reports for RBI inspections?
During an RBI inspection, your team needs to produce call records, consent documentation, recording access logs, and DND compliance reports within hours, not weeks. Ask whether the platform exports these reports in a format your compliance team specifies.
- Does the platform support DTMF-based data capture for sensitive information (PAN, Aadhaar)?
When customers enter PAN or Aadhaar numbers during a call, DTMF (keypad) capture at the network layer is more secure than speech-to-text capture at the AI layer. As a licensed telecom operator, Exotel processes DTMF tones at the network infrastructure layer, keeping sensitive digits out of the AI processing pipeline entirely.
Section 6 maximum score: 15 points. Target: 12+
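An audit-ready per-call record along the lines of question 41 might carry fields like these. The schema below is purely illustrative; field names and values are assumptions, not the DLT platform's actual format.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class OutboundCallAuditRecord:
    """Illustrative per-call audit entry; field names are assumptions,
    not the DLT platform's schema."""
    timestamp_utc: str
    caller_id: str           # 140- or 1600-series number
    recipient_number: str
    dnd_verified: bool
    consent_reference: str   # pointer to the stored consent artefact
    call_outcome: str        # e.g. "connected", "no-answer", "dnd-blocked"

# Hypothetical record for a single outbound campaign call
record = OutboundCallAuditRecord(
    timestamp_utc="2025-06-01T09:30:00+00:00",
    caller_id="1600123456",
    recipient_number="+919800000000",
    dnd_verified=True,
    consent_reference="CNS-2025-000123",
    call_outcome="connected",
)
```

The point of asking question 41 is to confirm every one of these fields is captured automatically per call, so an inspection export is a query, not a reconstruction project.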
Scoring rubric summary
| Section | Questions | Max score | Target score | Weight |
|---|---|---|---|---|
| Concurrency architecture | 1–8 | 24 | 18+ (75%) | High |
| Geographic failover and DR | 9–16 | 24 | 18+ (75%) | High |
| Campaign A/B testing | 17–25 | 27 | 20+ (74%) | Medium |
| Real-time monitoring | 26–33 | 24 | 18+ (75%) | High |
| SLA and escalation | 34–40 | 21 | 15+ (71%) | Medium |
| Regulatory audit logging | 41–45 | 15 | 12+ (80%) | Critical |
| Total | 1–45 | 135 | 90+ (67%) | — |
Interpreting the total score:
- 110–135 (81–100%): Production-ready for national bank deployment. Proceed to commercial negotiation.
- 90–109 (67–80%): Viable with conditions. Identify gaps and negotiate remediation timelines into the contract.
- 60–89 (44–66%): Material gaps in reliability or compliance. Requires significant platform development before deployment.
- Below 60 (below 44%): Disqualifier. The platform is not ready for BFSI-scale operations.
Sources:
- Bajaj Finance AI-powered call center data (Q3 2025 disbursement reports)
- RBI Master Direction on IT Governance, Risk, Controls, and Assurance Practices (effective April 2024)
- TRAI TCCCPR 2018 regulations and DLT platform requirements
- DPDP Act 2023 compliance framework
- Industry concurrent call capacity benchmarks (2025–2026)
- SOC 2 Type II audit standards for voice AI platforms
About Exotel
Exotel is a customer conversation platform that believes in the power of exceptional customer experience. As the invisible backbone of communication for some of the most loved brands, Exotel enables 25 billion+ conversations a year for 7,000+ businesses across voice, chat, bots, and contact centers.
Exotel’s AI-powered communication tools integrate across all channels, enabling personalized interactions and enhancing customer experience through automated workflows, real-time agent guidance, and self-service options. As a licensed Unified License Virtual Network Operator operating across 11 telecom circles pan-India, Exotel’s carrier-grade infrastructure processes DTMF tones at the network layer and delivers sub-800ms voice AI response times through the proprietary AgentStream architecture.