Collecting a PAN number over a voice call takes four seconds. Securing that collection against recording leaks, AI pipeline exposure, regulatory violations, and replay attacks requires an architecture that most voice AI vendors have not built.
The gap is specific: when a customer speaks their PAN aloud, the audio enters the speech-to-text (STT) pipeline, passes through intent classification and entity extraction, gets logged in conversation transcripts, and appears in call recordings. That is five exposure points for a 10-character alphanumeric string that links to the customer’s entire tax and financial identity. DTMF capture at the telecom network layer eliminates all five.
This guide walks through the full security architecture for voice-based PAN verification and KYC data collection, maps each component to RBI, DPDP Act, Aadhaar Act, IT Act, and TRAI requirements, and provides a capability framework for evaluating vendors.
Why voice-based KYC is harder than it looks
Banks run KYC across two distinct journeys.
- Inbound: A customer calls to update their address, and the agent needs to verify identity before making changes.
- Outbound: The bank calls customers approaching re-KYC deadlines (every two years for high-risk, every eight years for medium-risk, every ten years for low-risk customers, per the RBI Master Direction on KYC) to collect updated documents.
Both journeys require the same verification steps: confirm the customer’s identity, validate their PAN against NSDL records, and capture consent under the DPDP Act. The difference is operational context. Inbound calls have a motivated customer who initiated contact. Outbound calls reach customers who may be distracted, suspicious, or unaware of the re-KYC requirement.
A voicebot handling either journey needs three capabilities that most platforms treat as separate features rather than an integrated security architecture:
- Secure data capture: How sensitive numbers enter the system
- Real-time verification: How those numbers get validated against government databases
- Audit-ready logging: How every step gets recorded for RBI inspection
The five-layer security architecture
A production-grade voice KYC system operates across five layers. Each layer addresses a specific attack surface.
Layer 1: Network-layer DTMF capture
When the customer presses keys on their keypad, the DTMF signal travels through the telecom network. A licensed operator captures these tones at the network infrastructure layer, before they reach the AI processing pipeline. The STT engine receives masked tones, not the original DTMF signals, so the customer’s PAN never enters the speech recognition system, the transcript, or the call recording.
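A PAN is alphanumeric (five letters, four digits, one letter), but a keypad sends only digits, `*` and `#`, so letters need some key mapping. The Python sketch below assumes a hypothetical two-press scheme on the ITU-T E.161 keypad layout: each letter is entered as its key followed by its 1-based position on that key (`7` then `1` for P), and each digit is a single press. Because the PAN format is fixed, the decoder knows which positions are letters. The scheme and the function name `decode_pan` are illustrative; real deployments may map letters differently.

```python
import re

# ITU-T E.161 letter layout on a telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# PAN format: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
LETTER_POSITIONS = {0, 1, 2, 3, 4, 9}


def decode_pan(keys: str) -> str:
    """Decode a DTMF key sequence into a PAN.

    Hypothetical scheme: a letter is two presses (its key, then its
    1-based position on that key); a digit is one press.
    """
    chars = []
    i = 0
    for pos in range(10):
        if pos in LETTER_POSITIONS:
            code = keys[i:i + 2]
            letters = KEYPAD.get(code[:1], "")
            valid_index = "1234"[:len(letters)]
            if len(code) != 2 or code[1] not in valid_index:
                raise ValueError(f"invalid letter code at position {pos + 1}")
            chars.append(letters[int(code[1]) - 1])
            i += 2
        else:
            digit = keys[i:i + 1]
            if len(digit) != 1 or digit not in "0123456789":
                raise ValueError(f"expected a digit at position {pos + 1}")
            chars.append(digit)
            i += 1
    if i != len(keys):
        raise ValueError("unexpected extra keys")
    pan = "".join(chars)
    # Defensive final check against the PAN format.
    if not PAN_PATTERN.fullmatch(pan):
        raise ValueError("decoded value is not in PAN format")
    return pan
```

Under this scheme, `decode_pan("2122233132123433")` returns `"ABCDE1234F"` (16 key presses). The assumption is that decoding runs inside the network-layer capture service, so neither the key sequence nor the decoded PAN reaches the STT engine.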
Layer 2: Encrypted API verification
The captured PAN travels over TLS 1.3 to the NSDL verification API, which returns the PAN holder’s name and validity status in under 200 milliseconds. For Aadhaar-based verification, an OTP is triggered via the UIDAI API and sent to the customer’s Aadhaar-registered mobile number, and the customer enters it via DTMF.
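The client side of this layer can be sketched with Python’s standard library alone: a context that verifies certificates and hostnames and refuses anything older than TLS 1.3, plus a POST carrying the PAN. The endpoint URL, the JSON payload, and the names `build_request` and `verify_pan` are hypothetical placeholders; the real NSDL verification interface, its authentication, and its request format differ and are not shown.

```python
import json
import ssl
import urllib.request

# Hypothetical placeholder; not the real NSDL endpoint.
VERIFY_URL = "https://pan-verify.example.invalid/v1/verify"


def tls13_context() -> ssl.SSLContext:
    """Verify server certificates and hostnames; refuse TLS versions below 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def build_request(pan: str, name: str) -> urllib.request.Request:
    """Build the POST request (hypothetical JSON payload)."""
    body = json.dumps({"pan": pan, "name": name}).encode("utf-8")
    return urllib.request.Request(
        VERIFY_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def verify_pan(pan: str, name: str, timeout_s: float = 2.0) -> dict:
    """Send the request over TLS 1.3 and return the decoded JSON response."""
    request = build_request(pan, name)
    # A bounded timeout keeps a slow upstream from stalling the live call.
    with urllib.request.urlopen(request, timeout=timeout_s, context=tls13_context()) as resp:
        return json.load(resp)
```

The 2-second timeout is an arbitrary placeholder; a voicebot would typically pick a value that lets it fall back gracefully (for example, re-prompting or escalating) rather than leaving the customer in silence.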
Layer 3: Consent capture and recording
The DPDP Act requires explicit, informed consent before personal data is processed. The voicebot plays a consent disclosure, records the customer’s response, and timestamps the event against the DLT ledger.
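A minimal sketch of the record this layer might produce, assuming the response has already been transcribed: it stores which disclosure version was played, whether the answer was a clear affirmative, a SHA-256 fingerprint of the recorded audio (so an auditor can check the recording was not altered), and a UTC timestamp. The phrase list, field names, and `capture_consent` are hypothetical, and anchoring the timestamp to the DLT platform is not shown.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of accepted affirmative phrases.
AFFIRMATIVE = {"yes", "i agree", "i consent"}


@dataclass(frozen=True)
class ConsentRecord:
    call_id: str
    disclosure_version: str  # which consent script was played
    response_text: str
    granted: bool
    audio_sha256: str        # fingerprint of the recorded response
    timestamp_utc: str


def capture_consent(call_id: str, disclosure_version: str, response_text: str,
                    response_audio: bytes, now: Optional[datetime] = None) -> ConsentRecord:
    """Record the customer's answer to the disclosure.

    Anything other than a clear affirmative is treated as no consent.
    """
    now = now or datetime.now(timezone.utc)
    normalised = " ".join(response_text.lower().split())
    return ConsentRecord(
        call_id=call_id,
        disclosure_version=disclosure_version,
        response_text=response_text,
        granted=normalised in AFFIRMATIVE,
        audio_sha256=hashlib.sha256(response_audio).hexdigest(),
        timestamp_utc=now.isoformat(),
    )
```

Treating anything ambiguous ("maybe", silence) as a refusal keeps the default on the safe side: processing should proceed only when `granted` is true.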
Layer 4: Temporary storage with auto-purge
Under Section 29 of the Aadhaar Act, Aadhaar numbers may not be stored permanently outside an Aadhaar Data Vault. The architecture therefore holds the verification transaction in an encrypted temporary cache (e.g., Redis with a TTL) and purges it once verification completes. Only the verification result and a tokenised reference persist.
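The cache-then-purge pattern can be sketched in Python with an in-memory stand-in for a TTL cache (comparable to Redis `SET key value EX seconds`). The TTL acts only as a backstop; the transaction is purged explicitly as soon as verification finishes. Tokenisation here is a keyed HMAC-SHA256, which is one possible choice, not necessarily the scheme an Aadhaar Data Vault mandates. `TransientStore`, `tokenise`, and `run_verification` are hypothetical names, and encryption of the cache is not shown.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, Optional, Tuple


class TransientStore:
    """In-memory stand-in for a TTL cache (comparable to Redis SET ... EX)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired: purge lazily on read
            return None
        return value

    def purge(self, key: str) -> None:
        self._items.pop(key, None)


def tokenise(aadhaar: str, key: bytes) -> str:
    """Keyed, non-reversible reference (HMAC-SHA256) stored instead of the number."""
    return hmac.new(key, aadhaar.encode("utf-8"), hashlib.sha256).hexdigest()


def run_verification(store: TransientStore, txn_id: str, aadhaar: str,
                     verify: Callable[[Optional[str]], bool], token_key: bytes) -> dict:
    store.put(txn_id, aadhaar)
    try:
        verified = verify(store.get(txn_id))
    finally:
        store.purge(txn_id)  # purge now; the TTL is only a backstop
    # Only the result and the token persist.
    return {"verified": verified, "ref": tokenise(aadhaar, token_key)}
```

The `finally` block matters: the raw number is purged even when the verification call raises, so an upstream failure cannot leave Aadhaar data sitting in the cache until the TTL expires.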
Layer 5: Encrypted audit log
Every event (DTMF timestamp, API call/response, consent, data purge) is written to an encrypted, append-only audit log. This log is RBI-auditable without exposing raw PAN or Aadhaar data.
Architecture flow: Customer call → DTMF capture (network layer) → encrypted API call (NSDL/UIDAI) → KYC database write (tokenised) → encrypted audit log
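One common way to make the Layer 5 log alteration-evident is hash chaining: each entry stores the hash of the previous one, so editing, reordering, or deleting a non-final entry breaks verification. The Python sketch below shows only that chaining; encryption at rest is assumed to be handled by the storage layer, and `AuditLog` and the event names are hypothetical. Entries carry tokens, never raw PAN or Aadhaar values.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained event log. Details hold tokens, never raw PAN/Aadhaar."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, event_type: str, details: dict) -> dict:
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "type": event_type,
            "details": dict(details),
            "prev": self._entries[-1]["hash"] if self._entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or deleting a non-final entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("ts", "type", "details", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A chain alone cannot reveal that the newest entries were dropped; periodically recording the latest hash somewhere outside the log (for example, alongside the DLT timestamp) closes that gap.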
DTMF masking: why the capture layer matters
The security difference between DTMF capture and voice capture is not incremental. It is architectural.
When a customer speaks their PAN, the audio passes through every stage of the voice AI pipeline: STT transcription, intent classification, entity extraction, and response generation. At each stage, the raw PAN string exists in memory, in logs, and potentially in training data. Call recordings also retain the spoken PAN.
In contrast, DTMF keypad entry is intercepted at the telecom network layer. DTMF masking replaces the original tones with flat audio before the stream reaches the AI pipeline. The STT engine receives silence or a beep; the conversation transcript says “[masked input]”. The call recording contains no recoverable PAN data.
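This matters for compliance. PCI DSS scope reduction for payment cards (where PAN means the card’s Primary Account Number, not the tax PAN) rests on the same principle: systems that never receive the sensitive data can be kept out of scope, and DTMF masking is an established way to achieve this for telephone payments. The logic carries over to tax PAN and Aadhaar: a platform that ingests them as speech cannot keep them out of its STT pipeline, transcripts, or recordings, regardless of downstream encryption.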
Platforms relying on third-party carriers cannot intercept DTMF at the network layer because they do not control the telecom infrastructure.
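The masking step can be modelled in Python as a splitter that sits in front of the AI pipeline. For simplicity, the call is represented as discrete events (similar to RFC 4733 telephone-event packets) rather than raw audio: DTMF digits are diverted to a secure buffer, and the pipeline receives only a `[masked input]` marker. The `Event` type and `split_stream` function are hypothetical; a network-layer implementation would also replace the in-band tones in the audio itself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MASK = "[masked input]"


@dataclass(frozen=True)
class Event:
    kind: str   # "speech", "dtmf", or "masked" (hypothetical event model)
    value: str


def split_stream(events: Iterable[Event]) -> Tuple[List[Event], str]:
    """Divert DTMF digits to a secure buffer; the AI pipeline only sees a placeholder.

    A run of consecutive digits collapses into a single masked marker.
    """
    pipeline: List[Event] = []
    secure: List[str] = []
    for event in events:
        if event.kind == "dtmf":
            secure.append(event.value)
            if not pipeline or pipeline[-1].kind != "masked":
                pipeline.append(Event("masked", MASK))
        else:
            pipeline.append(event)
    return pipeline, "".join(secure)
```

Only the secure buffer feeds the verification call; the `pipeline` list is what STT, transcripts, and recordings see, and it contains no digit values.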
Real-time verification: NSDL and UIDAI integration
A voicebot that collects a PAN but verifies it hours later in batch introduces fraud risk. Real-time verification closes this window.
- PAN verification via NSDL: The real-time API returns the PAN holder’s name, last update date, and validity in under 200 milliseconds. The voicebot confirms the name while the customer is still on the call and flags mismatches immediately.
- Aadhaar verification via UIDAI: Voice-only Aadhaar verification is not an approved method. The customer enters the OTP via DTMF, and the system validates it through UIDAI Authentication API 2.5, receiving and verifying the OTP within five minutes.
- Voice biometric authentication: Deployed at Indian banks such as ICICI Bank and Indian Bank, but best used as an additional factor alongside DTMF and OTP. Modern voice biometrics achieve 95–99% accuracy but are susceptible to deepfake voice attacks, so they should never be the sole factor.
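The on-call PAN check can be sketched in Python: validate the format locally (five letters, four digits, one letter) so a mistyped entry is re-prompted without spending an API call, then compare the returned holder name with the name on record. The lookup is passed in as a function because the real NSDL interface is not shown; `PanLookup`, `check_pan_on_call`, and the status strings are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


@dataclass(frozen=True)
class PanLookup:
    valid: bool
    holder_name: Optional[str]


def normalise_name(name: str) -> str:
    return " ".join(name.upper().split())


def check_pan_on_call(pan: str, name_on_record: str,
                      lookup: Callable[[str], PanLookup]) -> str:
    """Return 'invalid_format', 'invalid', 'mismatch', or 'verified'."""
    if not PAN_FORMAT.fullmatch(pan):
        return "invalid_format"  # re-prompt without spending an API call
    result = lookup(pan)
    if not result.valid or result.holder_name is None:
        return "invalid"
    # Exact match after normalisation; real matching policies are usually more lenient.
    if normalise_name(result.holder_name) != normalise_name(name_on_record):
        return "mismatch"
    return "verified"
```

Returning a status rather than raising lets the voicebot choose the next prompt directly: re-enter on `invalid_format`, confirm the name aloud on `mismatch`, escalate on `invalid`.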
Compliance mapping: five regulatory frameworks
Voice-based KYC in Indian banking intersects five regulatory frameworks: RBI directions, the DPDP Act, the Aadhaar Act, the IT Act, and TRAI’s TCCCPR. This mapping connects platform controls to each requirement.
| Regulatory framework | Requirement | Platform control |
|---|---|---|
| RBI Master Direction on KYC (updated June 2025) | Customer consent recorded audibly and securely, in an auditable and alteration-proof manner | Inline voice consent capture with DLT ledger timestamp |
| RBI Master Direction on KYC | Re-KYC cycles: 2 years (high-risk), 8 years (medium), 10 years (low) | Outbound campaign automation with re-KYC schedule triggers |
| RBI Master Direction on IT Governance (April 2024) | Board-approved BCP/DR with half-yearly DR drills | Geographic failover with documented RTO/RPO |
| DPDP Act 2023 | Explicit, informed consent before processing personal data | Consent disclosure playback with affirmative response recording |
| DPDP Act 2023 | Right to erasure; 48-hour advance notice before deletion | Auto-purge with customer notification workflow |
| DPDP Act 2023 | Full compliance deadline: 13 May 2027 | Consent manager service with audit trail |
| Aadhaar Act, Section 29 | No permanent storage of Aadhaar numbers outside Aadhaar Data Vault | Temporary cache (TTL) with tokenisation; raw Aadhaar purged post-verification |
| Aadhaar Act, Section 29 | Core biometric info cannot be shared for any reason | Voice biometric voiceprints stored separately from Aadhaar data; no biometric data sent to UIDAI |
| IT Act, Section 43A | Reasonable security practices for sensitive personal data | AES-256 encryption at rest, TLS 1.3 in transit, role-based access control |
| TRAI TCCCPR 2018 | Outbound calls on 140/160-series logged on the DLT platform | Automatic DLT logging with consent and call outcome |
In FY 2024–25, RBI imposed approximately ₹54.78 crore in penalties across 353 regulated entities for KYC/AML violations — the top violation category.
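The "re-KYC schedule triggers" control in the mapping table can be sketched in Python as a due-date calculation over the 2/8/10-year cycles, assuming each interval runs from the customer’s last KYC date. The 60-day campaign lead time and the function names are hypothetical; the date helper clamps 29 February to 28 February in non-leap target years.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Re-KYC intervals in years, per the mapping table.
RE_KYC_YEARS = {"high": 2, "medium": 8, "low": 10}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def next_re_kyc_due(last_kyc: date, risk: str) -> date:
    return add_years(last_kyc, RE_KYC_YEARS[risk])


def due_for_campaign(customers: Iterable[Tuple[str, date, str]],
                     today: date, lead_days: int = 60) -> List[str]:
    """IDs whose re-KYC is overdue or falls due within lead_days (hypothetical lead time)."""
    horizon = today + timedelta(days=lead_days)
    return [cid for cid, last_kyc, risk in customers
            if next_re_kyc_due(last_kyc, risk) <= horizon]
```

For example, a high-risk customer last verified on 1 July 2023 is due on 1 July 2025 and would be picked up by a campaign run on 1 June 2025.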
Reusable authentication modules: one build, two journeys
Technical debt builds up when inbound and outbound flows are built separately. Reusable modules avoid this by sharing one implementation across both journeys.
Authentication orchestrator: A single service handles authentication sequences for any voice journey. It determines the required verification level and invokes the corresponding modules.
- PAN verification module: Accepts DTMF input, applies masking, calls the NSDL API, and returns the result. Both journeys use the same module, so no code is duplicated.
- Aadhaar OTP module: Triggers the OTP via the UIDAI API, captures it via DTMF, validates it, and auto-purges the Aadhaar number as required.
- Voice biometric module: Captures/enrolls voiceprint, compares live voice with stored print, returns confidence score and liveness result.
Example workflows:
- Inbound: Customer calls → IVR routing → authentication orchestrator → PAN module (DTMF) → NSDL verification → account access.
- Outbound: Auto-dialer connects → customer answers → authentication orchestrator → voice biometric (quick identity confirmation) → PAN module (if the transaction requires it) → Aadhaar OTP (if high-value) → re-KYC completion.
The orchestrator maintains session state in a distributed cache and logs every event. New verification methods (face biometrics, digital signatures) can be added as further modules without rebuilding either journey.
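The orchestrator pattern can be sketched in Python: a planning function maps a journey to an ordered list of module names, and one orchestrator runs those modules from a shared registry, logging each step and stopping at the first failure. The policy mirrors the example inbound and outbound workflows; the session is a plain dict standing in for the distributed cache, and `plan`, `Orchestrator`, and the module names are hypothetical.

```python
from typing import Callable, Dict, List

# A module takes the session state and returns pass/fail.
Module = Callable[[dict], bool]


def plan(journey: str, requires_pan: bool = False, high_value: bool = False) -> List[str]:
    """Hypothetical policy mirroring the example inbound and outbound workflows."""
    if journey == "inbound":
        return ["pan"]
    if journey == "outbound":
        steps = ["voice_biometric"]
        if requires_pan:
            steps.append("pan")
        if high_value:
            steps.append("aadhaar_otp")
        return steps
    raise ValueError(f"unknown journey: {journey}")


class Orchestrator:
    def __init__(self, modules: Dict[str, Module], log: Callable[[str, dict], None]):
        self.modules = modules  # one registry shared by both journeys
        self.log = log

    def authenticate(self, session: dict, steps: List[str]) -> bool:
        for name in steps:
            passed = self.modules[name](session)
            self.log("auth_step", {"session": session["id"], "module": name, "passed": passed})
            if not passed:
                return False  # stop at the first failed factor
        return True
```

Adding a new method such as face biometrics then means registering one more entry in `modules` and referencing it from `plan`; neither journey’s orchestration code changes.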
Vendor capability framework
When evaluating voice AI platforms for KYC automation, seven capabilities determine compliance readiness:
| Capability | What to verify | Why it matters |
|---|---|---|
| DTMF network-layer capture | Does the vendor process DTMF at the telecom network layer, or does the AI pipeline handle it? | Network-layer capture prevents PAN/Aadhaar data from entering STT, transcripts, and recordings |
| Licensed telecom operator status | Does the vendor hold a UL-VNO or equivalent, or rely on third-party SIP trunking? | Only licensed operators control the network layer for DTMF masking |
| Real-time API integration | Sub-200ms NSDL/UIDAI API calls during live conversation, or batch verification? | Batch verification creates exposure windows for fraud |
| Section 29 compliance | Temporary storage with auto-purge, or persistent Aadhaar storage? | Permanent Aadhaar storage violates the Aadhaar Act |
| Inline consent capture | Voice consent recorded and timestamped during call, or separate consent workflow? | DPDP Act requires explicit consent before processing |
| Reusable module architecture | Same authentication modules for both journey types? | Separate flows create control drift and audit complexity |
| Audit log completeness | Every DTMF, API call, consent, and purge event logged? | Incomplete logs fail RBI inspection |
A platform that scores “yes” across all seven is architecturally ready for national-scale voice-based KYC. A platform relying on voice recognition for sensitive capture or third-party telecom infrastructure creates gaps no amount of downstream encryption can close.
Sources
- NSDL e-Gov PAN verification API documentation
- UIDAI Aadhaar Authentication API 2.5 (Revision 1, January 2022)
- RBI Master Direction on KYC (updated June 2025)
- RBI Master Direction on IT Governance, Risk, Controls, and Assurance Practices (April 2024)
- Digital Personal Data Protection Act 2023
- Aadhaar (Targeted Delivery of Financial and Other Subsidies, Benefits and Services) Act 2016, Section 29
- Information Technology Act 2000, Section 43A
- TRAI TCCCPR 2018 regulations
- RBI enforcement action data FY 2024–25
- ICICI Bank voice biometric deployment data
- Indian Bank mobile voice biometric integration
- PCI DSS v4.0 scope reduction requirements for DTMF masking