“Competitor-obsessed startups rarely win. Customer-obsessed startups always win.”
Exotel’s Voice Platform handles millions of calls per day, and 35% of those calls are recorded and uploaded to the cloud for long-term storage and customer access. Call recordings are a vital source of information for businesses to draw insights from and improve customer experience.
Until now, Exotel’s recording pipeline consisted of multiple micro-services, closely interacting with each other to upload the recordings to their eventual destination. The recordings had to traverse through multiple services, where each hop added some latency for the recording upload. As a customer-obsessed startup, and a team that always strives for efficiency, reliability, and scalability, it is very important for us to keep our services lean and pragmatic. Hence, we took on the challenge of revamping our recording pipeline to make it more efficient. Feel free to skip ahead to any of the following sections:
- A look at the old recording pipeline & the challenges it presented
- Introduction to Firefoot – The new Recording Pipeline
- Challenges in the rollout:
- The Outcome
A look at the old recording pipeline & the challenges it presented
Once a call is initiated and bridged, Cloud PBX dumps the audio recording in raw WAV format to a recordings directory on the telephone servers. After the call ends, the call handler service performs a few audio-processing steps: mix, merge, and encode. Once the audio processing is done, the call handler stores the audio file under a recordings directory.
The Recogenix-TS service listens for file-system notifications about new audio files and uploads each file to a temporary cloud store. After uploading, it enqueues a message to Recogenix-cloud via another service called Homeopathix. Once Recogenix-cloud receives the queue message, it moves the recording from the temp-bucket to the exotel-final-bucket. After this, Recogenix-cloud enqueues a job to Recotrix, which updates the recording URL in the data-store that holds the call-details information.
Challenges in the above pipeline:
- Four micro-services work closely together to accomplish a single task.
- The audio file was processed by the call-handler service, which is also responsible for call execution. During this processing, one thread was blocked per call (where recording is enabled), so the processing directly limited the number of calls the call-handler service could handle.
- This pipeline is static in terms of audio properties. Settings like audio codec, bitrate, and sampling rate are neither configurable nor extensible, which makes it difficult to ship different types of recordings such as WAV, MP3 (64 kbps), etc.
- This pipeline requires an extra service(Recotrix) to update the data store with the correct recording URL.
- There are two mechanisms to get a file shipped: the primary path is an upload by Recogenix-TS, and the fallback path is an FTP pull by Recotrix. Maintaining and monitoring both is extra overhead.
- There is an extra hop of recording at a temporary directory on the cloud, which requires some extra cost for maintenance.
- Maintaining these services at both the server and the cloud level adds further overhead.
Although the latency of recordings upload was not very high (especially considering the fact that it was a pipeline), the addition of new functionality would have been time-consuming because of the number of components involved. Besides, we had to monitor and maintain multiple services in the pipeline and scale them depending upon traffic which required additional operational effort. Lastly, the pipeline shipped all recordings to their final cloud storage destination via temporary cloud storage which could be optimized to reduce latency as well as cost.
Introduction to Firefoot – The new Recording Pipeline
Firefoot is a queue-worker service that can perform multiple tasks based on a queue-message trigger. Because Firefoot is the single service handling the whole recording pipeline, customers now receive their recordings more efficiently and in less time. This results in decreased latency and improved reliability of the recording upload.
New Recording Pipeline Flow
Once a call is initiated and bridged, Cloud PBX dumps the audio recording in WAV format to a recordings directory. The call handler service reads recording configurations such as audio codec, bitrate, and sampling rate from the flow-executor service, which makes Firefoot highly extensible and configurable. After the call ends, the call handler service enqueues a recording job to Firefoot with all the audio configuration. The call handler service also persists the recording job in a data-store as a fallback path. Once Firefoot receives the job, it mixes, merges, and encodes the recording, and finally uploads it to “exotel-final-bucket”.
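To make the flow above concrete, here is a minimal sketch of a recording job travelling from the call handler to a queue worker. It is only an illustration: the `RecordingJob` fields, the default values, the in-process `queue.Queue`, and the `upload` callback are all assumptions made for this example and are not Firefoot’s actual message format or API. The audio steps are represented by labels rather than real mixing or encoding.

```python
import json
import queue
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RecordingJob:
    """Hypothetical job message; field names and defaults are illustrative."""
    call_id: str
    raw_paths: tuple                 # raw WAV files, one per call leg
    codec: str = "mp3"
    bitrate_kbps: int = 64
    sampling_rate_hz: int = 8000
    destination_bucket: str = "exotel-final-bucket"


def enqueue_job(job_queue, job):
    """Call-handler side: serialize the job and put it on the queue."""
    job_queue.put(json.dumps(asdict(job)))


def process_job(message, upload):
    """Worker side: decode the message, run the audio steps, then upload.

    The audio steps are recorded as labels only; `upload` is a
    caller-supplied function taking (bucket, object_key).
    """
    fields = json.loads(message)
    fields["raw_paths"] = tuple(fields["raw_paths"])  # JSON turns tuples into lists
    job = RecordingJob(**fields)
    steps = [
        "mix",
        "merge",
        f"encode:{job.codec}@{job.bitrate_kbps}kbps/{job.sampling_rate_hz}Hz",
    ]
    upload(job.destination_bucket, f"{job.call_id}.{job.codec}")
    steps.append("upload")
    return steps


if __name__ == "__main__":
    jobs = queue.Queue()
    enqueue_job(jobs, RecordingJob(call_id="call-123", raw_paths=("a.wav", "b.wav")))
    print(process_job(jobs.get(), lambda bucket, key: print("upload", bucket, key)))
```

Carrying the audio settings inside the job message is what lets the worker stay generic: a different codec or bitrate is just a different message, not a different service.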
Improvements in the above pipeline:
- Single service to maintain the whole recordings pipeline instead of four micro-services.
- The audio processing of the recording is now handled by Firefoot, which frees the call handler to handle more calls.
- Firefoot accepts multiple audio properties from the call handler service and processes the recording accordingly. This makes the pipeline more robust, extensible, and scalable.
- The fallback path is now tucked into Firefoot itself, where it reads from the local data-store to get hold of stale recording files and gets them uploaded.
- Firefoot uploads the recording directly to the customer bucket, without any temporary hops, unlike the older pipeline.
- No extra overhead to maintain different services at different infrastructure levels. A single service residing inside the telephone server does the job now.
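The fallback path mentioned in the list above can be pictured as a periodic sweep over persisted jobs. The sketch below is an assumption-laden illustration, not Firefoot’s implementation: it uses SQLite as a stand-in for the local data-store, and the `recording_jobs` table, its columns, and the `max_age_seconds` threshold are all hypothetical.

```python
import sqlite3

# Hypothetical schema for persisted recording jobs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recording_jobs (
    call_id    TEXT PRIMARY KEY,
    file_path  TEXT NOT NULL,
    created_at REAL NOT NULL,
    uploaded   INTEGER NOT NULL DEFAULT 0
)
"""


def persist_job(conn, call_id, file_path, created_at):
    """Record a job at enqueue time so it can be recovered later."""
    conn.execute(
        "INSERT INTO recording_jobs (call_id, file_path, created_at) VALUES (?, ?, ?)",
        (call_id, file_path, created_at),
    )
    conn.commit()


def mark_uploaded(conn, call_id):
    """Flag a job as done once its recording has been uploaded."""
    conn.execute("UPDATE recording_jobs SET uploaded = 1 WHERE call_id = ?", (call_id,))
    conn.commit()


def sweep_stale(conn, now, max_age_seconds, upload):
    """Upload every job that is still pending after max_age_seconds.

    Returns the call IDs that were recovered, oldest first.
    """
    rows = conn.execute(
        "SELECT call_id, file_path FROM recording_jobs "
        "WHERE uploaded = 0 AND created_at <= ? ORDER BY created_at",
        (now - max_age_seconds,),
    ).fetchall()
    recovered = []
    for call_id, file_path in rows:
        upload(file_path)
        mark_uploaded(conn, call_id)
        recovered.append(call_id)
    return recovered


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    persist_job(conn, "call-1", "/recordings/call-1.wav", created_at=0.0)
    print(sweep_stale(conn, now=1000.0, max_age_seconds=600, upload=print))
```

Because the sweep only touches rows that are both pending and older than the threshold, it does not race with jobs that the normal queue path is still expected to handle.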
Challenges in the rollout:
For SaaS companies such as ourselves, a change, however big, has to be rolled out without disrupting our regular traffic. We strongly believe that whatever the benefit, no change should inconvenience our customers.
To roll out such big changes, we perform extensive testing and planning. These changes were pushed through a canary deployment strategy with close monitoring of the service metrics to detect any anomalies. During the rollout process, we ensured that none of the recordings got missed. Among other things, we also automate the deployment jobs so that we are able to rapidly deploy the changes to hundreds of telephone servers that we operate. Also, we take extreme care to fix any anomalies that are detected so that none of our customers are affected during the process.
The Outcome
Performance:
- Currently, Firefoot is uploading all the recordings to the customer bucket with a ~55% reduction in upload latency.
- With the rollout of Firefoot, the processing load on the call handler service has decreased. Earlier, one thread per recorded call was occupied with processing the audio file. Firefoot now takes care of this, enabling Exotel’s infrastructure to handle more calls.
Cost:
- By replacing four microservices with one, we were able to eliminate the extra expenditure of maintaining those cloud services and temporary storage.
Reliability:
- The new pipeline has achieved seven-sigma (99.9999981%) reliability in shipping the recordings to the customers.
- Firefoot uploads the recordings using multiple internet lines connected to the server, to better utilize the bandwidth available for recordings upload.
- Improved the reliability of Exotel’s call infrastructure, by moving out the audio processing from call handler service to Firefoot.
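To put the 99.9999981% figure in perspective, the short calculation below converts a success percentage into failures per million and recordings per failure. It is plain arithmetic on the quoted number; the function names are made up for this example, and it says nothing about how the figure was measured or over what period.

```python
from fractions import Fraction


def failures_per_million(success_percent: str) -> Fraction:
    """Expected failures per one million recordings at the given success rate."""
    success = Fraction(success_percent) / 100   # exact, no floating-point error
    return (1 - success) * 1_000_000


def recordings_per_failure(success_percent: str) -> Fraction:
    """Average number of recordings shipped for each one that fails."""
    return 1 / (1 - Fraction(success_percent) / 100)


if __name__ == "__main__":
    print(float(failures_per_million("99.9999981")))   # 0.019
    print(int(recordings_per_failure("99.9999981")))   # 52631578
```

In other words, the quoted rate corresponds to 0.019 failures per million recordings, or roughly one failed recording in every 52.6 million.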
Extensibility:
- Firefoot accepts multiple configurations related to a recording like Bitrate, Sampling Rate, Encoding and more from our flow-executor, which makes the recording pipeline more configurable and extensible.
- Firefoot enables recording uploads for the LWB (Listen-Whisper-Barge) feature, which was recently rolled out for our CCM-focused customers.
All of these improvements in our recording pipeline will enable our customers to use the recording data more efficiently and reliably. This is hard to measure quantitatively, but of everything above, it was the major improvement we strove for.
This was a short intro into how we took up the challenge of simplifying the pipeline and making it more reliable, scalable, and efficient, resulting in a win-win outcome for both Exotel and its customers. We take huge pride in solving such interesting problems and making services and pipelines more efficient and reliable.