“Competitor-obsessed startups rarely win. Customer-obsessed startups always win.”
Exotel’s Voice Platform handles millions of calls per day, of which 35% are recorded and uploaded to the cloud for long-term storage and customer access. In the current age of data, call recordings are a vital source of information from which businesses draw insights and improve customer experience.
Until now, Exotel’s recording pipeline consisted of multiple micro-services that interacted closely to upload recordings to their eventual destination. Each recording had to traverse several services, and each hop added latency to the upload. As a customer-obsessed startup, and a team that always strives for efficiency, reliability, and scalability, we consider it very important to keep our services lean and pragmatic. Hence, we took on the challenge of revamping our recording pipeline to make it more efficient.
Once a call is initiated and bridged, Cloud PBX dumps the audio recording in raw WAV format to a recordings directory on the telephone servers. After the call ends, the call handler service performs a few audio-processing steps such as mix, merge, and encode. Once the audio processing is done, the call handler stores the resulting audio file under a recordings directory.
The Recogenix-TS service listens for file-system notifications about new audio files and uploads each one to a temporary cloud store. After uploading, it enqueues a message to Recogenix-cloud via another service called Homeopathix. Once Recogenix-cloud receives the queue message, it moves the recording from the temp-bucket to the exotel-final-bucket. After this, Recogenix-cloud enqueues a job to Recotrix, which updates the recording URL in the data-store that holds the call-details information.
Challenges in the above pipeline:
Although the latency of recording uploads was not very high (especially for a multi-stage pipeline), adding new functionality would have been time-consuming because of the number of components involved. Besides, we had to monitor and maintain multiple services in the pipeline and scale them with traffic, which required additional operational effort. Lastly, the pipeline shipped all recordings to their final cloud storage destination via temporary cloud storage, an extra hop that could be removed to reduce both latency and cost.
Firefoot is a queue-worker service that can perform multiple tasks based on a queue-message trigger. With Firefoot as the single service handling the whole recording pipeline, customers now receive their recordings in the least possible time. This results in decreased latency and improved reliability of the recording upload.
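To make the queue-worker idea concrete, here is a minimal sketch of a worker that picks a handler based on the task named in each queue message. It is an illustration only, not Firefoot's actual code: the message shape, the `recording.upload` task name, and the in-process `queue.Queue` standing in for a real message queue are all assumptions.

```python
import json
import queue
from typing import Callable, Dict

# Hypothetical registry: task name -> handler function.
HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handler(task: str):
    """Register the decorated function as the handler for one task type."""
    def register(fn):
        HANDLERS[task] = fn
        return fn
    return register


@handler("recording.upload")
def upload_recording(payload: dict) -> str:
    # Placeholder: a real worker would process and upload the audio here.
    return f"uploaded {payload['call_id']}"


def run_worker(jobs: "queue.Queue[str]") -> list:
    """Drain the queue, dispatching each JSON message to its handler."""
    results = []
    while True:
        try:
            raw = jobs.get_nowait()
        except queue.Empty:
            return results
        message = json.loads(raw)
        fn = HANDLERS.get(message["task"])
        if fn is None:
            # Unknown task types are reported rather than crashing the worker.
            results.append(f"skipped unknown task {message['task']}")
            continue
        results.append(fn(message["payload"]))
```

Because handlers are looked up by name, supporting a new kind of job in this sketch only requires registering another function, which is one plausible reading of how a single service can "perform multiple tasks".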
New Recording Pipeline Flow
Once a call is initiated and bridged, Cloud PBX dumps the audio recording in WAV format to a recordings directory. The call handler service reads the recording configuration, such as audio-codec, bitrate, and samplingRate, from the flow-executor service, which makes Firefoot highly extensible and configurable. After the call ends, the call handler service enqueues a recording job to Firefoot with the full audio configuration. The call handler service also persists the recording job in a data-store as a fallback path. Once Firefoot receives the job, it mixes, merges, and encodes the recording, and finally uploads it to “exotel-final-bucket”.
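The flow above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the production implementation: the `RecordingJob` fields and their defaults, the in-memory `FallbackStore`, the choice to persist before enqueueing, and the string "steps" that stand in for real audio processing are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RecordingJob:
    """Hypothetical job message carrying the audio configuration."""
    call_id: str
    legs: list                      # paths of the raw WAV files for the call
    codec: str = "mp3"              # assumed default, not from the source
    bitrate_kbps: int = 32          # assumed default
    sampling_rate_hz: int = 8000    # assumed default
    destination: str = "exotel-final-bucket"


class FallbackStore:
    """In-memory stand-in for the data-store that tracks pending jobs."""

    def __init__(self):
        self.pending = {}

    def save(self, job: RecordingJob) -> None:
        self.pending[job.call_id] = asdict(job)

    def mark_done(self, call_id: str) -> None:
        self.pending.pop(call_id, None)


def submit(job: RecordingJob, jobs_queue: list, store: FallbackStore) -> None:
    # Assumption: persist first, so the job can be replayed from the
    # store if the enqueue or the worker fails.
    store.save(job)
    jobs_queue.append(json.dumps(asdict(job)))


def process(raw: str, store: FallbackStore) -> list:
    """Describe the steps a worker would run for one job, then clear it."""
    job = RecordingJob(**json.loads(raw))
    steps = []
    if len(job.legs) > 1:
        steps.append("mix+merge")
    steps.append(
        f"encode:{job.codec}@{job.bitrate_kbps}kbps/{job.sampling_rate_hz}Hz"
    )
    # Single upload straight to the final bucket, with no temporary store.
    steps.append(f"upload:{job.destination}/{job.call_id}.{job.codec}")
    store.mark_done(job.call_id)
    return steps
```

Any job still present in `store.pending` after a reasonable delay could be re-enqueued by a periodic sweep; that retry policy is likewise an assumption about what the fallback path is for.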
Improvements in the above pipeline:
For SaaS companies such as ours, a change, however big, has to be rolled out without disrupting regular traffic. We strongly believe that, whatever the benefit, no change should inconvenience our customers.
To roll out such big changes, we perform extensive testing and planning. These changes were pushed through a canary deployment strategy, with close monitoring of the service metrics to detect any anomalies. During the rollout, we ensured that no recordings were missed. Among other things, we also automated the deployment jobs so that we could rapidly deploy the changes to the hundreds of telephone servers that we operate. We also took extreme care to fix any anomalies that were detected so that none of our customers were affected during the process.
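As an illustration of how a canary rollout across many servers might be staged, the sketch below assigns each host to a stable bucket and advances the rollout only while no recordings are missed. The hashing scheme, the stage percentages, and the "missed recordings" gate are assumptions made for this example; the post does not describe the actual tooling.

```python
import hashlib


def in_canary(host: str, percent: int) -> bool:
    """Deterministically place a host in the canary group via a stable hash.

    A host that is in the group at a given percentage stays in it at every
    higher percentage, so widening the rollout never removes a host.
    """
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def next_stage(current_percent: int, missed_recordings: int,
               stages=(1, 10, 50, 100)) -> int:
    """Return the next rollout percentage, or 0 to roll back.

    The stage values are hypothetical; any missed recording observed
    on the canary hosts triggers a rollback.
    """
    if missed_recordings > 0:
        return 0
    for stage in stages:
        if stage > current_percent:
            return stage
    return current_percent
```

An automated deployment job could call `in_canary` for each telephone server to decide whether it receives the new build at the current stage, and call `next_stage` after each monitoring window.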
Performance:
Cost:
Reliability:
Extensibility:
All of these improvements in our recording pipeline will enable our customers to use the recording data more efficiently and reliably. This benefit is hard to measure quantitatively, but of everything above, it was the major improvement we strove for.
This was a short introduction to how we took up the challenge of simplifying the pipeline and making it more reliable, scalable, and efficient, resulting in a win-win outcome for both Exotel and its customers. We take huge pride in solving such interesting problems and making services and pipelines more efficient and reliable.