(This article originally appeared as part of a 3-part series here.)
At Exotel, many of our customers run their support and sales call centers on our platform. When an agent receives a call, a real-time notification in the browser, carrying contextual information about the caller, saves both the customer and the agent a lot of time. It also makes for a better customer experience, since customers do not have to go over the details of their ongoing support queries or problems again. To make this happen, we have a Push Notification Service that delivers these notifications in real time.
Building a reliable push notification service on the WebSocket protocol comes with its own set of challenges. This article walks through the design and implementation details that help overcome them.
We will look at the low-level design of the WebSocket server that handles the push notifications. It leverages the gorilla/websocket library, and relies heavily on two essential features of the Go language: go-routines and go-channels.
What are go-routines and go-channels, you ask? Well, Go is known for making multithreaded designs much easier, both for the developer and in terms of resource utilization (memory and CPU). It provides two things to help with this. The first is go-routines: functions that run concurrently with other functions. They can be thought of as very lightweight threads, each starting with a stack of only about 2 KB. The second is go-channels, the communication wires between these go-routines.
So essentially, want to run something concurrently? Simply start a new go-routine (which is super cheap and easy).
Now, want to communicate between these go-routines? Avoid sharing memory or variables between them; you'll probably end up frustrated trying to resolve the issues that come with concurrency (race conditions, etc.). Instead, pass the value over a communication wire to the intended go-routine. That wire is the go-channel, built into Go and safe for concurrent use by multiple go-routines.
This is in line with the language’s philosophy: “Do not communicate by sharing memory; instead, share memory by communicating.” A channel is like a queue between two services, except that it lives in your process’s memory and connects your go-routines.
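To make the idea concrete, here is a minimal sketch (the function names are ours, purely illustrative): one go-routine produces numbers and sends them over a channel, and the receiving go-routine is the only one that ever touches the running total, so no lock is needed.

```go
package main

import "fmt"

// produce runs in its own go-routine and sends each number over the
// channel instead of writing into shared memory.
func produce(nums []int, out chan<- int) {
	for _, n := range nums {
		out <- n
	}
	close(out) // signals the receiver that no more values are coming
}

// sumViaChannel starts a producer go-routine and sums what it receives.
// Only this go-routine ever touches total, so no lock is needed.
func sumViaChannel(nums []int) int {
	ch := make(chan int)
	go produce(nums, ch)
	total := 0
	for n := range ch { // the loop ends once produce closes the channel
		total += n
	}
	return total
}

func main() {
	fmt.Println(sumViaChannel([]int{1, 2, 3, 4, 5})) // prints 15
}
```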
Okay, got the terms; now what does the low-level design look like? Well, it would look something like this.
The go-routines and the channels: in this architecture, the different boxes denote the different types of worker pools that run in our server.
The worker pools are like the different departments in a company. Just as a department employs people to get its job done, each worker pool employs one or more workers (go-routines) to get its job done.
These departments need to communicate constantly to achieve a larger company objective. Likewise, the arrows between the workers denote the communication wires (go-channels) laid out for them to talk to each other, and together the workers achieve the objective of delivering the notification to our targets. Each pool hands the next part of the processing to another pool by sending a request over a channel.
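A worker pool in Go can be sketched as several go-routines ranging over one shared input channel and sending their output to the next pool’s channel. In this hypothetical example, `runPool` and its upper-casing “work” stand in for whatever a real pool does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// runPool starts `workers` go-routines that all read jobs from the same
// channel (one "department") and send results to the next channel.
func runPool(workers int, jobs []string) []string {
	in := make(chan string)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range in {
				out <- strings.ToUpper(job) // hand the processed job onward
			}
		}()
	}

	// Feed the jobs, then close `in` so the workers' range loops end.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
	}()

	// Close `out` once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results) // workers finish in any order
	return results
}

func main() {
	fmt.Println(runPool(3, []string{"b", "a", "c"})) // prints [A B C]
}
```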
Let us have a look at the different types of worker pools and their jobs.
1. Source (connection upgrade request) worker pool (HTTP server) – This worker pool processes the incoming connection upgrade requests. It looks at the targetID, the request fields, and the headers, and decides whether the request qualifies for a WebSocket upgrade (based on the origin of the request, the authentication credentials, etc.). It then approves or rejects the request accordingly.
Rejecting the request simply means returning a non-OK HTTP status in the response.
Approving the request means three things:
- Completing the WebSocket handshake by responding with HTTP 101 (Switching Protocols).
- Forwarding the connection object to the ‘Hub pool’ of workers. We call this a registration request to the hub; more on this in the Hub pool section.
- Starting two go-routines, a read pump and a write pump, to handle the reads and writes for this connection. More on these in the Read pump and Write pump sections.
The ‘source (connection upgrade request) worker pool’ can have any number of go-routines running based on the incoming connection-upgrade-requests traffic.
For our case in point, Alexander sends a connection upgrade request, tagged with the target ‘Alexander’, as soon as he opens our website on any of his devices. This worker forwards each of these connection registration requests to the hub pool and starts a read pump and a write pump for each connection.
2. Source (notification request) worker pool (HTTP / gRPC or Queue worker) – This worker pool sources the notification requests. Each request is tagged with the targetID (the intended receiver). This pool simply forwards all valid notification requests to the ‘Hub pool’ for further processing.
The ‘source (notification request) pool’ can have any number of go-routines, depending on the incoming notification traffic.
For our case in point, this worker receives a notification ‘Hello, Alexander Hamilton!’ tagged with the target ‘Alexander’, and forwards the notification request to the hub pool.
3. The Hub pool – This worker pool, as you might have guessed, is the most critical component in the architecture. It stores all currently active connections (an active connection being one that has been upgraded to a WebSocket and has not been closed yet) in a map[targetID]connections. All notifications are sent over these connection objects. The hub takes care of three things.
- For a connection registration request, store the new connection in the map against the given targetID.
- For a connection deregistration request, remove the connection from the map. (Who sends this, you ask? It mostly originates from the read pump or the write pump; more on this in the relevant sections.) The hub also sends a close-go-routine-request to the read and write pump go-routines running for this connection.
- For a notification request, it will search the map and get the active connections stored against that targetID, and forward the request to all the write pumps running for those connections. If it does not have any active connection against the targetID, it can simply discard the notification.
So, overall, this pool handles all read and write operations on the map.
Unfortunately, the Hub pool can have only a single worker go-routine, since Go’s built-in map is not safe for concurrent use. Confining every map read and write to one go-routine avoids data races without any locking.
For the case in point, the connection registration requests end up populating the map with three connections against the key ‘Alexander’. When the notification request reaches the hub, it finds these three connections against the target ‘Alexander’ and forwards the request to the write pumps of all three.
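Here is a minimal sketch of the hub’s single go-routine, similar in spirit to the hub in gorilla/websocket’s chat example; the names and types are our own assumptions. The three request kinds arrive on separate channels and are handled in one `select` loop, so the map is only ever touched by `Run`. Closing a client’s `send` channel plays the role of the close request for its write pump. The sketch also assumes buffered per-connection channels and drops a notification rather than blocking when one is full, so a stalled write pump cannot stall the hub for everyone.

```go
package main

import "fmt"

// Notification is a hypothetical request shape.
type Notification struct{ TargetID, Message string }

// Client is a hypothetical handle for one active connection; send feeds
// that connection's write pump and is assumed to be buffered.
type Client struct {
	targetID string
	send     chan string
}

// Hub owns the map of active connections, keyed by targetID.
type Hub struct {
	register   chan *Client
	unregister chan *Client
	notify     chan Notification
	conns      map[string]map[*Client]bool
}

func NewHub() *Hub {
	return &Hub{
		register:   make(chan *Client),
		unregister: make(chan *Client),
		notify:     make(chan Notification),
		conns:      make(map[string]map[*Client]bool),
	}
}

// Run is the single hub go-routine and the only code that touches conns,
// so the map needs no lock.
func (h *Hub) Run() {
	for {
		select {
		case c := <-h.register:
			if h.conns[c.targetID] == nil {
				h.conns[c.targetID] = make(map[*Client]bool)
			}
			h.conns[c.targetID][c] = true
		case c := <-h.unregister:
			if set := h.conns[c.targetID]; set[c] { // guards against double close
				delete(set, c)
				close(c.send) // ends the write pump's range loop
				if len(set) == 0 {
					delete(h.conns, c.targetID)
				}
			}
		case n := <-h.notify:
			// With no connections for n.TargetID this loop does nothing,
			// which discards the notification.
			for c := range h.conns[n.TargetID] {
				select {
				case c.send <- n.Message:
				default: // buffer full: drop rather than block the only hub go-routine
				}
			}
		}
	}
}

func main() {
	hub := NewHub()
	go hub.Run()
	c := &Client{targetID: "Alexander", send: make(chan string, 8)}
	hub.register <- c
	hub.notify <- Notification{TargetID: "Alexander", Message: "Hello, Alexander Hamilton!"}
	fmt.Println(<-c.send) // prints Hello, Alexander Hamilton!
}
```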
4. The Read pump – This go-routine is created, one per connection, immediately after the connection upgrade request is approved. It continuously reads from the connection object to check whether the target client has sent any data over it (as we saw in the earlier article, WebSockets enable duplex communication). It then forwards whatever it receives to the appropriate processor.
Additionally, as soon as this pump encounters an error while reading from the connection, it sends a deregistration request for this connection to the hub routine.
The read pump go-routine, as we’ve discussed, runs one per active connection.
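A read pump can be sketched as follows. `MessageReader` is an interface we define here with the same method shape as gorilla/websocket’s `(*Conn).ReadMessage`, so a real `*websocket.Conn` should satisfy it; `fakeConn` is a hypothetical test double that replays canned messages and then fails. Because the read blocks until data arrives, closing the underlying connection is one way a real server can make this pump’s read fail and its go-routine exit.

```go
package main

import (
	"errors"
	"fmt"
)

// MessageReader has the same method shape as gorilla/websocket's
// (*Conn).ReadMessage.
type MessageReader interface {
	ReadMessage() (messageType int, p []byte, err error)
}

// Client is a hypothetical per-connection handle known to the hub.
type Client struct {
	targetID string
	conn     MessageReader
}

// readPump blocks on the connection in a loop, forwarding every message the
// client sends. On the first read error it asks the hub to deregister the
// connection and returns, ending this go-routine.
func readPump(c *Client, received chan<- []byte, unregister chan<- *Client) {
	defer func() { unregister <- c }()
	for {
		_, msg, err := c.conn.ReadMessage()
		if err != nil {
			return
		}
		received <- msg
	}
}

// fakeConn replays canned messages and then fails, standing in for a
// connection that the client has closed.
type fakeConn struct{ msgs [][]byte }

func (f *fakeConn) ReadMessage() (int, []byte, error) {
	if len(f.msgs) == 0 {
		return 0, nil, errors.New("connection closed")
	}
	m := f.msgs[0]
	f.msgs = f.msgs[1:]
	return 1, m, nil // 1 is the WebSocket text message type
}

func main() {
	c := &Client{targetID: "Alexander", conn: &fakeConn{msgs: [][]byte{[]byte("ack")}}}
	received := make(chan []byte, 1)
	unregister := make(chan *Client, 1)
	go readPump(c, received, unregister)
	fmt.Println(string(<-received))      // prints ack
	fmt.Println((<-unregister).targetID) // prints Alexander
}
```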
5. The Write pump – The write pump is also created one per connection and handles the last-mile delivery of notifications through the connection: it writes the notifications forwarded by the hub to the connection. As with the read pump, as soon as it encounters an error while writing to the connection, it sends a deregistration request for this connection to the hub routine.
The write pump go-routine runs one per active connection.
For our case in point, the write pump against each of the three connections would receive the notification ‘Hello, Alexander Hamilton!’ and they would simply write this over the connection that they’re running for.
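Likewise for the write pump. `MessageWriter` mirrors the method shape of gorilla/websocket’s `(*Conn).WriteMessage`, and `recordingConn` is a hypothetical test double that fails after a configurable number of writes. The pump exits either when the hub closes its `send` channel or right after reporting the first write error.

```go
package main

import (
	"errors"
	"fmt"
)

// MessageWriter has the same method shape as gorilla/websocket's
// (*Conn).WriteMessage.
type MessageWriter interface {
	WriteMessage(messageType int, data []byte) error
}

const textMessage = 1 // WebSocket text frame; gorilla names this TextMessage

// Client is a hypothetical per-connection handle.
type Client struct {
	targetID string
	conn     MessageWriter
	send     chan string // fed by the hub; closed by the hub on deregistration
}

// writePump delivers each notification the hub forwards. It exits when the
// hub closes send, or asks the hub to deregister the connection on the
// first write error.
func writePump(c *Client, unregister chan<- *Client) {
	for msg := range c.send {
		if err := c.conn.WriteMessage(textMessage, []byte(msg)); err != nil {
			unregister <- c
			return
		}
	}
}

// recordingConn records successful writes and fails once failAfter
// writes have succeeded.
type recordingConn struct {
	written   []string
	failAfter int
}

func (r *recordingConn) WriteMessage(_ int, data []byte) error {
	if len(r.written) >= r.failAfter {
		return errors.New("broken pipe")
	}
	r.written = append(r.written, string(data))
	return nil
}

func main() {
	conn := &recordingConn{failAfter: 10}
	c := &Client{targetID: "Alexander", conn: conn, send: make(chan string, 1)}
	done := make(chan struct{})
	go func() { writePump(c, make(chan *Client, 1)); close(done) }()
	c.send <- "Hello, Alexander Hamilton!"
	close(c.send) // what the hub does on deregistration
	<-done
	fmt.Println(conn.written) // prints [Hello, Alexander Hamilton!]
}
```

Keeping exactly one read pump and one write pump per connection also suits gorilla/websocket, whose documentation states that a connection supports one concurrent reader and one concurrent writer.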
Now we’re left with one last piece of the puzzle: communication between these workers. Every request the workers make to each other is sent over a channel. As shown in the diagram, the channels used are:
- Hub Connection Registration Request Channel
- Hub Connection Deregistration Request Channel
- Hub Write Notification Channel
- Write Pump Write Notification Channel (1 per connection)
- (optional) Read Pump Received Notification Channel
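The channel list above maps directly onto Go types. In this sketch the struct names and buffer sizes are illustrative assumptions; it also shows how a directional type such as `chan<- Notification` lets a source worker send requests to the hub without being able to consume them.

```go
package main

import "fmt"

// Notification is a hypothetical payload type.
type Notification struct{ TargetID, Message string }

// Conn is a hypothetical per-connection handle carrying its own channels.
type Conn struct {
	TargetID string
	Send     chan Notification // Write Pump Write Notification Channel (1 per connection)
	Received chan []byte       // (optional) Read Pump Received Notification Channel
}

// HubChannels are shared by every worker pool and read only by the hub.
type HubChannels struct {
	Register   chan *Conn        // Hub Connection Registration Request Channel
	Unregister chan *Conn        // Hub Connection Deregistration Request Channel
	Notify     chan Notification // Hub Write Notification Channel
}

// Buffer sizes are illustrative guesses, not tuned values.
const hubBuffer, connBuffer = 256, 16

func NewHubChannels() HubChannels {
	return HubChannels{
		Register:   make(chan *Conn, hubBuffer),
		Unregister: make(chan *Conn, hubBuffer),
		Notify:     make(chan Notification, hubBuffer),
	}
}

// NewConn allocates the per-connection channels once an upgrade is approved.
func NewConn(targetID string) *Conn {
	return &Conn{
		TargetID: targetID,
		Send:     make(chan Notification, connBuffer),
		Received: make(chan []byte, connBuffer),
	}
}

// submit is a source worker's view of the hub: send-only.
func submit(notify chan<- Notification, n Notification) { notify <- n }

func main() {
	hub := NewHubChannels()
	c := NewConn("Alexander")
	hub.Register <- c
	submit(hub.Notify, Notification{TargetID: "Alexander", Message: "Hello, Alexander Hamilton!"})
	fmt.Println(len(hub.Register), len(hub.Notify), cap(c.Send)) // prints 1 1 16
}
```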
The WebSocket protocol allowed us to use our hardware resources efficiently. Combined with the strengths of the Go language, it helped us build a scalable and efficient push notification service that provides a seamless customer experience.