Building a COVID Vaccine Appointment System on AWS - To waiting room, or not!

Let me start by saying that waiting rooms are no fun. They can get lonely, create anxiety, and leave you hoping the whole experience will be over sooner. The same applies to a user's online experience. Today, I want to share my thoughts on what an ideal COVID-19 vaccine appointment system should look like.
I have seen a lot of architectures where the proposed design involves setting up a waiting room. While that's a great way to shield your application servers from traffic bursts, in my view it is difficult to have a one-size-fits-all design.

The waiting room design is, in my personal opinion, better suited to Black Friday sales or buying Rihanna concert tickets, where you are let in after a momentary wait. I see this architecture working for vaccine appointments only if, after a momentary delay (the waiting room experience), users are let in to make an appointment.

Unfortunately, that is not the reality today. We have a large number of users, vaccine shortages, and logistical issues that do not yet let governments open the floodgates. This is true for almost every country in the world today. If that is the story, why make a large number of users wait apprehensively only to be told that they didn't make the cut? Besides the customer experience, you also have to serve those elevated levels of traffic on your website, so your infrastructure must be resilient enough to handle the surge. This also increases the cost of keeping your services up for a large user base. But if I know that I only have 5,000 vaccine slots available this week, do I really need to serve, say, 100,000 users on my website and have 95% of them come back again next week? This can surely be avoided, and there has to be a cost-effective way to build this out.
What I propose is a 2-step process from a user’s standpoint to get vaccinated:

- A user goes to the website and fills out the COVID-19 vaccine registration form. Once the form is submitted, the user receives an email confirming that the information has been recorded and that the hospital or county will reach out when the user's number comes up in the virtual queue, which is managed on the application side. This lets the user unplug.
- Once the user's turn is up, links to book the vaccination appointment are sent over the user's pre-selected preferred mode of communication (email and/or SMS).
With that context, let's see what the architecture would look like:

- You would want to front your website with Amazon CloudFront, which not only accelerates the site but, more importantly in this case, offloads traffic from the origin. Do cache your static content at the edge. Using cache policies, you could also selectively cache certain dynamic content, thereby reducing the need to keep scaling the origin horizontally during traffic surges. With a well-architected site coupled with optimal CloudFront settings, you can keep traffic to the origin down to when it's absolutely needed.
- API Gateway would be one of the origins in the CloudFront distribution, serving the vaccine registration page. Depending on the incoming request volume and your backend capacity, you can apply rate limiting on API Gateway and keep the rest of the website functional.
- Amazon SQS integrates well with API Gateway, and you can have the form data sit in one or more SQS Standard queues. I chose Standard queues because they offer higher throughput than FIFO queues, and I am willing to accept the trade-off that a user may occasionally be placed slightly behind where they should ideally be (best-effort ordering).
- AWS Lambda will read the records from SQS in batches of 10 (or more) and write them to DynamoDB. The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, comprising as many as 25 put or delete requests, so larger batches have to be split across several calls. Individual items can be as large as 400 KB.
- I chose DynamoDB because it is highly performant, with average latency in single-digit milliseconds. The data is also highly available, as it is replicated across multiple AZs within a single AWS Region. Lastly, being a fully managed service, it frees you from provisioning, setup, patching, and so on. You would store all the information from the vaccine registration form in this database and incrementally add further information, such as vaccination scheduling and completion. Using TTL, you can have records expire once the vaccination is complete to keep the size of the database in check. Note that DynamoDB TTL deletes whole items rather than individual fields, so dropping only certain fields requires an explicit update.
- AWS Step Functions is a serverless orchestrator that makes it easy to sequence AWS Lambda functions and other AWS services into business-critical applications. You can create and run a series of checkpointed, event-driven workflows that maintain the application state. In this architecture, the workflow would first poll the appointment system on the customer side for the number of available vaccine slots. Based on that number, it would query DynamoDB to find the next ‘x’ users in line to be vaccinated.
- Using Amazon Pinpoint and Amazon SNS, emails and/or text messages are sent to these users, based on their contact preferences, with a unique scheduling link. You can choose to expire the scheduling link after 48 hours of no action by the user, in which case the user has to fill out the registration form again and go through the process once more.
- Once the users book their appointments, the DynamoDB database is updated again.
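
To make the SQS-to-DynamoDB step a little more concrete, here is a minimal sketch of how a Lambda function might split an SQS batch into BatchWriteItem calls of at most 25 requests each. The table name, the attribute names, and the shape of the form payload are my assumptions for illustration. The write call is passed in as a parameter, so the chunking logic can be tried without AWS credentials. A real handler would also add backoff between retries.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

MAX_BATCH_WRITE_ITEMS = 25            # BatchWriteItem limit per call
MAX_ATTEMPTS = 3                      # assumed retry budget for unprocessed items
TABLE_NAME = "VaccineRegistrations"   # hypothetical table name


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_put_request(record: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one SQS record into a PutRequest in DynamoDB's typed JSON format."""
    form = json.loads(record["body"])  # assumed: the body is the JSON form payload
    return {
        "PutRequest": {
            "Item": {
                "registration_id": {"S": record["messageId"]},
                "email": {"S": form["email"]},
                "registered_at": {"N": str(form["registered_at"])},
            }
        }
    }


def write_registrations(records: List[Dict[str, Any]],
                        batch_write_item: Callable[..., Dict[str, Any]]) -> int:
    """Write all records, at most 25 per call; return the number of calls made."""
    calls = 0
    requests = [to_put_request(r) for r in records]
    for chunk in chunked(requests, MAX_BATCH_WRITE_ITEMS):
        pending = {TABLE_NAME: chunk}
        for _ in range(MAX_ATTEMPTS):
            response = batch_write_item(RequestItems=pending)
            calls += 1
            # DynamoDB returns anything it could not write; send it again.
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break
        else:
            raise RuntimeError("items still unprocessed after retries")
    return calls
```

Inside a Lambda handler this could be called as `write_registrations(event["Records"], boto3.client("dynamodb").batch_write_item)`, assuming boto3 is available in the runtime.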

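The queue logic that the Step Functions workflow drives can be sketched in plain Python. This is only an illustration of the rules described in the list above: pick the next ‘x’ waiting users in registration order, and treat a scheduling link as expired after 48 hours without a booking. The `Registration` fields and function names are hypothetical, and the in-memory list stands in for what would be a DynamoDB query in the real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

LINK_LIFETIME = timedelta(hours=48)  # expiry window for the scheduling link


@dataclass
class Registration:
    user_id: str
    registered_at: datetime
    invited_at: Optional[datetime] = None  # set when the scheduling link is sent
    booked: bool = False


def next_in_queue(registrations: List[Registration],
                  available_slots: int) -> List[Registration]:
    """Return the earliest-registered users who have not been invited yet."""
    waiting = [r for r in registrations if r.invited_at is None and not r.booked]
    waiting.sort(key=lambda r: r.registered_at)
    return waiting[:max(available_slots, 0)]


def link_expired(reg: Registration, now: datetime) -> bool:
    """True if the user was invited, has not booked, and 48 hours have passed."""
    return (
        reg.invited_at is not None
        and not reg.booked
        and now - reg.invited_at >= LINK_LIFETIME
    )
```

With 5,000 open slots, `next_in_queue(all_registrations, 5000)` would return the 5,000 users to notify this week, and everyone else simply stays in the queue.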