Let’s assume you have built a web application for the services or products you provide and have started getting some users.


After receiving some suggestions and feedback, you are ready to scale your application.

Later on, your marketing team starts promoting the app to gain new customers. Within no time, thousands of people start using it, and at some point they are unable to access it.

You’ve checked your app and it is working just fine.

So what happened?

What went wrong?

It turns out your app is unreachable because of a scalability problem.

Your cloud architecture cannot handle the increasing number of visitors.

So many companies these days focus more on features and less on scalability.

Resilience and scalability are both important parts of any application architecture.

In this post, you will learn how to build a highly scalable web application architecture that can handle up to 1 million users.

What is a scalable application?

Scalability is the capacity of a system to maintain acceptable performance under increasing load (such as large data volumes or high request rates).

It should work smoothly whether there is 1 user or 1 million users accessing the app.


Scalable apps use only the resources they need to do their work and release the ones that are no longer required.

When discussing scalability in cloud computing, you will usually hear about two main ways of scaling – horizontal or vertical.

Let’s dig more into these terms.

Vertical scaling (Scaling Up)

Vertical scaling, also known as scaling up, means adding resources to a single unit to increase its capacity to handle a growing load.

In hardware terms, this means adding processing power and memory to the physical machine that runs the server.

In software terms, scaling up may involve optimizing algorithms and application code.

Horizontal scaling (Scaling Out)

Horizontal scaling, also known as scaling out, means increasing resources by adding more units to the app’s cloud architecture.

In short, it means adding several small-capacity units instead of one large-capacity unit.

Requests are then spread across several units, which reduces the load on any single machine.
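
To make the idea concrete, here is a minimal sketch of spreading requests across units in round-robin order. The `distribute` function and the unit names are made up for illustration; real load balancers use more sophisticated strategies.

```python
from collections import Counter
from itertools import cycle


def distribute(request_count: int, units: list) -> Counter:
    """Assign requests to units in round-robin order and count each unit's share."""
    load = Counter()
    next_unit = cycle(units)
    for _ in range(request_count):
        load[next(next_unit)] += 1
    return load


if __name__ == "__main__":
    # One unit takes every request; four units share the same traffic.
    print(distribute(1000, ["unit-1"]))
    print(distribute(1000, ["unit-1", "unit-2", "unit-3", "unit-4"]))
```

With the same 1,000 requests, each of the four units handles a quarter of what the single unit had to handle.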

What characteristics determine if an application is scalable?

What should scalability look like? There are a few areas where an app needs careful consideration.


Availability

For a company to maintain its reputation, website uptime is vital.

Consider a large online retailer, for example. If the site is unreachable for even a short period of time, millions in revenue can be lost.

Constant availability requires the designer to think about redundancy for key components and about quick recovery from system failures and outages.
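
The effect of redundancy can be estimated with simple arithmetic. The sketch below assumes that copies of a component fail independently of each other, which is a simplification, and the 99% figure is only an illustrative input, not a number from any provider.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent failures.

    The group is down only when every copy is down at the same time.
    """
    if not 0.0 <= single <= 1.0 or copies < 1:
        raise ValueError("single must be in [0, 1] and copies must be >= 1")
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for copies in (1, 2, 3):
        a = combined_availability(0.99, copies)
        print(copies, round(a, 6), round(downtime_hours_per_year(a), 3))
```

Under these assumptions, going from one copy to two moves a 99% component from roughly 88 hours of downtime a year to under one hour.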


Performance

A scaled application with poor performance frustrates users and can hurt SEO rankings as well. Fast responses and fast retrieval (low latency) are a must.

Reliability of Retrieval

When a user requests data, the same data should be returned each time, unless it has been updated, of course. Users need to trust that once data is stored in the system, it will be there when they come back for it.


Manageability

The system has to be easy to operate, maintain, and update. Problems should be easy to spot. It should run routinely without failures or outages.


Cost

Cost covers more than hardware and software. There is also the cost of development, the cost of operating the system, and any training that may be necessary. Total cost is what it takes to own and operate the system.

It’s important to note that these principles must all be weighed, and there may be trade-offs between them.

For example, if you decide to resolve capacity issues by adding more servers, it will increase the costs and managing them will become trickier.

You also have a choice of cloud providers when building a high-performance web application architecture.

The three leading cloud computing vendors, AWS, Google Cloud, and Microsoft Azure, each have their own strengths and weaknesses that suit them to different use cases.

In this post, we use AWS to explain how to build a scalable web application.

AWS is a subsidiary of Amazon. It provides a range of cloud-based services for different needs.

AWS holds the largest share of the cloud computing market, at about 33%. It offers outstanding documentation for each of its services, along with useful guides, white papers, and reference architectures for common apps.

How to Build a Scalable Application that Supports 1 Million Users on AWS

Single user (first setup of cloud architecture)

You are the only one running the app, on localhost. The initial setup can be as simple as deploying the application on a single box. Here, you need the following AWS services to get started.

Amazon Machine Images (AMI)

An Amazon Machine Image (AMI) provides the information required to launch an instance, which is a virtual server in the cloud. You specify an AMI when you launch an instance.

An AMI includes a template for the root volume of the instance, launch permissions that control which AWS accounts can use the AMI to launch instances, and a block device mapping that specifies the volumes to attach to the instance when it is launched.

Amazon Elastic Compute Cloud (Amazon EC2)

Amazon Elastic Compute Cloud provides scalable computing capacity in the AWS cloud. This removes the need to invest in hardware up front, so you can develop and deploy applications faster.

Amazon Virtual Private Cloud (Amazon VPC)

Amazon Virtual Private Cloud lets you launch AWS resources in a virtual network that you define. It gives you full control over the virtual networking environment, including the choice of IP address range, the design of subnets, and the configuration of route tables and network gateways.
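
Choosing an IP address range and carving it into subnets is mostly address arithmetic, so it can be planned before touching AWS at all. Here is a rough sketch using Python’s standard `ipaddress` module. The `10.0.0.0/16` range, the `/24` subnet size, and the `plan_subnets` helper are illustrative assumptions, not AWS requirements or APIs.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, subnet_prefix: int, count: int) -> list:
    """Return the first `count` subnets of the given prefix length inside a VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = network.subnets(new_prefix=subnet_prefix)  # lazy generator
    return [str(subnet) for subnet in islice(subnets, count)]


if __name__ == "__main__":
    # For example: one public and one private subnet in each of two Availability Zones.
    for cidr in plan_subnets("10.0.0.0/16", 24, 4):
        print(cidr)
```

Planning the ranges up front makes it easier to leave room for more subnets later, for instance when you add Availability Zones.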

Amazon Route 53

Amazon Route 53 is a highly available and scalable cloud DNS web service. It connects user requests to infrastructure running in AWS, such as Amazon S3 buckets, Amazon EC2 instances, or Elastic Load Balancing load balancers.

At this point you need a bigger box. You can simply pick a larger instance type, which is vertical scaling. In the early stages, vertical scaling is enough, but you can’t scale vertically forever.

Eventually, you’ll hit a wall. Vertical scaling also doesn’t address failover and redundancy.

Users > 10 (Create multiple hosts and select the database)

First, you need to choose a database, because your growing user base is now generating data. It’s wise to start with a SQL database for the following reasons:

  • Established and well-worn technology.
  • Strong community support and a wide range of tools.
  • Your first 10 million users will not break a SQL database.

Note: you can pick a NoSQL database if your users are going to generate a large volume of data in varied forms.


At this stage, you have everything in a single box. This architecture is hard to scale and difficult to manage in the long run. It’s time to introduce a multi-tier architecture that separates the database from the application.

Users > 100 (Store database on Amazon RDS to ease the process)

Once you have more than 100 users, database deployment is the most important task. There are two common ways to deploy a database on AWS.

The first option is to use a managed database service such as Amazon DynamoDB or Amazon Relational Database Service (Amazon RDS). The second option is to host your own database software on Amazon EC2.

Amazon RDS

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. Amazon RDS offers six common database engines to choose from: Amazon Aurora, Oracle, Microsoft SQL Server, MariaDB, PostgreSQL, and MySQL.

Users > 1,000 (Use multiple Availability Zones to improve availability)

With the current architecture, you may face availability issues. If the host for your web app fails, the app goes down. So you should run another web instance in a second Availability Zone, and place a standby database for RDS there as well.

Elastic Load Balancer (ELB)

ELB distributes incoming application traffic across EC2 instances. It scales horizontally, imposes no bandwidth limit, provides SSL termination, and performs health checks so that only healthy instances receive traffic.
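
The health-check behavior is easy to picture with a toy model. The sketch below is not how ELB is implemented and does not use any AWS API. `Instance` and `LoadBalancer` are hypothetical classes that only show the idea of skipping unhealthy targets while rotating through the healthy ones.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    name: str
    healthy: bool = True  # in a real setup, a periodic health check would update this


class LoadBalancer:
    """Round-robin over the instances that currently pass their health check."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._counter = count()

    def route(self) -> Instance:
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        return healthy[next(self._counter) % len(healthy)]


if __name__ == "__main__":
    lb = LoadBalancer([Instance("web-1"), Instance("web-2", healthy=False), Instance("web-3")])
    print([lb.route().name for _ in range(4)])
```

Because the unhealthy instance is filtered out on every request, users never see it, and it starts receiving traffic again as soon as it is marked healthy.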

This setup has two instances behind the ELB, but you can put thousands of instances behind it. This is horizontal scaling.

At this stage, you have multiple EC2 instances serving thousands of users, which increases your cloud cost. To reduce it, you have to optimize instance usage based on the changing load.

Users: 10,000s – 100,000 (Shift static content to object-based storage for good performance)

To improve performance and efficiency, add read replicas to RDS. This takes read load off the primary (write) database. You can also reduce the load on your web servers by moving static content to Amazon S3 and Amazon CloudFront.
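
On the application side, using read replicas usually means sending reads and writes to different endpoints. The following is a minimal sketch of that routing decision. `ReplicaRouter` and the endpoint names are invented for illustration, and detecting reads by checking for a leading `SELECT` is a deliberate simplification.

```python
import itertools


class ReplicaRouter:
    """Send SELECT statements to read replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Rotate through replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
    print(router.endpoint_for("SELECT * FROM orders"))        # a replica
    print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # the primary
```

Keep in mind that replicas can lag slightly behind the primary, so reads that must see the very latest write may still need to go to the primary.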

Amazon S3

Amazon S3 is an object-based storage service. It is not attached to an EC2 instance, which makes it well suited to storing static content such as JavaScript, CSS, images, and videos. It is designed for 99.999999999% durability and can store multiple petabytes of data.

Amazon CloudFront

Amazon CloudFront is a content delivery network (CDN). It retrieves data from an Amazon S3 bucket and distributes it to multiple data center locations. It caches content at edge locations to give users the lowest possible latency.

Moreover, to reduce the load on database servers, you can use DynamoDB (a managed NoSQL database) to store session state. You can also use Amazon ElastiCache to cache data from the database.

Amazon DynamoDB

Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency. It is a fully managed cloud database that supports both document and key-value store models.

Amazon ElastiCache

Amazon ElastiCache is a caching-as-a-service offering. It removes the complexity of setting up and managing a distributed cache environment.

It is also a self-healing infrastructure: if nodes fail, new nodes are started automatically.
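
A common way to put a cache in front of a database is the cache-aside pattern: check the cache first, and only query the database on a miss. Here is a small in-memory sketch of that pattern. `CacheAside` is a hypothetical helper, a plain dictionary stands in for the cache service, and the 60-second expiry is an arbitrary choice.

```python
import time


class CacheAside:
    """Look in the cache first; on a miss, load from the database and remember the result."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_query(user_id):
        print("querying database for", user_id)
        return {"id": user_id}

    cache = CacheAside(slow_query)
    cache.get("u1")  # goes to the database
    cache.get("u1")  # served from the cache
```

The expiry time is the main trade-off here: a longer one saves more database queries but serves stale data for longer.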

Users > 500,000 (Setting up Auto Scaling to fulfill the changing demand automatically)

At this stage, your architecture is too complex for a small team to manage, and without proper auditing and analysis it is difficult to move ahead.

Now that the web tier is much more lightweight, it’s time for Auto Scaling!

Auto Scaling enables “just-in-time provisioning,” allowing you to scale infrastructure dynamically as load demands.

It can launch or terminate EC2 instances automatically in response to traffic spikes. You pay only for the resources needed to handle the load.
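
The core scaling decision can be sketched as a proportional rule: if average CPU is 50% above the target, run about 50% more instances, within fixed limits. This is a simplified illustration and not the algorithm AWS Auto Scaling actually uses. The function name and all the numbers are assumptions.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Scale the instance count in proportion to how far the metric is from its target."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(maximum, wanted))  # clamp to the group's limits


if __name__ == "__main__":
    # 4 instances at 90% average CPU with a 60% target: scale out.
    print(desired_capacity(4, 90.0, 60.0, minimum=2, maximum=10))
    # 4 instances at 20% average CPU: scale in, but never below the minimum.
    print(desired_capacity(4, 20.0, 60.0, minimum=2, maximum=10))
```

The minimum keeps enough instances around for availability, and the maximum caps the bill if a metric misbehaves.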

To monitor this, you can use the following AWS service:

Amazon CloudWatch

Amazon CloudWatch offers a set of tools to monitor the health and resource utilization of many AWS services. The metrics it collects can be used to set alarms, send notifications, and trigger actions when alarms fire. Amazon EC2 sends metrics to CloudWatch that describe your Auto Scaling instances.
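
To show how an alarm turns a stream of metrics into a decision, here is a simplified model. It borrows CloudWatch’s three alarm state names (`OK`, `ALARM`, `INSUFFICIENT_DATA`), but the evaluation rule below, where every one of the last few datapoints must exceed the threshold, is a simplification, and `alarm_state` is not a CloudWatch API.

```python
def alarm_state(datapoints: list, threshold: float, evaluation_periods: int) -> str:
    """Return ALARM only if each of the most recent datapoints is above the threshold."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be >= 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    cpu = [45.0, 62.0, 81.0, 85.0, 90.0]  # average CPU per period, illustrative values
    print(alarm_state(cpu, threshold=80.0, evaluation_periods=3))
```

Requiring several consecutive breaches keeps a single short spike from triggering a scaling action.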

An Auto Scaling group can span multiple AZs, up to as many as there are in the region. Instances can be launched across AZs not just for scalability, but also for availability.

You need to add logging, monitoring, and metrics to get the most out of Auto Scaling.

Host-level metrics. Look at the CPU of a single instance within an Auto Scaling group and find out what is going wrong.

Aggregate-level metrics. Look at the metrics on the Elastic Load Balancer to check the performance of the whole set of instances.

Log analysis. Look at what the application is telling you using CloudWatch Logs. CloudTrail helps you analyze and manage logs. If you have set up region-specific configurations in CloudWatch, it is not easy to combine metrics from different regions within a single AWS monitoring tool.

You can use Loggly, a log management tool, in that scenario. You can send logs and metrics from CloudWatch and CloudTrail to Loggly and combine them with other data for a better understanding of your applications and infrastructure.

Try to squeeze as much performance as you can out of your configuration. Auto Scaling can help with that. You don’t want systems sitting at 20% CPU utilization.

The infrastructure is getting large; it can scale to thousands of instances. We have read replicas and we have horizontal scaling, but we need some automation to help manage it all, because we don’t want to manage each instance individually. Here are some automation tools:

AWS Elastic Beanstalk

AWS Elastic Beanstalk is a service for deploying applications written in Python, Java, Go, .NET, PHP, Node.js, Ruby, and Docker on familiar servers such as Apache, NGINX, Passenger, and IIS.

AWS OpsWorks

AWS OpsWorks provides a different approach to application management. It can auto-heal your application stack, scale based on time or workload demand, and generate metrics to make monitoring easier.

AWS CloudFormation

AWS CloudFormation provisions resources using a template in JSON format. You can choose from a set of sample templates to get started on common tasks.
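
To give a feel for what such a template looks like, the sketch below builds a very small one as a Python dictionary and prints it as JSON. The logical name `WebServer`, the instance type, and the AMI ID are placeholders chosen for illustration. A real template would need a valid AMI ID for your region and usually many more properties.

```python
import json


def build_template(instance_type: str, image_id: str) -> dict:
    """Build a minimal CloudFormation-style template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single web server (illustrative sketch)",
        "Resources": {
            "WebServer": {  # logical name, chosen for this example
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": image_id,
                },
            }
        },
    }


if __name__ == "__main__":
    # "ami-0123456789abcdef0" is a placeholder, not a real image.
    print(json.dumps(build_template("t3.micro", "ami-0123456789abcdef0"), indent=2))
```

Because the whole environment is described in a file, it can be reviewed, versioned, and recreated like any other code.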

AWS CodeDeploy

AWS CodeDeploy is a service that automates code deployment to Amazon EC2 instances and to instances running on premises.

Users > 1 million (Use Service Oriented Architecture (SOA) for good flexibility)

To serve more than 1 million users, you need to adopt a service-oriented architecture (SOA) for your large-scale web application.

In SOA, you separate each component from its tier and turn it into its own service. The individual services can then be scaled independently. Web and application tiers will have different resource needs and different services. This gives you a lot of flexibility for scaling and high availability.

AWS offers a host of generic services to help you build SOA infrastructure quickly. They include:

Amazon Simple Queue Service (SQS)

It is a simple and cost-effective service for decoupling and coordinating the components of a cloud application. With SQS, software components of any size can easily send, store, and receive messages.
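
The decoupling idea can be shown without any AWS code at all. In the sketch below, Python’s in-process `queue.Queue` stands in for the message queue, and `run_pipeline` is a made-up helper. The point is only that the producer never calls the consumer directly.

```python
import queue
import threading


def run_pipeline(orders: list) -> list:
    """Push orders onto a queue and let a separate worker process them."""
    messages = queue.Queue()
    processed = []

    def worker():
        while True:
            item = messages.get()
            if item is None:  # sentinel: the producer has finished
                break
            processed.append(f"processed {item}")

    consumer = threading.Thread(target=worker)
    consumer.start()
    for order in orders:  # the producer only knows about the queue
        messages.put(order)
    messages.put(None)
    consumer.join()
    return processed


if __name__ == "__main__":
    print(run_pipeline(["order-1", "order-2", "order-3"]))
```

Since the two sides only share the queue, either one can be slowed down, restarted, or scaled out without changing the other.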

Amazon Simple Notification Service (SNS)

You can send messages to a large number of subscribers with SNS. Its benefits are easy setup, smooth operation, and high reliability when sending notifications to everyone.
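
The publish/subscribe model behind this is simple: one message goes in, and every subscriber gets a copy. Here is a toy version, where `Topic` is a hypothetical in-memory class rather than the SNS API.

```python
class Topic:
    """Deliver each published message to every subscriber (fan-out)."""

    def __init__(self, name: str):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> int:
        """Send the message to all subscribers and return how many received it."""
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)


if __name__ == "__main__":
    email_inbox, sms_inbox = [], []
    topic = Topic("order-shipped")
    topic.subscribe(email_inbox.append)
    topic.subscribe(sms_inbox.append)
    topic.publish("Order 42 has shipped")
    print(email_inbox, sms_inbox)
```

The publisher does not need to know who is listening, so new subscribers can be added without touching it.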

AWS Lambda

It is a compute service that lets you run code without provisioning or managing servers. AWS Lambda runs your code only when it is needed and scales automatically, from a few requests per day to thousands per second.
There is no charge when your code is not running. You pay only for the compute time you consume. You can also build serverless architectures composed of functions that are triggered by events.
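
A Lambda function in Python is just a handler that receives an event and a context object. The sketch below follows that shape and can be called locally like any other function. The `name` field in the event is an assumption made for this example, and the `statusCode`/`body` response format is what an HTTP-style trigger would typically expect.

```python
import json


def lambda_handler(event, context):
    """Handle one event and return an HTTP-style response."""
    name = event.get("name", "world")  # "name" is a made-up field for this example
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-written event; no context object is needed here.
    print(lambda_handler({"name": "Ada"}, None))
```

Because the handler holds no state between calls, the platform is free to run as many copies in parallel as the incoming events require.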


Learning how to build scalable websites takes time, a lot of practice, and a good amount of money, but the effort lets your app keep working as it becomes popular.

Moreover, slow pages leave a very negative impression on your users. They make users unhappy with your app, which ultimately damages your reputation and will eventually affect your profits.