
Now You Need It, Now You Don’t – How To Scale Your Application

SoftwareOne blog editorial team
Blog Editorial Team

The cloud has many advantages. As a developer, one of the biggest benefits I've found is how easily I can scale applications. Let's go over some ways we can do this on the Azure platform.

In Azure, many services have autoscaling out-of-the-box, which means they dynamically allocate resources to match performance requirements.
Your system will be automatically scaled for high throughput if needed. When there is less demand on the service, the system automatically scales down resources. This approach gives your system the ability to be responsive all the time, independent of the load.

Scale up or scale out?

Scaling up and scaling out are two different ways of scaling or increasing throughput in the system. Both have their advantages and disadvantages.

What is scaling up?

Scaling up, also referred to as vertical scaling, means upgrading the resources your system runs on to reach the desired performance level (e.g. by adding more RAM or switching to a more powerful CPU).
In Azure, when you want to scale up your application, you need to move your App Service plan to a higher pricing tier. It does not happen automatically.

However, scaling up limits your system performance to a single server and becomes increasingly expensive the higher you have to scale.

What is scaling out?

Scaling out is also called horizontal scaling. It means adding or removing instances of a resource that we are running our system on. The application continues running without interruption as the platform provisions new resources. When the provisioning process is complete, it deploys the solution with these additional resources. 
Many Azure services support automatic scaling out. The platform also lets you provide custom rules for it. As the owner of the system, you can decide when and how it should be scaled. Horizontal scaling is more flexible in the cloud context, as it allows you to run potentially thousands of instances to handle the throughput.


Autoscale ensures you have the right amount of resources to handle the load on your application. It lets you add resources to handle increases in load and save money by removing resources that are sitting idle. You specify a minimum and maximum number of instances to run, and instances are added or removed automatically based on a set of rules. Setting a minimum makes sure your application is always running, even under no load; setting a maximum limits your total possible hourly cost. The platform automatically scales between these two limits using the rules you create.
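To make the role of the minimum and maximum concrete, here is a minimal sketch of how a single autoscale decision could work. It is not Azure's implementation: the `AutoscaleSettings` class, its field names, and the thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleSettings:
    """Hypothetical settings; the names are illustrative, not Azure's schema."""
    minimum: int            # instances kept running even under no load
    maximum: int            # upper bound that caps the hourly cost
    scale_out_above: float  # average CPU % above which one instance is added
    scale_in_below: float   # average CPU % below which one instance is removed


def next_instance_count(current: int, avg_cpu: float, s: AutoscaleSettings) -> int:
    """Apply the rules once and return the resulting instance count."""
    if avg_cpu > s.scale_out_above:
        desired = current + 1
    elif avg_cpu < s.scale_in_below:
        desired = current - 1
    else:
        desired = current
    # Never leave the range defined by the minimum and the maximum.
    return max(s.minimum, min(s.maximum, desired))


settings = AutoscaleSettings(minimum=2, maximum=10,
                             scale_out_above=70.0, scale_in_below=30.0)
print(next_instance_count(2, 85.0, settings))   # busy: 3
print(next_instance_count(10, 95.0, settings))  # already at the maximum: 10
print(next_instance_count(2, 5.0, settings))    # idle, but held at the minimum: 2
```

The final clamp is the important part: whatever the rules ask for, the result stays between the two limits.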

In Azure, you can set up autoscale using two modes [Source]:

  1. The first mode is based on metrics. The platform analyzes the metrics you specify from your application and compares them to the scaling rules you define. For example, you can set up the service to scale out when it uses more than 70% of CPU.
  2. The second mode is fixed. You select a period during which your system will be scaled and add a rule for it. For instance, you can add a rule that every Saturday between 10:00 and 12:00 your system will scale to 10 instances.
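The fixed mode can be pictured as a lookup against a schedule. The sketch below is only an illustration under stated assumptions: the `SCHEDULE` table, the `DEFAULT_INSTANCES` value, and the function name are hypothetical, and the Saturday window is read as 10:00 to 12:00.

```python
from datetime import datetime, time

# Hypothetical fixed schedule: (weekday, start, end, instance count).
# datetime.weekday() counts Monday as 0, so Saturday is 5.
SCHEDULE = [(5, time(10, 0), time(12, 0), 10)]
DEFAULT_INSTANCES = 2  # assumed count outside every scheduled window


def scheduled_instances(now: datetime) -> int:
    """Return the instance count the fixed schedule asks for at `now`."""
    for weekday, start, end, count in SCHEDULE:
        if now.weekday() == weekday and start <= now.time() < end:
            return count
    return DEFAULT_INSTANCES


print(scheduled_instances(datetime(2024, 6, 1, 11, 0)))  # a Saturday at 11:00
print(scheduled_instances(datetime(2024, 6, 3, 11, 0)))  # a Monday at 11:00
```

Unlike the metric-based mode, nothing here looks at the load: the instance count depends only on the clock.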

Here is a list of services that support the autoscale functionality:

  • Virtual Machine Scale Sets
  • Cloud Services
  • App Service – Web Apps
  • Azure Functions
  • API Management
  • Azure Service Fabric / Azure Service Fabric Mesh
  • Azure Kubernetes Service

Application Insights

How do you check how many instances the system uses? Easy! Use Application Insights. It has a user-friendly UI that presents the state of our applications, where we can see the components we are using and how many running instances we have.
For example, I created 30,000 events and added them to an event hub. An Azure Function was listening on this event hub and stored everything in Azure Table storage. Application Insights showed that my simple application was scaled out to 8 instances to handle the load efficiently.

Tips and tricks for working with autoscale

Here are a few things worth remembering when working with the autoscale functionality.

  1. Creating a system that is aware of horizontal scaling can be tricky, because you have to keep in mind that your code will run on many instances at the same time. The best practice is to write a stateless, event-driven solution. (Event-driven is a pattern where any change is propagated to other components.)

Horizontal scaling is well suited to handling a huge volume of small events. So why should we go with stateless? Because the system can be multiplied many times over in a short time, and anything that depends on state held by a single instance will stop working properly.
If you really need stateful services, I suggest you use Service Fabric, because it's one of the best orchestration services for handling them.

  2. The system should be asynchronous. If component A calls component B synchronously, A and B are tightly coupled, and that coupled system has a single scalability characteristic: to scale A, you must also scale B.
  3. If you want to use a cache, think about using a distributed cache like Azure Cache for Redis.
  4. The best approach is to have many small services without any long-running tasks, because a long-running task could prevent an instance from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated.
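The tips above can be sketched in a few lines. The example below is a minimal in-process illustration, not an Azure API: an `asyncio.Queue` stands in for a message broker such as an event hub, the `worker` tasks stand in for instances, and the shared `results` list stands in for external storage. All names are hypothetical.

```python
import asyncio


async def handle(event: int) -> int:
    """Stateless, short-running handler: the result depends only on the event."""
    await asyncio.sleep(0)  # stand-in for I/O such as a storage write
    return event * 2


async def worker(queue: asyncio.Queue, results: list) -> None:
    """One 'instance': pulls events from the queue until it is cancelled."""
    while True:
        event = await queue.get()
        try:
            results.append(await handle(event))
        finally:
            queue.task_done()


async def run(events: list, instances: int) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []  # stand-in for external storage shared by all instances
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(instances)]
    for event in events:       # the producer only enqueues; it never waits on a worker
        queue.put_nowait(event)
    await queue.join()         # wait until every event has been processed
    for w in workers:          # "scale in": stop the instances
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


print(asyncio.run(run(list(range(5)), instances=3)))  # [0, 2, 4, 6, 8]
```

Because the handler keeps no state of its own, the result is the same whether one instance or three process the queue; only the throughput changes.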

Of course, you also need to consider implementing a retry policy and a circuit-breaker.
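As a rough illustration of these two patterns, here is a minimal sketch that uses only the Python standard library. The class and function names, the thresholds, and the delays are hypothetical; in a real system you would more likely rely on an established resilience library than on hand-written code like this.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def retry(func, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call func, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying is pointless while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def flaky():
    """Hypothetical dependency that fails twice and then recovers."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


breaker = CircuitBreaker(failure_threshold=5)
print(retry(lambda: breaker.call(flaky), sleep=lambda _: None))  # ok
```

The retry policy absorbs short, transient failures, while the circuit breaker stops callers from hammering a dependency that keeps failing.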


A benefit of running a system in the cloud rather than on-premises is autoscale, an out-of-the-box feature for basic services such as Virtual Machine Scale Sets, App Service, Azure Functions, Azure Kubernetes Service, and Service Fabric.

You decide which conditions need to be met to scale your system and what size of server you need as the baseline for scaling.

Your application can be scaled out or scaled up. The first option replicates the same environment multiple times, with each copy running the same application. The second one upgrades the hardware to add more power to it. Additionally, you can monitor everything using Application Insights, which shows you a graph of how and for how long your application was scaled. The best application to scale is a stateless, short-running task that implements an event-driven pattern and communicates asynchronously with other applications.

