Azure Autoscaling is a critical feature for efficiently managing cloud resources based on demand, ensuring applications remain responsive and cost-effective.
By automatically adjusting the number of resources (such as Virtual Machines, App Service instances, or containers) based on defined performance metrics, autoscaling allows you to meet changing traffic loads and usage patterns without manual intervention.
Here are the key things to know about Azure Autoscaling.
Types of Autoscaling in Azure
Azure provides multiple autoscaling solutions depending on the service you're using:
Virtual Machines (VMs)
Autoscaling for VMs can be achieved through Azure Virtual Machine Scale Sets (VMSS), where you can scale the number of VMs up or down based on resource demand, such as CPU usage, memory, or custom metrics.
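As a sketch, an autoscale setting for an existing scale set can be created with the Azure CLI; the resource group and scale set names below are placeholders:

```shell
# Attach an autoscale setting to an existing VM Scale Set (names are illustrative)
az monitor autoscale create \
  --resource-group myResourceGroup \
  --resource myScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name cpu-autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2
```

Scaling rules (e.g., on CPU percentage) are then added to this setting with `az monitor autoscale rule create`.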
App Service
Autoscaling for Azure App Services (Web Apps, API Apps) is based on performance metrics like CPU usage, memory, and HTTP request queues.
Autoscaling allows you to increase or decrease the number of instances automatically.
Azure Kubernetes Service (AKS)
Autoscaling can be applied at the pod level (with Horizontal Pod Autoscaler) and the node level (with Cluster Autoscaler) to adjust the number of pods and nodes dynamically based on resource utilization in Kubernetes clusters.
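For illustration, a minimal HorizontalPodAutoscaler manifest for an AKS deployment might look like the following; the deployment name `web` is a placeholder:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # the deployment to scale (hypothetical name)
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70%
```

Applied with `kubectl apply -f`, this keeps the deployment between 2 and 10 replicas based on observed CPU utilization.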
Azure Functions
Azure Functions can scale automatically based on the number of incoming events or requests, leveraging the Consumption Plan or Premium Plan, which scales the number of function instances as needed.
Virtual Machine Scale Sets (VMSS)
VMSS is the standard way to deploy and autoscale a set of identical VMs, optionally spread across multiple availability zones.
VMSS offers automatic scaling and load balancing of VMs.
Key Metrics for Autoscaling
To make autoscaling effective, Azure uses performance metrics to determine when to scale up or down.
You need to choose the right metric based on your workload.
CPU Utilization
Commonly used for scaling VMs, web apps, or containers.
If CPU usage exceeds a predefined threshold, more instances can be added to distribute the load.
Memory Usage
If your app is memory-intensive, scaling based on memory usage can help maintain performance.
Request Count
For web applications, scaling based on request count (e.g., HTTP requests per second) ensures the system can handle increasing traffic.
Response Time
If the response time of your service exceeds a threshold (e.g., more than 1 second), you may want to scale out to improve performance.
Custom Metrics
Azure also supports custom metrics, such as queue length, disk I/O, or application-specific performance metrics from Azure Monitor or Application Insights.
Health Metrics
Health probes are essential for ensuring that autoscaling happens only when VMs are healthy.
If a VM fails the health probe, autoscaling can replace it with a healthy one.
Horizontal vs. Vertical Autoscaling
Horizontal Autoscaling (Scaling Out/In)
Definition
This involves adding or removing instances (VMs, containers, or app instances) based on the load.
This is ideal for stateless applications where each instance is independent and doesn’t rely on shared state.
When to Use
This is typically used for distributed or stateless applications that can run across multiple instances or nodes (e.g., web servers, microservices).
Example:
Scaling out web servers in response to increased traffic.
Vertical Autoscaling (Scaling Up/Down)
Definition
Vertical scaling involves increasing the size of an existing instance (more CPU, RAM, etc.), rather than adding more instances.
When to Use
Vertical scaling is more suitable for applications with high resource requirements but lower scalability across instances (e.g., databases, legacy applications).
Example:
Scaling up a database server to increase its CPU or RAM when it reaches certain thresholds.
Azure Autoscaling primarily focuses on horizontal scaling, though vertical scaling is supported in certain services (e.g., Azure VMs, App Service plans).
Autoscaling Triggers and Thresholds
Autoscaling in Azure works based on certain triggers or conditions.
These triggers determine when to scale in (remove resources) or scale out (add resources).
Scaling In
When resource utilization is low (e.g., CPU < 30% for a defined period), autoscaling will remove instances to optimize cost.
Scaling Out
When resource utilization exceeds a certain threshold (e.g., CPU > 80%), autoscaling will add more resources to handle the load.
Cooldown Periods
To avoid rapid and unnecessary scaling actions (often called flapping), a cooldown period can be set.
This ensures that the system doesn’t react to short-lived spikes or dips in resource usage.
Min/Max Instances
When setting up autoscaling, it’s important to define a minimum and maximum number of instances to avoid scaling beyond desired capacity or under-provisioning.
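The trigger logic above can be sketched as a small decision function. This is illustrative pseudologic only, not an Azure API; the thresholds and step size of one instance are assumptions:

```python
def desired_instances(current, cpu_percent, min_count=2, max_count=10,
                      out_threshold=80, in_threshold=30):
    """Return the instance count an autoscaler would target, clamped to min/max."""
    if cpu_percent > out_threshold:      # scale out: add one instance
        return min(current + 1, max_count)
    if cpu_percent < in_threshold:       # scale in: remove one instance
        return max(current - 1, min_count)
    return current                       # within the band: no action

print(desired_instances(4, 90))   # high CPU -> scale out to 5
print(desired_instances(4, 10))   # low CPU -> scale in to 3
print(desired_instances(10, 95))  # already at max -> stays at 10
```

Note how the min/max limits clamp the result: even under sustained high load the count never exceeds the configured maximum, which is exactly the runaway-cost protection described above.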
Scaling Policies and Schedules
Scaling Policies
These define the conditions under which autoscaling occurs.
For example:
Scale out when CPU > 75% for 10 minutes.
Scale in when CPU < 50% for 15 minutes.
Custom metrics like queue length or response time can also trigger scaling.
Scheduled Autoscaling
For applications with predictable load patterns, such as e-commerce sites during holidays, you can set up scheduled autoscaling.
This allows you to scale out before peak hours and scale in during off-peak times, optimizing costs.
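As a sketch, a recurring weekend profile can be added to an existing autoscale setting with the Azure CLI; the names, counts, and time zone here are illustrative:

```shell
# Add a recurring weekend profile to an existing autoscale setting (values are illustrative)
az monitor autoscale profile create \
  --resource-group myResourceGroup \
  --autoscale-name cpu-autoscale \
  --name Weekend \
  --recurrence week sat sun \
  --start 06:00 \
  --end 22:00 \
  --timezone "Pacific Standard Time" \
  --min-count 1 \
  --count 1 \
  --max-count 4
```

Outside the scheduled window, the default profile's rules apply again.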
Advanced Scaling Policies
Azure allows more advanced scaling configurations where you can create custom scaling logic, using Azure Monitor Alerts, Application Insights, or Azure Logic Apps to trigger scaling based on complex conditions.
Autoscaling and Cost Management
Autoscaling is a key tool for cost optimization, but it’s important to balance performance needs with resource consumption.
Cost Control
Autoscaling ensures that you only use the resources needed at any given time, avoiding over-provisioning and under-provisioning.
However, poorly configured scaling policies can lead to excessive scaling or missed opportunities for cost savings.
Min/Max Limits
Set minimum and maximum instance limits to ensure that scaling does not result in runaway costs.
Spot VMs
For non-critical, interruption-tolerant workloads, Azure Spot VMs can be part of your autoscaling setup.
Spot VMs are typically much cheaper than regular VMs but can be evicted when Azure needs the capacity back, which makes them a cost-effective way to add burst capacity during peak demand.
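A Spot-based scale set can be created with the Azure CLI; this is a sketch with placeholder names, where `--max-price -1` caps the Spot price at the regular pay-as-you-go rate:

```shell
# Create a scale set of Spot VMs (names are illustrative)
az vmss create \
  --resource-group myResourceGroup \
  --name mySpotScaleSet \
  --image Ubuntu2204 \
  --priority Spot \
  --eviction-policy Delete \
  --max-price -1 \
  --instance-count 2
```

The eviction policy determines whether evicted instances are deallocated (and billed for disks) or deleted outright.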
High Availability and Fault Tolerance
Azure autoscaling can improve the availability and fault tolerance of your application by distributing resources across multiple Availability Zones or Availability Sets.
Availability Zones
Deploying resources across multiple availability zones within a region provides redundancy and ensures high availability.
Autoscaling complements this: if one zone becomes unavailable, replacement instances can be provisioned in the healthy zones while traffic is directed away from the failed zone.
Fault Domains
Azure provides fault domains within availability sets.
Autoscaling can ensure that instances are spread across different fault domains, minimizing the risk of failure due to hardware issues.
Health Monitoring
Use health probes to ensure that only healthy instances are in the autoscaling pool.
This helps maintain application availability by replacing unhealthy instances with new ones.
Autoscaling for Containers and AKS
For containerized workloads, autoscaling is managed at both the pod and node levels.
Horizontal Pod Autoscaler (HPA)
In Azure Kubernetes Service (AKS), HPA scales the number of pods based on metrics like CPU utilization or custom metrics from Azure Monitor.
Cluster Autoscaler
The Cluster Autoscaler in AKS automatically adjusts the number of nodes (VMs) in the cluster based on the number of running pods and resource requirements.
If there aren’t enough resources to run the new pods, the Cluster Autoscaler adds more nodes.
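The Cluster Autoscaler can be enabled on an existing AKS cluster via the Azure CLI; the cluster name and node-count bounds below are placeholders:

```shell
# Enable the Cluster Autoscaler on an existing AKS cluster (names are illustrative)
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```

Together with HPA, this gives two-level elasticity: HPA adds pods, and the Cluster Autoscaler adds nodes when pending pods cannot be scheduled.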
Azure Container Instances (ACI)
Azure Container Instances (ACI) provide serverless containers; ACI has no built-in autoscaler of its own, but it is commonly used as an elastic burst target (for example via AKS virtual nodes), so container capacity can grow and shrink on demand without managing VMs.
Monitoring and Alerts for Autoscaling
To ensure autoscaling is functioning as expected, Azure Monitor and Application Insights can help:
Azure Monitor
Use Azure Monitor to track the performance of your autoscaling policies and adjust metrics or thresholds based on historical data.
Application Insights
For deeper insights, especially for web applications, Application Insights helps track application performance, including latency, request counts, and failure rates.
Alerts
Set up alerts for scaling events.
For instance, if autoscaling happens more frequently than expected, you can investigate and adjust the scaling policies accordingly.
Best Practices for Autoscaling
Avoid Rapid Scaling
Configure a cooldown period to prevent autoscaling from overreacting to short-term traffic spikes or dips.
Test Autoscaling Configurations
Before going live, test autoscaling policies to ensure they meet the needs of your workload, especially during peak demand.
Balance Between Scaling In/Out
Be cautious when scaling in, as removing too many resources can result in degraded performance.
Similarly, excessive scaling out can lead to unnecessary costs.
Use Load Balancers
Integrate autoscaling with Azure Load Balancer or Application Gateway to ensure that traffic is evenly distributed across all instances in the scale set or container cluster.
Understand Limits and Quotas
Ensure you are aware of any service limits, quotas, or restrictions on scaling (e.g., maximum number of VMs in a scale set, number of app service instances).
Summary
Azure Autoscaling is a flexible and powerful feature that helps optimize resource usage, improve performance, and reduce costs.
To maximize its effectiveness, it’s important to carefully configure scaling policies, choose the right metrics, and monitor scaling events.
By considering factors such as workload type, scaling triggers, and cost management, you can ensure that your application is always appropriately scaled to meet demand while avoiding unnecessary overhead.