

Compute Services: Virtual Machines, Auto Scaling, and Load Balancers

📅 Published: Feb 2026
⏱️ Estimated Reading Time: 18 minutes
🏷️ Tags: Cloud Computing, EC2, Virtual Machines, Auto Scaling, Load Balancing, AWS, Azure, GCP


Introduction: What Are Compute Services?

Compute services are the backbone of cloud computing. They provide the processing power to run your applications—the brains of your infrastructure. Just as a building needs a foundation before you can add walls and roofs, your cloud architecture needs compute before you can add databases, storage, and other services.

When you think about running an application in the cloud, you are fundamentally asking: where will this code execute? The answer is a compute service.

Cloud providers offer several types of compute services:

  • Virtual Machines — Traditional servers in the cloud

  • Containers — Lightweight, portable application packages

  • Serverless — Run code without managing servers

  • Platform as a Service — Deploy code, let the platform handle everything

This guide focuses on virtual machines, auto scaling, and load balancers—the foundational building blocks of cloud infrastructure.


Virtual Machines: The Foundation

What is a Virtual Machine?

A virtual machine (VM) is a software-based emulation of a physical computer. It runs an operating system and applications just like a physical server, but it shares physical hardware with other virtual machines.

Think of a physical server as an apartment building. Each virtual machine is a separate apartment. The apartments share the building's structure, electricity, and plumbing, but each has its own walls, doors, and locks. What happens in one apartment does not affect the others.

How Virtualization Works

A hypervisor is the software that creates and manages virtual machines. It sits between the physical hardware and the virtual machines, allocating CPU time, memory, storage, and network resources to each VM.

Each VM believes it has its own dedicated hardware. In reality, the hypervisor is sharing physical resources among many VMs.

Virtual Machine Naming Across Providers

Provider       Service Name             Console Name
AWS            Amazon EC2               EC2 Instances
Azure          Azure Virtual Machines   VMs
Google Cloud   Compute Engine           VM Instances

Despite different names, the concepts are identical across providers.


EC2 Instance Types (AWS)

AWS organizes EC2 instances into families optimized for different workloads:

Family                  Purpose                          Example Types
General Purpose         Balanced CPU, memory, network    t3, t4g, m5, m6i
Compute Optimized       CPU-intensive workloads          c5, c6i
Memory Optimized        Memory-intensive workloads       r5, r6i, x1e
Storage Optimized       High sequential disk I/O         i3, d2
Accelerated Computing   GPU and FPGA workloads           p3, p4, g4dn

Selecting the right size:

Instance names follow a pattern of family, generation, and size, written as family + generation + "." + size.
Example: t3.medium

  • t = family (burstable general purpose)

  • 3 = generation

  • medium = size

Sizes scale from nano (smallest) to 48xlarge (largest).
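The naming pattern above can be parsed mechanically. Here is a small Python sketch that splits a type like t3.medium into its parts; the helper name and returned fields are our own illustration, not an AWS API:

```python
# Parse an EC2 instance type name into family, generation, and size.
# Names follow the pattern <family><generation><attributes>.<size>, e.g. "t3.medium".
import re

def parse_instance_type(name: str) -> dict:
    """Split an instance type like 't3.medium' or 'c6i.4xlarge' into parts."""
    prefix, _, size = name.partition(".")
    match = re.fullmatch(r"([a-z]+)(\d+)([a-z]*)", prefix)
    if not match:
        raise ValueError(f"Unrecognized instance type: {name}")
    family, generation, attributes = match.groups()
    return {
        "family": family,          # e.g. 't' = burstable general purpose
        "generation": int(generation),
        "attributes": attributes,  # e.g. 'g' = Graviton (ARM) processors
        "size": size,              # nano ... 48xlarge
    }

print(parse_instance_type("t3.medium"))
# {'family': 't', 'generation': 3, 'attributes': '', 'size': 'medium'}
print(parse_instance_type("t4g.nano"))
# {'family': 't', 'generation': 4, 'attributes': 'g', 'size': 'nano'}
```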


Azure VM Series

Azure uses a similar family-based naming system:

Series            Purpose
B-series          Burstable, cost-effective for development
D-series          General purpose
E-series          Memory optimized
F-series          Compute optimized
G-series          Very large, memory optimized
L-series          Storage optimized
NC/ND/NV-series   GPU workloads

Google Cloud Machine Types

GCP categorizes machines by family:

Family        Purpose
N1, N2, N2D   General purpose
C2, C2D       Compute optimized
M1, M2, M3    Memory optimized
A2            GPU workloads
E2            Cost-optimized general purpose

Choosing the Right Virtual Machine

The right VM depends on your workload characteristics:

Web servers — General purpose families work well. The load is typically CPU and memory balanced.

Batch processing — Compute optimized families are appropriate. These workloads need CPU power but can tolerate lower memory ratios.

In-memory databases (Redis, Memcached) — Memory optimized families are essential. These services need large amounts of RAM relative to CPU.

Databases and data warehouses (SQL Server, Oracle) — Storage optimized or memory optimized, depending on whether I/O or memory is the bottleneck.

Machine learning training — GPU-accelerated families are necessary. Training models requires specialized hardware.

Development and testing — Burstable families like t3 or B-series are cost-effective for workloads that are idle most of the time.


Pricing Models

On-Demand — Pay by the hour or second with no long-term commitment. This is the most flexible but most expensive model for continuous workloads.

Reserved Instances — Commit to using a VM for 1 or 3 years in exchange for significant discounts (up to 70%). This is best for predictable, steady-state workloads.

Spot Instances (AWS) / Spot or Preemptible VMs (GCP) / Spot VMs (Azure, formerly Low-Priority VMs) — Access spare capacity at deep discounts (up to 90%), but the cloud provider can reclaim the instance on short notice. This is best for fault-tolerant, stateless, or batch workloads that can be interrupted.

Savings Plans — Flexible pricing model that offers discounts in exchange for committing to a consistent amount of compute usage.
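To see why the model matters, here is some illustrative Python arithmetic comparing one always-on instance under the three main models. The hourly rate and discount levels are hypothetical, chosen only to mirror the percentages above:

```python
# Compare the monthly cost of one always-on instance under three pricing
# models, using illustrative (not real) rates and discount levels.
HOURS_PER_MONTH = 730  # average hours in a month

on_demand_rate = 0.0416                       # $/hour, hypothetical
reserved_rate = on_demand_rate * (1 - 0.60)   # ~60% discount for a long-term commitment
spot_rate = on_demand_rate * (1 - 0.90)       # up to ~90% discount, interruptible

for label, rate in [("on-demand", on_demand_rate),
                    ("reserved", reserved_rate),
                    ("spot", spot_rate)]:
    print(f"{label:>10}: ${rate * HOURS_PER_MONTH:,.2f}/month")
```

The gap compounds quickly: for a fleet of dozens of instances, the difference between on-demand and reserved pricing is often the single largest line item a team can control.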


Auto Scaling: Handling Variable Demand

Why Auto Scaling Matters

In traditional data centers, you had to predict capacity years in advance. If you guessed too low, your application failed under load. If you guessed too high, you wasted money on idle servers.

Auto scaling solves this problem. It automatically adjusts the number of running instances based on demand.

The Three Components of Auto Scaling

1. Launch Configuration or Template
This defines what to run: the AMI or machine image, instance type, security groups, and user data scripts. It is the blueprint for each instance.

2. Scaling Policies
These define when to scale: add instances when CPU exceeds 70 percent for five minutes, remove instances when CPU drops below 30 percent for ten minutes. Scaling policies can be based on metrics, schedules, or even custom application signals.

3. Scaling Group
This defines where to run: the VPC, subnets, and minimum, maximum, and desired instance counts. The scaling group ensures the right number of instances are running at all times.

How Auto Scaling Works

When you configure auto scaling, you define:

  • Minimum capacity: The smallest number of instances to keep running. For production, this is often at least 2 for high availability.

  • Maximum capacity: The largest number of instances allowed. This prevents runaway scaling from bankrupting you.

  • Desired capacity: The target number of instances. Auto scaling works to maintain this number.

  • Scaling policies: Rules that adjust desired capacity up or down based on metrics.

When a scaling policy triggers, the auto scaling group launches new instances using the launch template. When instances are no longer needed, they are terminated. The process is fully automatic.
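The minimum/maximum/desired bookkeeping can be sketched in a few lines of Python. This is an illustrative model, not a provider API; it shows only how a requested capacity change is clamped to the configured bounds:

```python
# Minimal sketch of how a scaling group clamps a requested capacity
# between its configured minimum and maximum.
from dataclasses import dataclass

@dataclass
class ScalingGroup:
    minimum: int
    maximum: int
    desired: int

    def scale_to(self, requested: int) -> int:
        """Set desired capacity, clamped to [minimum, maximum]."""
        self.desired = max(self.minimum, min(self.maximum, requested))
        return self.desired

group = ScalingGroup(minimum=2, maximum=20, desired=3)
print(group.scale_to(25))  # capped at the maximum: 20
print(group.scale_to(1))   # floored at the minimum: 2
```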

Scaling Policies

Simple Scaling — Respond to a single alarm by adding or removing a fixed number of instances. This is the simplest but least flexible approach.

Step Scaling — Different alarm thresholds trigger different scaling actions. For example, CPU at 50 percent adds 1 instance, CPU at 80 percent adds 5 instances.

Target Tracking — You specify a target metric value, and auto scaling maintains that target. For example, keep average CPU at 40 percent. This is the most automated and recommended approach.

Scheduled Scaling — Scale based on predictable patterns. For example, add instances at 8 AM Monday through Friday, remove them at 6 PM.

Predictive Scaling — Machine learning models predict future demand and scale proactively. This is the most advanced option.
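Target tracking is, at its core, a proportional calculation: if average CPU is double the target, the group needs roughly double the instances. A simplified Python sketch of that calculation (real implementations add cooldowns and smoothing):

```python
# Simplified target tracking: estimate the instance count that would
# bring the average metric back to its target value.
import math

def target_tracking_capacity(current_instances: int,
                             current_cpu: float,
                             target_cpu: float) -> int:
    """Proportionally rescale capacity; ceil avoids under-provisioning."""
    return math.ceil(current_instances * current_cpu / target_cpu)

# 4 instances at 80% average CPU, targeting 40% -> scale out to 8
print(target_tracking_capacity(4, 80.0, 40.0))
# 8 instances at 20% average CPU, targeting 40% -> scale in to 4
print(target_tracking_capacity(8, 20.0, 40.0))
```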


Auto Scaling Across Providers

Provider       Service Name                 Key Features
AWS            Auto Scaling Groups          Target tracking, predictive scaling, instance refresh
Azure          Virtual Machine Scale Sets   Automatic OS upgrades, zone balancing
Google Cloud   Managed Instance Groups      Stateful and stateless instances, rolling updates

Best Practices for Auto Scaling

Use target tracking scaling policies. They are simpler to configure and maintain than step or simple scaling.

Set appropriate cooldown periods. Cooldown periods prevent scaling actions from overlapping. Too short and you may scale unnecessarily. Too long and you may not respond quickly enough.

Test scaling behavior. Scale up manually during low traffic to ensure new instances join the load balancer correctly. Scale down to ensure connections drain properly.

Monitor scaling activities. CloudWatch logs or equivalent should alert you if scaling is happening too frequently, which indicates your thresholds are too sensitive.

Use lifecycle hooks. Run custom scripts when instances launch or terminate. This is useful for installing software, draining connections before termination, or sending notifications.


Load Balancers: Distributing Traffic

What is a Load Balancer?

A load balancer distributes incoming traffic across multiple targets—virtual machines, containers, or serverless functions. It acts as the front door to your application.

Think of a load balancer as a receptionist at a busy office. When visitors arrive, the receptionist directs them to an available employee. If one employee becomes overwhelmed, the receptionist sends the next visitor to someone else.

Why Load Balancers Are Essential

Load balancers provide three critical benefits:

1. High Availability
If one server fails, the load balancer stops sending traffic to it. Your application continues running even when individual components fail.

2. Scalability
As traffic grows, you add more servers behind the load balancer. The load balancer distributes traffic across all available servers. You can scale without changing how users access your application.

3. SSL Termination
Load balancers can handle SSL certificates, decrypting HTTPS traffic before sending it to your servers. This offloads CPU-intensive encryption work from your application servers.


Types of Load Balancers

Application Load Balancer (Layer 7)
Operates at the application layer. It understands HTTP and HTTPS traffic and can make routing decisions based on URL paths, hostnames, headers, and cookies.

Example: Send requests to /api/* to one group of servers and requests to /static/* to another group.
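Path-based routing can be illustrated with a few lines of Python. The rule patterns and target group names below are hypothetical; the point is that rules are evaluated in order, the first match wins, and a default group catches everything else:

```python
# Sketch of layer-7 path-based routing: rules are checked in order and
# the first matching pattern decides the target group.
from fnmatch import fnmatch

rules = [
    ("/api/*", "api-servers"),        # illustrative rule and group names
    ("/static/*", "static-servers"),
]
default_target_group = "web-servers"

def route(path: str) -> str:
    """Return the target group for a request path."""
    for pattern, target_group in rules:
        if fnmatch(path, pattern):
            return target_group
    return default_target_group

print(route("/api/orders"))      # api-servers
print(route("/static/app.css"))  # static-servers
print(route("/index.html"))      # web-servers (default)
```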

Network Load Balancer (Layer 4)
Operates at the transport layer. It routes traffic based on IP addresses and TCP/UDP ports. It does not inspect application content.

Network load balancers handle extreme performance requirements—millions of requests per second with ultra-low latency.

Gateway Load Balancer (Layer 3)
Operates at the network layer. It is used for deploying third-party virtual appliances like firewalls, intrusion detection systems, and deep packet inspection.


Load Balancer Components

Listeners
A listener checks for connection requests using a specific protocol and port. You configure listeners to handle incoming traffic. Common listeners are HTTP on port 80 and HTTPS on port 443.

Target Groups
A target group routes requests to registered targets. You register your virtual machines with a target group. Health checks are configured at the target group level.

Health Checks
Health checks determine whether a target is healthy and able to receive traffic. If a health check fails, the load balancer stops sending traffic to that target. Health checks run continuously.
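Health-check bookkeeping is simple in essence: a target is marked unhealthy after a configured number of consecutive failed probes, and healthy again after enough consecutive successes. A Python sketch with illustrative thresholds:

```python
# Track consecutive probe results for one target; thresholds are illustrative.
class HealthChecker:
    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.streak = 0          # positive = consecutive passes, negative = fails
        self.healthy = True

    def record(self, probe_passed: bool) -> bool:
        """Record one probe result and return current health status."""
        if probe_passed:
            self.streak = max(self.streak, 0) + 1
            if self.streak >= self.healthy_threshold:
                self.healthy = True
        else:
            self.streak = min(self.streak, 0) - 1
            if -self.streak >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy

check = HealthChecker()
for _ in range(3):              # three failed probes in a row
    check.record(False)
print(check.healthy)            # False: target removed from rotation
check.record(True)
check.record(True)
print(check.healthy)            # True: target back in rotation
```

Requiring several consecutive successes before re-admitting a target prevents a flapping server from repeatedly rejoining and dropping out of rotation.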

Rules
Rules define how the load balancer routes requests. For application load balancers, you can route based on path, hostname, HTTP headers, or query parameters.


Load Balancer Algorithms

Round Robin
Requests are distributed evenly across healthy targets in rotation. This is the simplest and most common algorithm.

Least Outstanding Requests
Requests are sent to the target with the fewest active connections. This works well when requests vary significantly in duration.

Least Response Time
Requests are sent to the target with the fastest response time. This requires the load balancer to track performance metrics.
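The first two algorithms are easy to sketch in Python; the target names and connection counts below are illustrative:

```python
# Two traffic-distribution algorithms over a pool of healthy targets.
from itertools import cycle

targets = ["vm-a", "vm-b", "vm-c"]

# Round robin: rotate through targets in order.
round_robin = cycle(targets)
print([next(round_robin) for _ in range(5)])
# ['vm-a', 'vm-b', 'vm-c', 'vm-a', 'vm-b']

# Least outstanding requests: pick the target with the fewest active connections.
active_connections = {"vm-a": 7, "vm-b": 2, "vm-c": 5}

def least_outstanding(conns: dict) -> str:
    return min(conns, key=conns.get)

print(least_outstanding(active_connections))  # vm-b
```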


Cross-Provider Comparison

Feature                     AWS                         Azure                     Google Cloud
Application Load Balancer   Application Load Balancer   Application Gateway       HTTP(S) Load Balancing
Network Load Balancer       Network Load Balancer       Load Balancer             TCP/SSL Load Balancing
Internal Load Balancer      Internal Load Balancer      Internal Load Balancer    Internal Load Balancing
Health Checks               Yes                         Yes                       Yes
SSL Termination             Yes                         Yes                       Yes
Web Application Firewall    AWS WAF                     Application Gateway WAF   Cloud Armor
Cross-zone balancing        Yes                         Yes                       Yes

Putting It All Together: A Complete Architecture

A production-ready web application combines all three services:

  1. Virtual Machines running your application code. These are the workers that actually serve user requests.

  2. Auto Scaling that adds or removes virtual machines based on CPU utilization or request count. During peak hours, more instances run. During quiet hours, fewer instances run.

  3. Load Balancer in front of the auto scaling group. Users connect to the load balancer, which distributes traffic across the healthy instances.

When traffic increases:

  • CPU utilization rises across instances

  • Auto scaling policy triggers

  • New instances launch

  • Instances register with the load balancer

  • Load balancer begins sending traffic to new instances

  • CPU utilization returns to target

When traffic decreases, the process reverses. Instances are removed, but only after they finish handling existing connections.
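This feedback loop can be modeled as a toy Python simulation: a fixed amount of demand is spread across a growing pool of instances until average CPU returns to the target. The numbers are arbitrary:

```python
# Toy simulation of the scale-out loop: scale proportionally until the
# average CPU per instance is back at or below the target.
import math

total_load = 320.0   # arbitrary demand, in percent-of-one-instance units
instances = 4
target_cpu = 40.0

while True:
    avg_cpu = total_load / instances
    if avg_cpu <= target_cpu:
        break
    instances = math.ceil(instances * avg_cpu / target_cpu)
    print(f"scaled out to {instances} instances")

print(f"steady state: {instances} instances at {total_load / instances:.0f}% average CPU")
# steady state: 8 instances at 40% average CPU
```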


Real-World Scenarios

Scenario 1: E-commerce Website with Variable Traffic

An online store experiences high traffic during holiday seasons and flash sales, but steady traffic the rest of the year.

Solution:

  • Use general purpose virtual machines (t3 or m5 families)

  • Configure auto scaling with target tracking at 60 percent CPU

  • Set minimum capacity to 2 for availability, maximum to 20 for peak

  • Use application load balancer to distribute traffic

  • Configure health checks on the application endpoint

During normal operation, 2 to 3 instances run. During Black Friday, the group scales to 15 instances automatically. After the sale, it scales back down.


Scenario 2: API Service with Consistent Load

A financial services API processes a predictable number of transactions throughout the day. Reliability is critical.

Solution:

  • Use compute optimized virtual machines (c5 family)

  • Configure scheduled scaling to match business hours

  • Reserve capacity with 1-year reservations for cost savings

  • Use network load balancer for low latency

  • Deploy across multiple availability zones

The API runs with a consistent number of instances, avoiding the complexity of dynamic scaling while maintaining high availability.


Scenario 3: Batch Processing Workload

A data processing job runs every night, processing large files for several hours.

Solution:

  • Use storage optimized instances (i3 family) for high disk I/O

  • Configure spot instances for cost savings (workload is fault-tolerant)

  • Use auto scaling with step scaling based on queue depth

  • No load balancer needed (internal processing only)

The batch job uses spot instances at up to a 90 percent discount. If an instance is reclaimed, the job simply continues on another instance.


Cost Optimization Strategies

Right-size your instances. Most organizations over-provision. Monitor CPU and memory utilization and scale down instances that are consistently underutilized.

Use reservations for steady workloads. If your auto scaling group maintains a minimum of 5 instances 24/7, reserve those 5 instances and use on-demand for the variable portion.

Consider burstable instances for development. T3 or B-series instances can burst to high CPU when needed but cost much less than standard instances.

Use spot instances for fault-tolerant workloads. Batch processing, containerized microservices, and development environments can run on spot instances at massive discounts.

Turn off instances outside business hours. For development environments, use scheduled scaling to reduce capacity to zero overnight and on weekends.
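The last strategy is easy to quantify. A rough Python calculation for 50 development VMs, using a hypothetical hourly rate, shows the scale of the savings from running only on weekdays from 8 AM to 6 PM:

```python
# Rough arithmetic for the "turn off outside business hours" strategy.
vms = 50
hourly_rate = 0.05              # hypothetical $/hour per VM
weeks_per_month = 4.33

always_on_hours = 24 * 7 * weeks_per_month   # every hour of the month
business_hours = 10 * 5 * weeks_per_month    # weekdays, 8 AM to 6 PM

always_on_cost = vms * hourly_rate * always_on_hours
scheduled_cost = vms * hourly_rate * business_hours
savings_pct = 100 * (1 - scheduled_cost / always_on_cost)

print(f"always-on: ${always_on_cost:,.0f}/month")
print(f"scheduled: ${scheduled_cost:,.0f}/month ({savings_pct:.0f}% savings)")
```

Because a business-hours schedule covers only 50 of the week's 168 hours, the saving is roughly 70 percent regardless of the hourly rate.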


Summary

Compute services form the foundation of cloud architecture:

Service            Purpose                    Key Benefits
Virtual Machines   Run applications           Control, flexibility, familiar model
Auto Scaling       Match capacity to demand   Cost efficiency, reliability
Load Balancers     Distribute traffic         High availability, scalability

When combined, these services create resilient, scalable applications that absorb traffic spikes automatically while minimizing cost during quiet periods.


Practice Questions

  1. An e-commerce website expects heavy traffic during holiday sales but moderate traffic otherwise. Which combination of services would you use?

  2. A financial application requires consistent performance and cannot tolerate any downtime. You need to protect against instance failure. How would you design the compute layer?

  3. A batch processing job runs for several hours each night, processing data from a queue. If a job is interrupted, it can restart from the last checkpoint. Which compute model is most cost-effective?

  4. A web application has two distinct components: a public API and an administrative dashboard. The public API gets 95 percent of the traffic. How would you configure the load balancer?

  5. Your development team needs 50 virtual machines for testing. They run during business hours but are idle at night and weekends. What is the most cost-effective approach?


Learn More

Practice compute services, auto scaling, and load balancers with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/
