Which AWS General Purpose Instance is the Most Performant and Cost Efficient?

Several public cloud providers offer IaaS (Infrastructure as a Service) within the IT industry, and AWS is one of them. This research analyzes the performance of the AWS general purpose virtual machine offerings and compares performance and cost between the different general purpose versions.

Introduction

AWS hosts many services, and its IaaS (Infrastructure as a Service) virtual machine platform is called Amazon Elastic Compute Cloud, or EC2 for short. EC2 offers a wide range of virtual machine configurations, each with specific characteristics and capabilities. Usually, these are directly linked to the number of CPUs, memory, storage, and, in some cases, GPUs. In AWS EC2, instance types are differentiated mainly by use case; examples are general purpose, compute-optimized, memory-optimized, and more. Each instance type comes in varying sizes: .large, .xlarge, .2xlarge, .4xlarge, etc. These sizes dictate the processor, memory, network, and disk configuration. Take, for instance, an m5.large instance.

m5.large:

  • 8GB Memory
  • Intel Xeon Platinum 8175 - 1 CPU Core, 2 x vCPU
  • 0.75 Gbps Baseline Network Bandwidth / 10 Gbps Burst
  • 650 Mbps Baseline Storage Bandwidth
  • 81.25 MB/s Baseline Throughput @ 128 KiB I/O / 3600 Baseline IOPS @ 16 KiB I/O

The M series has been selected for this research. Generally, this series of instances is well balanced, and no constraints, such as burstable CPU performance, are set on the instance configuration. Nor is the instance type geared specifically towards being memory optimized (R series) or compute optimized (C series); those configurations are geared towards workloads where faster memory or compute is required, respectively. The aim is to understand which M series instance provides the best performance in relation to cost. These general purpose instances feature a balanced CPU-to-memory ratio, making them ideal for a variety of use cases, from web servers to VDIs.

Understanding the difference in instance versions

Before diving into the results, let’s break down the variation in the different instance configurations. Each instance type has a different processor type or variation. This is denoted with a letter after the generation number, for example the a in M5a; the exception is the base model, M5, which has no suffix. In this research, the M series instances, specifically the large size, are being reviewed.

For a full explanation of the naming convention of AWS instances, see the below link: Instance Type Names on AWS
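To make the naming convention concrete, here is a minimal Python sketch that splits an instance type name into its parts. The regular expression and the `InstanceType` field names are assumptions made for illustration; the pattern only targets names like the ones in this article and is not a complete parser for every EC2 instance type.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str       # e.g. "m" for general purpose
    generation: int   # e.g. 5, 6, 7
    attributes: str   # e.g. "a" (AMD), "i" (Intel), "d" (local NVMe instance store)
    variant: str      # e.g. "flex", or "" when absent
    size: str         # e.g. "large"


# Simplified pattern: family letters, generation digits, optional attribute
# letters, optional "-variant", then ".size".
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z]*)(?:-([a-z]+))?\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as 'm5ad.large' into its components."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised instance type name: {name!r}")
    family, generation, attributes, variant, size = match.groups()
    return InstanceType(family, int(generation), attributes, variant or "", size)


if __name__ == "__main__":
    for name in ("m5.large", "m5ad.large", "m7i-flex.large"):
        print(name, "->", parse_instance_type(name))
```

Parsing the names this way makes it easy to group results by generation or by processor vendor when comparing instances.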

Specifications for M instances in scope for testing:

| Instance Type | Processor | vCPUs | CPU Cores | Memory (GiB) | Network Baseline/Burst Bandwidth (Gbps) | Disk Baseline/Maximum Bandwidth (Mbps) | Disk Baseline/Maximum Throughput (MB/s, 128 KiB I/O) | Disk Baseline/Maximum IOPS (16 KiB I/O) |
|---|---|---|---|---|---|---|---|---|
| m5.large | Intel Xeon Platinum 8175 | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m5a.large | AMD EPYC 7571 | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m5ad.large * | AMD EPYC 7571 | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m5d.large * | Intel Xeon Platinum 8175 | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m6a.large | AMD EPYC 7R13 | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m6i.large | Intel Xeon Ice Lake | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m6id.large * | Intel Xeon Ice Lake | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m7i.large | Intel Xeon Sapphire Rapids | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |
| m7i-flex.large | Intel Xeon Sapphire Rapids | 2 | 1 | 8 | 0.75/10.0 | 650.00/4750.00 | 81.25/593.75 | 3600.00/18750.00 |

* The d in the name means that the instance includes local NVMe SSD instance store volumes, which offer lower-latency storage than network-attached EBS volumes.

Source: https://docs.aws.amazon.com/ec2/latest/instancetypes/gp.html

For more detailed information and to compare different AWS instance types, a valuable resource is https://instances.vantage.sh where you can explore and compare instance types and view their specific details.

Not all instances are available in all regions. You can cross-reference which instances are available in which regions here:

Amazon EC2 instance types by Region - Amazon EC2

Setup and configuration

This research explores the differences in performance and cost between the available general-purpose instances. The following instances are included:

  • m5.large
  • m5a.large
  • m5ad.large
  • m5d.large
  • m6a.large
  • m6i.large
  • m6id.large
  • m7i-flex.large
  • m7i.large

The instances are deemed to have similar specifications and will be tested using the same workload. The time taken to achieve parts of the workload will be recorded alongside performance metrics to determine which instance performs the best. All testing took place in Europe (London), in the eu-west-2 region.

Testing Methodology

The methodology used is different compared to our GO-EUC standard using LoadGen. The goal is to have a benchmark that can run independently without any required infrastructure to compare the differences in computing. As multiple solutions were explored, it was decided to create a bespoke compute benchmark. This method does not simulate a user workload but focuses primarily on testing the compute capabilities. This method has been used in other research topics in the past.

The benchmark used in this research is written in PowerShell and uses fsutil and 7-Zip to generate load through file writes, reads, and compression. The flow is as follows:

  • Create a file of a specific size
    • The following file sizes are used: 4k, 16k, 32k, 128k, 512k, 10MB, 100MB, 500MB
  • Copy this file x number of times
    • 4k – 1000
    • 16k – 500
    • 32k – 250
    • 128k – 150
    • 512k – 100
    • 10MB – 50
    • 100MB – 25
    • 512MB – 15
  • For each of the above file sizes
    • Read the contents of all these files
    • Compress all these files using normal compression
    • Decompress the archive created using normal compression
    • Compress all these files using maximum compression
    • Decompress the archive created using maximum compression
    • Compress all these files using ultra-compression
    • Decompress the archive created using ultra-compression
    • Remove all files used
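The flow above can be sketched in Python. This is not the PowerShell benchmark used in this research: it swaps fsutil and 7-Zip for the Python standard library (`zipfile` with DEFLATE), and the mapping of Normal/Maximum/Ultra to compression levels 5/7/9 is an assumption made for illustration. The function and variable names are hypothetical.

```python
import shutil
import tempfile
import time
import zipfile
from pathlib import Path

# Assumed mapping of the compression names to zlib levels (illustrative only).
LEVELS = {"Normal": 5, "Maximum": 7, "Ultra": 9}


def timed(timings: dict, label: str, func):
    """Run func(), store its wall-clock duration in seconds under label."""
    start = time.perf_counter()
    result = func()
    timings[label] = time.perf_counter() - start
    return result


def run_benchmark(file_size: int, copies: int) -> dict:
    """Create one file, copy it, read, compress, decompress, and clean up."""
    timings = {}
    workdir = Path(tempfile.mkdtemp(prefix="bench_"))
    try:
        source = workdir / "source.bin"
        source.write_bytes(bytes(file_size))  # zero-filled placeholder file

        def copy_files():
            paths = []
            for i in range(copies):
                target = workdir / f"copy_{i}.bin"
                shutil.copyfile(source, target)
                paths.append(target)
            return paths

        def read_files():
            for path in files:
                path.read_bytes()

        files = timed(timings, "Copy Files", copy_files)
        timed(timings, "Read Files", read_files)

        for name, level in LEVELS.items():
            archive = workdir / f"{name}.zip"
            out_dir = workdir / f"out_{name}"

            def compress():
                with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED,
                                     compresslevel=level) as zf:
                    for path in files:
                        zf.write(path, arcname=path.name)

            def decompress():
                with zipfile.ZipFile(archive) as zf:
                    zf.extractall(out_dir)

            timed(timings, f"Compression_{name}", compress)
            timed(timings, f"Decompression_{name}", decompress)

        timed(timings, "Cleanup Files", lambda: shutil.rmtree(workdir))
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
    return timings


if __name__ == "__main__":
    # Small example: 1000 copies of a 4 KiB file.
    for step, seconds in run_benchmark(4 * 1024, 1000).items():
        print(f"{step}: {seconds:.3f}s")
```

Each call returns one timing per step, so a full run is one call per file size and copy count in the list above.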

Compression and decompression are used because they are CPU-intensive tasks. As the variation between the virtual machine SKUs is the CPU model and speed, this should produce a tangible difference in processing time between the SKUs.

For each step, the timings are measured, which are:

  • Copy Files
  • Read Files
  • Compression_Normal
  • Compression_Maximum
  • Compression_Ultra
  • Decompression_Normal
  • Decompression_Maximum
  • Decompression_Ultra
  • Cleanup Files

These tests were run ten times on each instance, and the results were averaged for each test type. There were no outliers in the results; all test runs were relatively close to one another.
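As a sketch of how the per-step averaging could be done, assume each run is stored as a dictionary of step name to seconds (a hypothetical data layout, not the one used in this research). The `relative_spread` helper is an illustrative way to check that runs are close to one another; it is not the criterion used in the original analysis.

```python
from collections import defaultdict
from statistics import mean


def average_timings(runs: list) -> dict:
    """Average each step's duration across runs (each run maps step -> seconds)."""
    per_step = defaultdict(list)
    for run in runs:
        for step, seconds in run.items():
            per_step[step].append(seconds)
    return {step: mean(values) for step, values in per_step.items()}


def relative_spread(values: list) -> float:
    """(max - min) / mean: a rough indicator of how far apart the runs are."""
    return (max(values) - min(values)) / mean(values)


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    runs = [
        {"Copy Files": 1.0, "Read Files": 2.0},
        {"Copy Files": 3.0, "Read Files": 4.0},
    ]
    print(average_timings(runs))
```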

At the end of each test, the VM is stopped and then started again before the next test run. This makes it highly likely that the VM is relocated to a different physical server and/or rack within the data center.
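The stop/start cycle between runs could be automated along these lines. This is a minimal sketch that assumes `ec2` is a boto3 EC2 client, for example `boto3.client("ec2", region_name="eu-west-2")`; the function name and the instance ID are hypothetical, and how the cycle was actually performed in this research is not stated.

```python
def restart_instance(ec2, instance_id: str) -> None:
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is assumed to be a boto3 EC2 client. A full stop/start (rather than
    a reboot) is what gives the instance a chance to land on different hardware.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   restart_instance(boto3.client("ec2", region_name="eu-west-2"), "i-0123456789abcdef0")
```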

Hypothesis & Result

Based on the specifications, there are changes in the CPU types; in some instances these changes are quite major, but overall, a performance improvement is expected with a higher version instance. Additionally, there will be an expected cost difference, as the newer instance versions will have different prices. Generally, new instance versions are cheaper to run, and the lower price is used as an incentive for consumers to migrate to new hardware and allow for the decommissioning of old physical hardware within data centers.

Analysis

Timing the different stages of each test gives us an idea of the average time it takes each VM instance to complete the various stages.

The data shows that the M7 instances have a slight speed advantage, even though the overall specification is the same, with 2 vCPUs and 8 GB of memory. Every test stage benefits from the faster, newer-generation CPU in the M7 instances. The other instance results contain some unexpected outcomes, namely the M6a instance, which is not the latest generation but performs faster than most others.

As explained in the introduction, each instance has a different CPU model. This difference in CPU model causes the variation in the results.

During this research, the different CPU models allocated to the instances were noted (see below).

This table shows which processor model was allocated to each instance test and how many times the specific processor was used within the test iterations.

| Instance | Processor Name | Count | vCPU | Clock Speed (MHz) |
|---|---|---|---|---|
| m5.large | Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz | 9 | 2 | 2500 |
| m5.large | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | 1 | 2 | 2500 |
| m5a.large | AMD EPYC 7571 | 10 | 2 | 2200 |
| m5ad.large | AMD EPYC 7571 | 10 | 2 | 2200 |
| m5d.large | Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz | 4 | 2 | 2500 |
| m5d.large | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | 6 | 2 | 2500 |
| m6a.large | AMD EPYC 7R13 Processor | 10 | 2 | 2650 |
| m6i.large | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz | 10 | 2 | 2900 |
| m6id.large | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz | 10 | 2 | 2900 |
| m7i.large | Intel(R) Xeon(R) Platinum 8488C | 10 | 2 | 2400 |
| m7i-flex.large | Intel(R) Xeon(R) Platinum 8488C | 10 | 2 | 2400 |

Processor descriptions and clock speeds (max clock speed) are taken directly from msinfo during test runs.

The M5 and M5d instances were allocated a slightly different processor model for several of their tests, whereas the remaining instances were allocated the same processor for all tests.

Reviewing the processor speed for each instance test reveals the number of tests run on a slightly slower CPU. These details are pulled directly from the virtual machine before tests are run. What do the performance metrics say? Do they tally with what we are seeing?

Noticeably, the m7i.large and the m7i-flex.large instance processor metrics stop long before the other instances on the test because the workload is processed faster. However, there is a similar pattern of CPU usage between all instances just offset where some process the workload faster than others. One outlier: the m5.large is the slowest to complete the test, which is reflected in the processor usage data. The m6a is just slightly faster than all the other m6 instances.

The disk queue length shows the same pattern overall as the CPU. Again, the data shows that the m7 disk queue metrics stop long before the other SKUs. There is a higher queue length at the end of the test, which is related to the cleanup process where all the files are deleted. A similar pattern is present between all instance types, which is expected due to the identical storage profile for all instances.

Also, the total timing of all measurements combined shows that the m7 instances are the fastest. In general, the m6 instances are not far away from finishing the workload in the same amount of time as the m7 instances.

Costs

Cost is always a factor when running virtual machines in public cloud environments. Selecting the “correct” instance for the use case is crucial.

In this research, costs are considered a secondary factor. However, reviewing costs is still important. Let’s break it down by showing the instance cost per hour.

The price per hour is different than expected.

  • The m6i and m5 instances are identical in price.
  • The m6id is 3% more expensive than the m5d.
  • The m6a instance is 0.5% more expensive than an m5a instance.
  • The m7i instance is 3% more expensive than the m6i instance.
  • The m7i instance is 3% more expensive than the m5 instance.

The biggest jump in price is for the m7 instance, which is as expected due to that instance using the latest hardware.

The m5 instances can be assigned a slightly different processor model based on the above processor table, while the m6 instances have a slightly newer processor model dedicated to each instance. The total cost can be calculated based on the total running time and the retail price per instance.

Again, there is a slight surprise in the total cost of the workload: the m6a.large is the least expensive instance from a workload completion standpoint.

When comparing the workload timing, we can see that the m6a.large is in the middle of the pack when it comes to workload completion. Combining this with the cost makes the m6a.large the best-performing m series instance for the price.

| Instance Type | Average Duration (minutes) |
|---|---|
| m5.large | 11 |
| m5a.large | 10.2 |
| m5ad.large | 10.9 |
| m5d.large | 10.5 |
| m6a.large | 9.1 |
| m6i.large | 9.2 |
| m6id.large | 9.1 |
| m7i.large | 8.8 |
| m7i-flex.large | 8.9 |
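The total cost per workload run described earlier is the running time multiplied by the hourly price. A minimal Python sketch follows, using the average durations from the table above. The function names are hypothetical, and the price in the example is a placeholder rather than a real AWS rate; substitute the current on-demand prices for your region.

```python
# Average workload durations in minutes, copied from the table above.
AVG_DURATION_MIN = {
    "m5.large": 11.0, "m5a.large": 10.2, "m5ad.large": 10.9, "m5d.large": 10.5,
    "m6a.large": 9.1, "m6i.large": 9.2, "m6id.large": 9.1,
    "m7i.large": 8.8, "m7i-flex.large": 8.9,
}


def cost_per_run(duration_minutes: float, price_per_hour: float) -> float:
    """Cost of one workload run: running time in hours times the hourly price."""
    return duration_minutes / 60.0 * price_per_hour


def workload_costs(durations: dict, prices_per_hour: dict) -> dict:
    """Cost per run for every instance that has both a duration and a price."""
    return {
        name: cost_per_run(minutes, prices_per_hour[name])
        for name, minutes in durations.items()
        if name in prices_per_hour
    }


if __name__ == "__main__":
    # PLACEHOLDER price, not a real AWS rate.
    placeholder_prices = {"m6a.large": 0.10}
    print(workload_costs(AVG_DURATION_MIN, placeholder_prices))
```

Because the durations differ by only a couple of minutes, small differences in hourly price can change the ranking, which is why the cheapest instance per hour is not necessarily the cheapest per workload.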

Conclusion

In AWS EC2, instance series are linked to specific characteristics or capabilities of the particular offering. Instances are related to different CPU/Memory/Storage and Networking types with varying constraints and are described on the AWS website.

It is essential to understand that when selecting a specific instance, it will come with different types of CPUs. Each time you stop and start an m5 instance, it could be equipped with a different CPU. Depending on your workload, selecting the correct instance and version is essential to ensure consistency.

Based on this research, the m6a.large gives the best performance for your money compared to the other versions of the m-series instances tested.

For the total run time of the test, the below table shows the price percentage difference between all instances and the m6a.

| Instance | Difference vs. m6a.large |
|---|---|
| m5.large | 28% |
| m5a.large | 32% |
| m5ad.large | 46% |
| m5d.large | 34% |
| m6i.large | 7% |
| m6id.large | 20% |
| m7i.large | 5% |
| m7i-flex.large | -6% |

You may notice that the m7i-flex instance is cheaper than the m6a. It has been discounted from the comparison because the m7i-flex instance is not designed to utilise full CPU resources all the time.

The M7i-Flex instances are a lower-cost variant of the M7i instances, with 5% better price/performance and 5% lower prices. They are great for applications that don’t fully utilize all compute resources. The M7i-Flex instances deliver a baseline of 40% CPU performance and can scale up to full CPU performance 95% of the time. Certain workloads may be more suited to the M7i-Flex. This workload was well suited, but if more iterations were run, we may see different results.
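Percentage differences like those in the table above can be derived from the cost per workload run of each instance relative to a baseline. The sketch below assumes the difference is calculated as (cost − baseline) / baseline; the exact formula used for the table is not stated, and the function names and example values are hypothetical.

```python
def percent_difference(cost: float, baseline_cost: float) -> float:
    """Difference of cost relative to baseline_cost, in percent."""
    return (cost - baseline_cost) / baseline_cost * 100.0


def differences_vs_baseline(costs: dict, baseline: str) -> dict:
    """Rounded percentage difference of every instance against the baseline."""
    base = costs[baseline]
    return {
        name: round(percent_difference(cost, base))
        for name, cost in costs.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Illustrative costs per run, not measured values.
    example_costs = {"baseline": 1.00, "more_expensive": 1.28, "cheaper": 0.94}
    print(differences_vs_baseline(example_costs, "baseline"))
```

A positive value means the instance costs more than the baseline to complete the same workload; a negative value means it costs less.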

If you use the m5 series, consider upgrading to the m6 or m7 series to ensure reliable performance when stopping and restarting your instances, or you are simply wasting money.

Photo by Matthew Delivera on Unsplash