AWS EBS Performance – Confused?


Updated on May 15, 2018

AWS EBS performance is a bit of a mystery to many customers, especially those not familiar with the many factors that affect storage performance. Customers often provision PIOPS (Provisioned IOPS) EBS volumes to get guaranteed performance, only to find that their applications are not seeing the performance they expected.

This blog attempts to shed some light on the factors that affect EBS performance. We discuss only io1 (Provisioned IOPS SSD) EBS volumes here.

Block size: Block size is the amount of data written or read in a single IO request. Storage devices store data in units of blocks. They can also accept data in multiples of the block size, typically up to a maximum supported by the controller (the storage interface adapter, such as Fibre Channel).

IOPS: IO operations per second (IOPS) is a measure of how many IO requests a storage device can complete in a second. Storage device vendors typically measure IOPS using the smallest possible block size (512 bytes for magnetic disks, 4K for SSDs); read on and the reason will soon be clear.

Throughput: Throughput is the amount of data transferred to or from a storage device in a second, typically stated in KB/s, MB/s or GB/s. For example, if a storage device can write 1000 blocks of 128K each per second, its throughput is 1000 × 128K/s = 128MB/s (in other words, 1000 IOPS at a 128K block size). If it can write 2000 blocks of 64K each, throughput is still 128MB/s (2000 × 64K/s, or 2000 IOPS at 64K). The same throughput is achieved with 32,000 blocks of 4K each (32,000 IOPS at 4K). The relationship is simple: throughput = IOPS × block size. This is also why vendors quote IOPS at the smallest block size: for a given throughput, it produces the largest IOPS number.
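A minimal Python sketch of this relationship, using the numbers above. It follows the example's convention of treating 1 MB as 1000 KB; with binary units (1 MiB = 1024 KiB) the same three cases come out to 125 MiB/s. The function names are illustrative only.

```python
# Throughput = IOPS x block size. Units follow the example above:
# sizes in KB, with 1 MB treated as 1000 KB.

def throughput_mb_s(iops: int, block_kb: int) -> float:
    """Throughput in MB/s for a given IOPS rate and block size in KB."""
    return iops * block_kb / 1000

def iops_for_throughput(mb_s: float, block_kb: int) -> float:
    """IOPS needed to reach mb_s MB/s at a given block size in KB."""
    return mb_s * 1000 / block_kb

# The three combinations from the text all move 128 MB/s.
for iops, block_kb in [(1000, 128), (2000, 64), (32000, 4)]:
    print(f"{iops:>6} IOPS x {block_kb:>3} KB = {throughput_mb_s(iops, block_kb):.0f} MB/s")

# Smaller blocks need more IOPS for the same throughput.
print(iops_for_throughput(128, 4))  # 32000.0
```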

Latency, Queue depth: These are two other factors that determine storage performance, but we will not go into them in this blog (maybe in another one). Wai Lam, a good friend of mine and an authority on data storage, has explained the impact of latency on performance in his blog.

Maximum throughput per volume: In AWS, each EBS volume has a maximum throughput limit (see our blog on EBS cost analysis and the AWS EBS volume types description for details). Currently, the maximum throughput per io1 volume is 500 MiB/s.

Maximum throughput per instance: Similarly, there is a limit on maximum throughput per instance (see EBS cost analysis and the AWS EBS volume types description for details). Currently, the maximum throughput per instance is 1750 MiB/s for io1 volumes.

Now, where does the confusion (or at least part of it) about EBS performance come from? I believe it comes primarily from the following factors:

  1. Block size dependency on IOPS
  2. Network bandwidth limits
  3. Per volume/instance throughput limits

Block size dependency on IOPS

For provisioned IOPS storage (io1), AWS measures IOPS in blocks of 256K or smaller. (Some sections of the AWS website state 16K as the block size, but 256K is supposed to be the correct measurement size.) If you send down larger blocks, AWS breaks them into 256K pieces for IOPS accounting, so a single 512K block is counted as 2 IOs.
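The accounting rule above can be modeled in a few lines of Python. This is a simplified sketch assuming the 256K (KiB) accounting unit described here; it ignores the IO consolidation discussed below, and the function name io_units is hypothetical.

```python
import math

MAX_IO_KIB = 256  # io1 IOPS accounting unit described above (assumed constant)

def io_units(request_kib: int) -> int:
    """Number of IOPS an io1 volume charges for one request of request_kib KiB.

    Requests up to 256 KiB count as one IO; larger requests are split into
    256 KiB pieces, rounding up.
    """
    if request_kib <= 0:
        raise ValueError("request size must be positive")
    return math.ceil(request_kib / MAX_IO_KIB)

for size in (4, 16, 256, 512, 1024):
    print(size, "KiB ->", io_units(size), "IO(s)")
```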

So, if you provision an EBS volume with 100 IOPS, AWS is offering you a block device that can handle 100 blocks per second, each up to 256K in size. If your application sends down 512K blocks, it will see only 50 IOPS! (In my experience, not many applications use block sizes larger than 256K.)

If you expect a specific IOPS figure from the application's point of view, you need to know the block size the application uses and provision EBS volumes accordingly. Continuing the example, if your application uses 512K blocks and you want 1000 IOPS from EBS, you should provision 2000 IOPS.
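Both calculations (the application-level IOPS a volume delivers, and the provisioning needed for a target) fit in a small Python sketch. It assumes the 256 KiB accounting unit and ignores the throughput limits and consolidation covered later; the function names are illustrative.

```python
import math

MAX_IO_KIB = 256  # per-IO accounting size for io1, as described above

def units_per_request(block_kib: int) -> int:
    # Each application request consumes this many provisioned IOPS.
    return math.ceil(block_kib / MAX_IO_KIB)

def effective_app_iops(provisioned_iops: int, block_kib: int) -> float:
    """Application-level IOPS a volume delivers at a given block size."""
    return provisioned_iops / units_per_request(block_kib)

def required_provisioned_iops(target_app_iops: int, block_kib: int) -> int:
    """Provisioned IOPS needed to reach target_app_iops at a given block size."""
    return target_app_iops * units_per_request(block_kib)

print(effective_app_iops(100, 512))          # 50.0
print(required_provisioned_iops(1000, 512))  # 2000
```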

IO Consolidation at OS/Host level

Another source of confusion is that the file system and block device layers can consolidate IO requests. For example, if you send sequential 16K blocks, the Linux block device layer may consolidate them into 128K requests (unless direct IO is used). This is done to improve performance, since storage devices tend to perform better with larger block sizes (up to a point). Depending on where IOPS are measured (at the application or at the device), the numbers can differ, which causes confusion.
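To illustrate why counts differ by layer, here is a toy Python model of merging back-to-back requests. It is not the Linux I/O scheduler's actual algorithm; the 128 KiB cap simply mirrors the example above (the real limit depends on kernel and device settings such as the queue's max_sectors_kb), and merge_contiguous is a hypothetical helper.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (offset in KiB, size in KiB)

def merge_contiguous(requests: List[Request], max_kib: int = 128) -> List[Request]:
    """Merge back-to-back requests into larger ones, capped at max_kib.

    A toy model of block-layer merging, not the actual Linux I/O scheduler.
    """
    merged: List[Request] = []
    for offset, size in requests:
        if merged:
            last_off, last_size = merged[-1]
            # Contiguous with the previous request and still under the cap?
            if last_off + last_size == offset and last_size + size <= max_kib:
                merged[-1] = (last_off, last_size + size)
                continue
        merged.append((offset, size))
    return merged

# 64 sequential 16 KiB writes (1 MiB total).
app_requests = [(i * 16, 16) for i in range(64)]
device_requests = merge_contiguous(app_requests)
print(len(app_requests), "->", len(device_requests))  # 64 -> 8
```

An application-level counter would see 64 IOs here, while the device sees only 8.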

IO Consolidation at EBS Backend

This is one of the least-known factors in EBS performance. The EBS backend also attempts to consolidate IO requests when possible (for the same reasons the OS does). However, CloudWatch metrics report IO statistics at the rate requests are received by the EBS system, while the EBS backend measures IOPS against provisioned capacity based on the consolidated IO. As a result, CloudWatch can show read/write IOPS that exceed the provisioned capacity. For example, if you provision an io1 volume with 200 IOPS, you can send IO at a rate of 800 IOPS if you send down sequential 64K blocks (800 × 64K = 200 × 256K)!
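The arithmetic in this example can be sketched as follows. The model is idealized and is an assumption, not documented EBS internals: it supposes a perfectly sequential stream whose requests are packed into 256 KiB IOs, so it gives a best case. Random IO gets no such benefit. The function name is illustrative.

```python
import math

MAX_IO_KIB = 256  # per-IO accounting size for io1, as described above

def backend_counted_iops(received_iops: int, block_kib: int,
                         sequential: bool = True) -> int:
    """IOPS counted against provisioning for a stream of equal-sized requests.

    Idealized model (an assumption, not documented EBS behavior): if the
    stream is sequential, the backend packs as many whole requests as fit
    into each 256 KiB IO; otherwise each request is counted on its own.
    """
    if not sequential or block_kib >= MAX_IO_KIB:
        # No merging: each request costs ceil(block / 256 KiB) IOs.
        return received_iops * math.ceil(block_kib / MAX_IO_KIB)
    per_io = MAX_IO_KIB // block_kib  # whole requests that fit in one IO
    return math.ceil(received_iops / per_io)

received = 800  # roughly what CloudWatch would report
print(received, "received ->", backend_counted_iops(received, 64), "counted")
```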

Network bandwidth limits

Note that EC2 instances access EBS volumes over the network. EBS volumes can be accessed over dedicated network connections (on EBS-optimized instances) or over shared ones (on instances that are not EBS-optimized). EBS-optimized instances offer a dedicated network connection to storage, with throughput options from 500 Mbps to 4000 Mbps and a per-instance maximum of 32,000 IOPS.
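Because link speeds are quoted in megabits per second while EBS limits are quoted in MiB/s, a unit conversion helps when comparing them. This Python sketch ignores protocol overhead, so real figures will be somewhat lower; the function names are illustrative.

```python
def mbps_to_mib_s(mbps: float) -> float:
    """Convert a link rate in megabits/s (10**6 bits) to MiB/s (2**20 bytes)."""
    return mbps * 1_000_000 / 8 / (1024 * 1024)

def max_iops_over_link(mbps: float, block_kib: int) -> float:
    """Upper bound on IOPS a link can carry at a given block size (no overhead)."""
    return mbps_to_mib_s(mbps) * 1024 / block_kib

for link in (500, 4000):
    print(f"{link} Mbps ~ {mbps_to_mib_s(link):.1f} MiB/s, "
          f"~{max_iops_over_link(link, 256):.0f} IOPS at 256 KiB")
```

By this arithmetic, even a 4000 Mbps dedicated connection (about 477 MiB/s) is slightly below the 500 MiB/s that a single io1 volume can deliver.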

On instances that are not EBS-optimized, the network is shared by all traffic, storage and non-storage alike. The instance type determines the available network bandwidth. AWS is somewhat vague about the network bandwidth of instance types and categorizes it as ‘low’, ‘moderate’, ‘high’ and ‘10Gb’ (see the ‘Instance Types Matrix’ section here). Shared bandwidth can be a problem for applications that generate significant network traffic, such as web or clustered applications.

The bottom line is that your application's performance is limited by the network bandwidth available to your instances. So understand the network bandwidth capabilities of your instances and select instance types that match your application's performance needs.

Per volume/instance throughput limits

As mentioned above, there are per-volume and per-instance limits on throughput. A single io1 EBS volume can do at most 500 MiB/s. So if you provision a 20,000 IOPS volume and your application uses 32KB blocks, you will be disappointed to find that you see at most 16,000 IOPS (500 MiB/s ÷ 32 KiB).
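The per-volume cap can be expressed as the lower of two bounds. A minimal Python sketch, assuming block sizes of 256 KiB or less (so no request splitting) and the 500 MiB/s limit quoted above; the function name is illustrative.

```python
VOLUME_MAX_MIB_S = 500  # io1 per-volume throughput limit cited above

def achievable_volume_iops(provisioned_iops: int, block_kib: int,
                           max_mib_s: float = VOLUME_MAX_MIB_S) -> float:
    """IOPS a single volume can sustain: the lower of the provisioned IOPS
    and the IOPS that fit under the throughput limit at this block size."""
    throughput_bound = max_mib_s * 1024 / block_kib
    return min(provisioned_iops, throughput_bound)

print(achievable_volume_iops(20_000, 32))  # 16000.0
print(achievable_volume_iops(20_000, 16))  # 20000
```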

Some might think they can get around this by spreading the load over multiple EBS volumes. But AWS has another limit, on maximum throughput per instance: a single instance can do at most 1750 MiB/s, which people sometimes overlook. If you are not aware of this, it can be another source of disappointment.
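Adding volumes raises the ceiling only until the instance limit is reached. A Python sketch, assuming the volumes are used together (for example, striped) and using the per-volume and per-instance limits quoted above; the function name is illustrative.

```python
VOLUME_MAX_MIB_S = 500     # per-volume io1 limit cited above
INSTANCE_MAX_MIB_S = 1750  # per-instance limit cited above

def aggregate_throughput_mib_s(volume_count: int, per_volume_mib_s: float) -> float:
    """Throughput of several volumes used together (e.g. striped),
    capped first per volume and then per instance."""
    per_volume = min(per_volume_mib_s, VOLUME_MAX_MIB_S)
    return min(volume_count * per_volume, INSTANCE_MAX_MIB_S)

for n in (1, 2, 3, 4, 6):
    print(n, "volume(s):", aggregate_throughput_mib_s(n, 500), "MiB/s")
```

With 500 MiB/s volumes, the fourth volume adds only 250 MiB/s and a fifth adds nothing.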

One needs to consider the per-volume and per-instance limits and select instance types appropriately to ensure guaranteed EBS performance.

Performance is always a complex matter, as many factors contribute to performance issues. But we believe most performance issues can be addressed by understanding the factors described above and carefully planning and selecting resources capable of meeting application requirements.
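Putting the pieces together, a rough planning sketch can take the minimum of the bounds discussed in this post. It is an estimate under stated assumptions (the limits quoted above, equally provisioned striped volumes, a fixed block size, no overhead, no consolidation), not an AWS formula; link_mbps stands for the storage bandwidth the instance actually has, and estimate_app_iops is a hypothetical helper.

```python
import math

# Limits quoted in this post (io1, as of May 2018); adjust for your setup.
IO_UNIT_KIB = 256          # IOPS accounting unit
VOLUME_MAX_MIB_S = 500     # per-volume throughput limit
INSTANCE_MAX_MIB_S = 1750  # per-instance throughput limit

def estimate_app_iops(iops_per_volume: int, volumes: int,
                      block_kib: int, link_mbps: float) -> tuple:
    """Rough upper bound on application IOPS and the limit that sets it.

    Assumes equal, striped io1 volumes and a fixed block size. Ignores
    latency, queue depth, protocol overhead and IO consolidation.
    """
    units = math.ceil(block_kib / IO_UNIT_KIB)  # IOs charged per request
    link_mib_s = link_mbps * 1_000_000 / 8 / (1024 * 1024)
    bounds = {
        "provisioned IOPS": volumes * iops_per_volume / units,
        "per-volume throughput limit": volumes * VOLUME_MAX_MIB_S * 1024 / block_kib,
        "per-instance throughput limit": INSTANCE_MAX_MIB_S * 1024 / block_kib,
        "EBS network bandwidth": link_mib_s * 1024 / block_kib,
    }
    limit = min(bounds, key=bounds.get)  # name of the tightest bound
    return bounds[limit], limit

print(estimate_app_iops(20_000, 2, 32, 4000))
print(estimate_app_iops(1_000, 1, 256, 4000))
```

In the first call, the network link rather than the provisioned IOPS is the bottleneck; in the second, provisioned IOPS is.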

Other EBS related Blogs
How to reduce the size of EBS Windows boot volume in AWS?
How to shrink EBS root volumes in AWS – with just one click
How to reduce AWS costs using machine learning driven EBS IOPS provisioning
AWS Elastic Volumes and FittedCloud EBS Optimizer
FittedCloud AWS EBS Optimizer for Docker Containers
An Open Source AWS EBS Cost Analyzer
Is it possible to use EBS gp2 instead of io1, achieve same performance and save 50%?
How to optimize AWS EBS using LVM and reduce cost
How to create thin provisioned AWS EBS volumes and save a ton!

FittedCloud Cloud Cost Optimization Solutions
FittedCloud offers machine learning based cloud cost optimization solutions that help customers reduce AWS spend significantly. Our current solution includes machine learning driven actionable advisories with click through actions for EC2, EBS, RDS, DynamoDB, ElastiCache, ElasticSearch, AutoScale, Lambda, etc. and full/lights out automation for EC2, EBS, DynamoDB and RDS. Our solution typically can save customers up to 50% of their cost on AWS. For more details, please visit