Reducing AWS Spend Using Machine Learning Driven EC2 Instance Type Switching

In this blog post, we present a new machine learning based feature, “EC2 Instance Type Switching,” which identifies underutilized instances and recommends appropriately “fitted” instance types to help reduce wasted AWS spend. Users can also choose to have the instance type switched up and down automatically, which yields significant cost savings without hurting performance.

AWS EC2 Instance Families

AWS EC2 provides a wide selection of instance types that offer various combinations of CPU, memory, storage, and networking capacity, giving users the flexibility to choose the appropriate instance type for each application. At the time of writing, 13 Amazon EC2 instance type families are available, which can be categorized as follows:

  1. General Purpose, including T2, M4, and M3.
  2. Compute Optimized, including C4 and C3.
  3. Memory Optimized, including X1, R4, and R3.
  4. Accelerated Computing, including P2, G2, and F1.
  5. Storage Optimized, including I2 (High I/O instances) and D2 (Dense-storage instances).

Within each family listed above, users can choose from up to 7 instance sizes, distinguished by a suffix. For example, the T2 family includes t2.nano, t2.micro, t2.small, t2.medium, t2.large, t2.xlarge, and t2.2xlarge. For more details, please refer to https://aws.amazon.com/ec2/instance-types/. Naturally, AWS charges a different price for each instance type. FittedCloud has found that most cloud environments are over-provisioned, i.e., most EC2 instances run at very low utilization. To check and measure the efficiency of your cloud resources and services, see our previous blog post: Using Efficiency Metrics to Monitor Public Cloud Costs (https://www.fittedcloud.com/blog/using-efficiency-metrics-to-monitor-public-cloud-costs/)

Automatic EC2 Instance Type Switching to Reduce Your Cost

Resource needs change over time. For example, there may be high resource demand from 8 AM to 10 AM every weekday and very low utilization around midnight and on weekends. To avoid over-utilization (an instance too small to handle the peak load), many public cloud users provision a large instance, which then sits under-utilized much of the time, such as at night and on weekends.

Public cloud providers, including AWS, offer elasticity, which lets users avoid this over-provisioning. FittedCloud recently released a new version with a feature that lets users easily switch their EC2 instances to types that fit their demand at minimum cost.

Machine learning based resource utilization prediction: We deploy machine learning algorithms that predict the utilization of every instance by continuously monitoring its performance and mining its usage patterns (e.g., utilization is always low around midnight and on weekends). In particular, we use various time-series forecasting algorithms, a form of regression with a long track record in engineering (e.g., target tracking), economics (e.g., sales prediction and stock analysis), and health care (e.g., modeling disease progression). Mathematically, these algorithms aim to predict an instance’s utilization $Y[n_0+n]$, n seconds after the current time $n_0$, from its previous utilization measurements:

$$Y[n_0+n] = f\big(Y[n_0-1],\ Y[n_0-2],\ \ldots\big)$$

where the function f, which may be linear or nonlinear, is learned from historical data. A powerful regression model, such as a state-of-the-art long short-term memory (LSTM) recurrent neural network, can capture complex nonlinear temporal relationships over long time spans.
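The post does not describe FittedCloud’s actual models, so the following is only a minimal sketch of the equation above with a linear f (not an LSTM): an autoregressive model fitted by ordinary least squares in plain Python. The hourly sampling, the lag set (1 hour and 24 hours, so a daily cycle can be captured), and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(y: Sequence[float], lags: Sequence[int]) -> List[float]:
    """Least-squares weights w for Y[t] ~ w[0] + sum_k w[k+1] * Y[t - lags[k]]."""
    start = max(lags)
    rows = [[1.0] + [y[t - lag] for lag in lags] for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    d = len(lags) + 1
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)


def forecast(y: Sequence[float], lags: Sequence[int], w: List[float], steps: int) -> List[float]:
    """Predict the next `steps` samples, feeding each prediction back in as history."""
    history = list(y)
    preds = []
    for _ in range(steps):
        nxt = w[0] + sum(wk * history[-lag] for wk, lag in zip(w[1:], lags))
        preds.append(nxt)
        history.append(nxt)
    return preds


# Hourly CPU utilization for one week: 80% from 5:00 to 9:00 AM, 10% otherwise.
day = [0.8 if 5 <= hour < 9 else 0.1 for hour in range(24)]
week = day * 7
lags = [1, 24]
w = fit_ar(week, lags)
next_day = forecast(week, lags, w, steps=24)
print([round(u, 2) for u in next_day[3:10]])  # [0.1, 0.1, 0.8, 0.8, 0.8, 0.8, 0.1]
```

Because this synthetic pattern repeats exactly every day, the fit puts essentially all weight on the 24-hour lag. Real utilization data is noisy and drifts, which is where more expressive models such as LSTMs can help.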

With the predicted utilization of an instance, we can determine whether it is under-utilized or over-utilized by comparing the predicted demand against the capacity of its current instance type and of the other available types. The FittedCloud solution makes recommendations to users whenever a potential cost reduction is identified. With automation enabled, it can also take full control of instance type switching, so that the provisioned EC2 instances always fit the workloads at the lowest cost.
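As a minimal sketch of this comparison (CPU only, like the rest of this post), the snippet below converts a predicted utilization into busy vCPUs and checks which M4 sizes would keep that load under a target. The vCPU counts are AWS’s published M4 sizes; the 85% target utilization, the function name `recommend`, and the three-way classification are hypothetical illustrations, not FittedCloud’s actual policy.

```python
from typing import Dict, List, Tuple

# Published vCPU counts for the six M4 sizes, smallest (and cheapest) first.
M4_VCPUS: Dict[str, int] = {
    "m4.large": 2, "m4.xlarge": 4, "m4.2xlarge": 8,
    "m4.4xlarge": 16, "m4.10xlarge": 40, "m4.16xlarge": 64,
}


def recommend(current: str, predicted_util: float,
              target_util: float = 0.85) -> Tuple[str, List[str]]:
    """Classify an instance from its predicted CPU utilization and suggest M4 sizes.

    predicted_util is the forecast utilization (0..1) on the *current* type;
    target_util is a hypothetical ceiling the chosen type should stay under.
    """
    demand = predicted_util * M4_VCPUS[current]  # predicted busy vCPUs
    fits = [t for t, v in M4_VCPUS.items() if demand / v <= target_util]
    if not fits:
        return "over-utilized", []  # even the largest M4 size is too small
    smallest = min(fits, key=M4_VCPUS.__getitem__)
    if M4_VCPUS[smallest] > M4_VCPUS[current]:
        return "over-utilized", [smallest]  # move up to the smallest size that fits
    # Cheaper sizes that still fit, largest (most headroom) first.
    cheaper = sorted((t for t in fits if M4_VCPUS[t] < M4_VCPUS[current]),
                     key=M4_VCPUS.__getitem__, reverse=True)
    return ("under-utilized" if cheaper else "fitted"), cheaper


print(recommend("m4.4xlarge", 0.1))  # ('under-utilized', ['m4.2xlarge', 'm4.xlarge', 'm4.large'])
print(recommend("m4.4xlarge", 0.8))  # ('fitted', [])
print(recommend("m4.xlarge", 0.95))  # ('over-utilized', ['m4.2xlarge'])
```

With the assumed 85% ceiling, the off-peak load of the case study below (10% of an m4.4xlarge) produces the same three candidates the post lists, while the 80% peak load keeps the instance on m4.4xlarge.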

Vertical vs Horizontal Scaling

In cloud environments, it is common to use horizontal scaling to procure the resources an application needs. Horizontal scaling works well for cloud-native applications whose performance can be improved by adding EC2 instances to a pool; examples are front-end web servers behind a load balancer and clustered data analytics applications such as those running on Amazon EMR. However, for many simple, single-instance applications that are not cloud native, horizontal scaling is not really an option, and customers usually provision the instance type for peak requirements. For such applications, instance type switching (i.e., vertical scaling) can achieve significant cost savings.

Case Study

As a case study, assume that an “m4.4xlarge” EC2 instance is in use; it belongs to the “M4” instance family and has the size “4xlarge”. This instance type has 16 vCPUs, 64 GiB of memory, and 2,000 Mbps of dedicated EBS bandwidth. A typical resource usage pattern is shown in Fig. 1: utilization stays high at 0.8 (80%) from 5:00 AM to 9:00 AM every day and drops to a very low 0.1 (10%) outside this window. Instead of keeping the “m4.4xlarge” instance running all the time, the new Instance Type Switching feature can switch the instance type based on the predicted utilization, significantly reducing cost while still meeting the computing requirement. For example, outside the 5:00 AM to 9:00 AM window, three cheaper instance types, [‘m4.2xlarge’, ‘m4.xlarge’, ‘m4.large’], are recommended, and the user can decide to switch the instance to any of them for that period.
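The post does not quantify the savings for this pattern, so here is a rough back-of-the-envelope sketch. It assumes that M4 on-demand prices are proportional to vCPU count (roughly true of M4 list pricing, but check current prices for your region), ignores the brief downtime of each switch, and expresses cost in vCPU-hours per day rather than dollars.

```python
# vCPUs per M4 size, used as a price proxy under the proportional-pricing assumption.
M4_VCPUS = {"m4.large": 2, "m4.xlarge": 4, "m4.2xlarge": 8, "m4.4xlarge": 16}

PEAK_HOURS = 4                     # 5:00 AM to 9:00 AM on m4.4xlarge
OFF_PEAK_HOURS = 24 - PEAK_HOURS   # remaining hours on a smaller type


def daily_vcpu_hours(schedule):
    """Sum of hours * vCPUs over a list of (hours, instance_type) pairs."""
    return sum(hours * M4_VCPUS[itype] for hours, itype in schedule)


baseline = daily_vcpu_hours([(24, "m4.4xlarge")])  # never switched
savings = {}
for off_peak in ("m4.2xlarge", "m4.xlarge", "m4.large"):
    switched = daily_vcpu_hours([(PEAK_HOURS, "m4.4xlarge"), (OFF_PEAK_HOURS, off_peak)])
    savings[off_peak] = 1 - switched / baseline
    print(f"{off_peak}: {switched} vs {baseline} vCPU-hours/day, saving {savings[off_peak]:.1%}")
```

Under these assumptions, the three options save about 41.7%, 62.5%, and 72.9% of this instance’s daily cost, respectively. Even on m4.large, the off-peak load of 1.6 busy vCPUs (10% of 16) leaves the instance at roughly 80% CPU.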

Fig. 1: An example of a typical resource usage pattern and its recommended instance types.

Users can accept an instance switch recommendation, and the FittedCloud solution takes care of the rest automatically. For example, the instance will be switched back to “m4.4xlarge” at 5:00 AM the next day. Note that switching to some instance types takes a certain amount of time; this delay is also taken into account so that switches happen precisely and reliably.
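The post does not describe how the switch itself is performed. One common approach, sketched below, uses the EC2 API through boto3: an EBS-backed instance must be stopped before its type can be changed, so the sketch stops it, waits, modifies the instance type attribute, and starts it again. The helper names and the 10-minute lead time in the example are hypothetical; the lead time stands in for the switching delay mentioned above, so that the larger type is already running at 5:00 AM.

```python
from datetime import datetime, timedelta


def switch_instance_type(ec2, instance_id: str, new_type: str) -> None:
    """Resize an EBS-backed instance: stop it, change its type, start it again.

    `ec2` is expected to be a boto3 EC2 client (boto3.client("ec2")); any object
    exposing the same methods works, which keeps this sketch testable offline.
    """
    ids = [instance_id]
    ec2.stop_instances(InstanceIds=ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)


def switch_start_time(peak_start: datetime, expected_switch: timedelta) -> datetime:
    """Start the upsize early enough that the larger type is running at peak_start."""
    return peak_start - expected_switch


# Hypothetical usage against AWS (instance ID is made up):
#   import boto3
#   switch_instance_type(boto3.client("ec2"), "i-0123456789abcdef0", "m4.4xlarge")
print(switch_start_time(datetime(2024, 1, 2, 5, 0), timedelta(minutes=10)))  # 2024-01-02 04:50:00
```

Stopping the instance means a short outage, and an instance without an Elastic IP receives a new public IPv4 address when it starts again, so the switch window matters as much as the target type.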

Fig. 2: The AWS M4 instance family, consisting of six instance types.

Please note that this post considers only CPU utilization when identifying instance type switching recommendations. For applications that are sensitive to memory, disk, or network utilization, those metrics also need to be considered for a more accurate recommendation.
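Extending the check to several metrics is straightforward in principle: a candidate type fits only if every predicted demand stays under the target share of that type’s capacity. The minimal sketch below adds memory to CPU using AWS’s published M4 memory sizes; the 85% target, the function name, and the 40 GiB memory figure in the example are hypothetical. Disk and network limits could be added as further keys in the same way.

```python
from typing import Dict, List

# Published M4 capacities, smallest to largest: vCPUs and memory in GiB.
M4_CAPACITY: Dict[str, Dict[str, float]] = {
    "m4.large": {"vcpu": 2, "mem_gib": 8},
    "m4.xlarge": {"vcpu": 4, "mem_gib": 16},
    "m4.2xlarge": {"vcpu": 8, "mem_gib": 32},
    "m4.4xlarge": {"vcpu": 16, "mem_gib": 64},
    "m4.10xlarge": {"vcpu": 40, "mem_gib": 160},
    "m4.16xlarge": {"vcpu": 64, "mem_gib": 256},
}


def fitting_types(demand: Dict[str, float], target_util: float = 0.85) -> List[str]:
    """Sizes (smallest first) where every demanded resource stays within target_util of capacity."""
    return [
        itype for itype, cap in M4_CAPACITY.items()
        if all(demand[res] <= target_util * cap[res] for res in demand)
    ]


# Off-peak CPU demand from the case study: 10% of 16 vCPUs = 1.6 busy vCPUs.
cpu_only = fitting_types({"vcpu": 1.6})
# Same CPU load, but suppose the application keeps 40 GiB of memory resident.
cpu_and_mem = fitting_types({"vcpu": 1.6, "mem_gib": 40})
print(cpu_only[0], cpu_and_mem[0])  # m4.large m4.4xlarge
```

In this hypothetical, a CPU-only view would recommend m4.large, while the memory constraint rules out every size below m4.4xlarge, which is exactly the kind of mistake a CPU-only recommendation can make for memory-sensitive applications.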


About FittedCloud

FittedCloud is the industry’s leading public cloud resource optimization solution. It features machine learning algorithms that continuously analyze resource utilization and identify opportunities to reduce monthly recurring cloud infrastructure costs. Automated provisioning can adjust cloud resources according to load patterns, user-configured policies, and other parameters. FittedCloud’s patented solution reduces costs by up to 50% while eliminating complex manual provisioning processes and the risk of configuration errors. For more details, please visit https://www.fittedcloud.com