Simple AWS EC2 scheduler vs Machine Learning driven EC2 scheduler

AWS EC2 power ON/OFF

Turning off EC2 instances when they are not in use is a simple and effective way to reduce AWS cost. It works especially well in environments such as development/test, where EC2 instances are used only during work hours. A simple 10-hour weekday schedule can save about 70% of the instance cost.
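The 70% figure follows from simple arithmetic. The sketch below assumes on-demand billing, where a stopped instance incurs no compute charge (attached EBS storage is still billed, so the saving on the total bill is somewhat lower). The function name is ours, for illustration only.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def weekly_savings_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on compute hours avoided by a fixed schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK

# 10 hours x 5 weekdays = 50 running hours out of 168
print(f"{weekly_savings_fraction(10, 5):.1%}")  # prints 70.2%
```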

Many customers use homegrown scripts (or, for more advanced cloud users, Lambda functions) to implement EC2 scheduling. There are also third-party solutions, including FittedCloud, that offer SaaS-based implementations to help customers set up and manage EC2 schedules. They are typically very easy to use, with policies created through a simple GUI.
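For readers curious what a homegrown script might look like, here is a minimal sketch, not any particular product's implementation. The 08:00 to 18:00 weekday policy and the function names are assumptions made for illustration; the only AWS calls used are boto3's `start_instances` and `stop_instances` on the EC2 client.

```python
from datetime import datetime, time

# Hypothetical policy: run 08:00-18:00, Monday to Friday, in the instance's local time.
START, STOP = time(8, 0), time(18, 0)
WEEKDAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4

def should_be_running(now: datetime) -> bool:
    """True if the schedule wants the instance powered on at `now`."""
    return now.weekday() in WEEKDAYS and START <= now.time() < STOP

def enforce(instance_ids, now: datetime) -> str:
    """Start or stop the given instances to match the schedule."""
    import boto3  # imported lazily so the policy logic can be tested without AWS access
    ec2 = boto3.client("ec2")
    if should_be_running(now):
        ec2.start_instances(InstanceIds=instance_ids)
        return "started"
    ec2.stop_instances(InstanceIds=instance_ids)
    return "stopped"
```

Run periodically (from cron, or from a Lambda function on a timer), `enforce` would keep the listed instances aligned with the policy. Error handling, time zones and per-instance schedules are left out to keep the sketch short.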

AWS also offers an EC2 scheduling feature at a small cost to customers. It uses a CloudFormation template and resources such as DynamoDB, Lambda and CloudWatch. We love CloudFormation and these services, but the AWS implementation is clearly meant for people who are already familiar with them. Schedules are driven by tags, so users also have to tag their instances appropriately. At the low end of the market, we find that many customers don’t have the time or expertise to deploy it and hence depend on third-party solutions that are easier to use.
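To make "driven by tags" concrete, the sketch below groups instances by the schedule named in a tag. It is an illustration under assumptions, not the AWS implementation: the tag key `Schedule`, the schedule name `office-hours` and the instance IDs are all made up, and the input records are shaped like entries in an EC2 DescribeInstances response.

```python
from collections import defaultdict

TAG_KEY = "Schedule"  # assumed tag key; use whatever key your scheduler is configured with

def group_by_schedule(instances):
    """Map schedule name -> instance IDs, skipping instances without the tag."""
    groups = defaultdict(list)
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if TAG_KEY in tags:
            groups[tags[TAG_KEY]].append(inst["InstanceId"])
    return dict(groups)

# Hypothetical sample records.
instances = [
    {"InstanceId": "i-0aaa", "Tags": [{"Key": "Schedule", "Value": "office-hours"}]},
    {"InstanceId": "i-0bbb", "Tags": [{"Key": "Name", "Value": "build-box"}]},
    {"InstanceId": "i-0ccc", "Tags": [{"Key": "Schedule", "Value": "office-hours"}]},
]
print(group_by_schedule(instances))  # {'office-hours': ['i-0aaa', 'i-0ccc']}
```

Note that `i-0bbb` is silently skipped: an instance that is never tagged is never scheduled, which is one reason tag hygiene matters with this approach.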

Beyond easy deployment, it is also important to know when the instances are actually being used. We find that many applications outside typical dev/test environments also have usage patterns that can benefit from EC2 scheduling. One way to identify the usage pattern is to look at the CPU utilization data available in the CloudWatch console. One could analyze the graphs and come up with a schedule recommendation, as described in this blog. This requires manual analysis using multiple thresholds (maximum CPU utilization, minimum duration of inactivity) and some allowance for noise to arrive at a reasonably accurate recommendation. To be even more accurate, one should also consider network traffic and disk I/O activity.
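The two thresholds mentioned above can be sketched in a few lines. This is a simplified illustration, assuming hourly average CPU samples have already been exported from CloudWatch into a list; the threshold values and the sample data are hypothetical, and noise handling, network traffic and disk I/O are ignored.

```python
def idle_windows(cpu_percent, max_cpu=5.0, min_idle_hours=4):
    """Return (start, end) index pairs of runs where CPU stays at or below
    max_cpu for at least min_idle_hours consecutive hourly samples.
    `end` is exclusive."""
    windows, run_start = [], None
    # A sentinel above any threshold closes a run that reaches the end of the data.
    for i, value in enumerate(list(cpu_percent) + [float("inf")]):
        if value <= max_cpu:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_idle_hours:
                windows.append((run_start, i))
            run_start = None
    return windows

# One hypothetical day of hourly average CPU (%), busy from 09:00 to 18:00.
day = [1, 1, 2, 1, 1, 1, 2, 3, 4, 40, 55, 60, 35, 50, 45, 30, 25, 20, 3, 2, 1, 1, 1, 1]
print(idle_windows(day))  # [(0, 9), (18, 24)]
```

Here the instance could be powered off from midnight to 09:00 and again from 18:00 to midnight. Doing this by eye for each instance, and over several weeks of data, is the manual effort described above.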

Machine Learning Driven EC2 power ON/OFF

While manual analysis may work for a few instances, how can it scale to hundreds or thousands of instances? FittedCloud uses machine learning to identify when EC2 instances are actually being used and makes schedule recommendations. Historical utilization data from the past few weeks is fed into machine learning models to determine when the instances are in use. The models also apply a confidence threshold, set very high by default, to ensure a high level of accuracy. The result is presented as a recommended schedule for each instance.
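The role of the confidence threshold can be illustrated with a deliberately simple stand-in. The sketch below is not FittedCloud's model: it replaces the machine learning step with a plain frequency estimate, recommending power-off for a time slot only when the instance was idle in that slot in at least `confidence` of the observed weeks. The function name, thresholds and data are all hypothetical.

```python
def recommend_off_slots(weeks, max_cpu=5.0, confidence=0.95):
    """weeks: equal-length lists of CPU samples, one list per week, aligned by slot.
    Returns slot indices whose observed idle fraction is >= confidence."""
    n_weeks = len(weeks)
    off = []
    for slot in range(len(weeks[0])):
        idle_count = sum(1 for week in weeks if week[slot] <= max_cpu)
        if idle_count / n_weeks >= confidence:
            off.append(slot)
    return off

# Four hypothetical weeks, shortened to 6 slots each for readability.
history = [
    [1, 2, 50, 60, 3, 1],
    [2, 1, 45, 55, 2, 1],
    [1, 1, 60, 40, 30, 2],  # slot 4 was busy in this week only
    [1, 3, 55, 50, 2, 1],
]
print(recommend_off_slots(history))  # [0, 1, 5]
```

Slot 4 is idle in three of four weeks, but 75% falls short of the 95% threshold, so it stays on. A high default threshold trades some savings for a lower risk of powering off an instance that someone needs.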

Customers can simply select the schedule and instruct the software to manage it. Compared to manual EC2 scheduling, machine learning driven EC2 scheduling simplifies EC2 power management and is more accurate and scalable.

Next Generation Cloud Optimization Automation

In our view, the ever-increasing complexity of cloud environments mandates that next-generation cloud optimization solutions automate optimization actions using machine learning to ensure reduced cost, operational accuracy and efficiency. Such solutions should help customers identify resource utilization schedules accurately, run schedules with a single click and continue to monitor utilization so that the schedules adapt to changes in utilization patterns.