Achieving significant cost savings using machine learning in elastic public clouds – A cost savings analysis

For many years, traditional information technology (IT) departments have been required to build and maintain IT infrastructure (e.g., servers and networks) and services (e.g., database management, email, and technical issue resolution) to support the company's business. IT departments also commonly face serious issues such as outdated infrastructure, underutilized and/or overutilized capacity, and high cost. The advent of the elastic public cloud offers a new and better (more scalable, reliable, and efficient) way to fulfill various IT needs, and many IT services have already shifted to public clouds. However, many public cloud users treat the cloud as just another traditional IT resource/service provider and do not fully exploit its elasticity. In this blog post, we discuss the idea of dynamic provisioning in public clouds and analyze the cost savings you can get using our machine learning techniques.

Dynamic Provisioning
Traditional IT models, as well as most clouds with a fixed capacity, share two common problems: overutilized capacity and underutilized capacity, as illustrated in Fig. 1. Overutilized capacity means the system is underprovisioned, which leads to lost revenue or lost users. Underutilized capacity means the system is overprovisioned, which leaves paid-for resources unused.

Figure 1: Overutilized (left) and underutilized (right) capacity

In elastic public clouds, two of the most important properties are elasticity and scalability, which enable cloud users to quickly adjust capacity according to their demand. This is also known as "dynamic provisioning". Cloud resources should be provisioned dynamically to meet daily and seasonal demand variations as well as burst demand during extraordinary events, e.g., Black Friday.

At FittedCloud, we apply machine learning algorithms to model daily and seasonal demand changes and regular events over time, so that we can predict how demand will change in the near future. Based on these predictions, we automatically and dynamically adjust cloud capacity so that cloud cost is minimized without any performance loss, as illustrated in Fig. 2.

Figure 2: Dynamic provisioning to reduce cloud cost using machine learning
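To make the predict-then-provision idea concrete, here is a minimal sketch. It is not FittedCloud's algorithm: it swaps in a very simple hour-of-day average as the "model", and the function names and the 20% headroom are assumptions chosen only for illustration.

```python
from statistics import mean

def hourly_profile(history):
    """Average observed demand for each hour of the day.

    history: non-empty list of (hour, demand) samples; hour may run past 23.
    Hours with no samples fall back to the overall average.
    """
    buckets = {h: [] for h in range(24)}
    for hour, demand in history:
        buckets[hour % 24].append(demand)
    overall = mean(d for _, d in history)
    return [mean(b) if b else overall for b in buckets.values()]

def plan_capacity(profile, headroom=0.2):
    """Capacity to provision per hour: predicted demand plus a safety margin.

    headroom is an assumed value, not a recommendation.
    """
    return [p * (1 + headroom) for p in profile]
```

A real predictor would also need to capture seasonal patterns and known events, which a plain hourly average cannot do.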

Cost Savings Analysis
Next, we analyze how much can be saved with the dynamic provisioning scheme. As a case study, assume that demand follows a sine function d(t) = A/2 * sin(pi*t/12) + A/2, which varies between 0 and A with a period of 24 hours, and consider two cases: a fixed, underutilized capacity of, say, 1.2A, and a dynamically provisioned capacity c(t) = A/2 * sin(pi*t/12) + 1.2 A/2. The fixed capacity costs a total of 1.2 A * 24 per day, and it is easy to show that dynamic provisioning saves the following percentage r of that daily cost:

r = (cost of fixed provisioning - cost of dynamic provisioning) / cost of fixed provisioning * 100

  = (1.2 A * 24 - \int_0^{24} c(t) dt) / (1.2 A * 24) * 100 = (28.8 A - 14.4 A) / (28.8 A) * 100 = 50%

In other words, the dynamic provisioning solution can save 50% of the total cost.
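The 50% figure can be checked numerically. The following is a minimal sketch that integrates c(t) over one day with a midpoint rule. The function names, the step count, and the choice A = 1 are assumptions made for illustration. The result does not depend on A.

```python
import math

def demand(t, A=1.0):
    """Demand d(t) from the case study: varies between 0 and A, period 24 h."""
    return A / 2 * math.sin(math.pi * t / 12) + A / 2

def dynamic_capacity(t, A=1.0):
    """Dynamically provisioned capacity c(t) from the case study."""
    return A / 2 * math.sin(math.pi * t / 12) + 1.2 * A / 2

def savings_percent(A=1.0, hours=24, steps=24_000):
    """Percentage saved by c(t) relative to a fixed capacity of 1.2A."""
    dt = hours / steps
    # Midpoint-rule approximation of the integral of c(t) over one day.
    dynamic_cost = sum(
        dynamic_capacity((i + 0.5) * dt, A) for i in range(steps)
    ) * dt
    fixed_cost = 1.2 * A * hours
    return (fixed_cost - dynamic_cost) / fixed_cost * 100

print(round(savings_percent(), 2))  # 50.0
```

The sine term integrates to zero over a full period, so the daily dynamic cost is 0.6A * 24, half of the fixed cost. Note also that c(t) - d(t) = 0.1A at every t, so the dynamic capacity always stays above demand.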

Figure 3: Another example of dynamic provisioning and cost reduction

Fig. 3 shows another example of dynamic provisioning with FittedCloud, where one EC2 instance has a periodic demand: A for a time period WA, followed by B for a time period WB. Assume that fixed provisioning always provides a capacity of 1.2B to meet the demand. A better solution, shown on the right of the figure, is dynamic capacity provisioning, which provides 1.2A during WA and 1.2B during WB and yields the following savings:

r = [1.2B * (WA + WB) - (1.2B * WB + 1.2A * WA)] / [1.2B * (WA + WB)] * 100

  = (B - A) * WA / (B * (WA + WB)) * 100

Given that B = 0.8, A = 0.1, WB = 8 hours, WA = 16 hours, we have the savings r = 58.33%.
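The same calculation is easy to reproduce for other demand levels and durations. This is a small sketch. The function name and the `headroom` parameter are assumptions for illustration, with 1.2 taken from the example above.

```python
def two_level_savings(a, b, wa, wb, headroom=1.2):
    """Percentage saved when capacity follows a two-level demand.

    a, b:   low and high demand levels
    wa, wb: hours spent at each level per cycle
    """
    fixed_cost = headroom * b * (wa + wb)        # always sized for the peak
    dynamic_cost = headroom * (a * wa + b * wb)  # sized for the current level
    return (fixed_cost - dynamic_cost) / fixed_cost * 100

print(round(two_level_savings(0.1, 0.8, 16, 8), 2))  # 58.33
```

The headroom factor cancels out of the ratio, which is why it does not appear in the simplified formula.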

In fact, for many practical cloud environments, our machine learning solution with dynamic provisioning can save more than 50% of the total cost.

To achieve these cost savings, resource provisioning needs to be modified frequently. As the simple examples above show, it is not easy to change provisioning manually on a regular basis. To complicate matters further, public cloud providers such as AWS impose constraints on how often resource provisioning can be changed. While increasing capacity is allowed without limitations, decreasing capacity is constrained in many cases, depending on the type of resource (e.g., EBS IOPS capacity can only be changed once every 6 hours, and DynamoDB read/write capacity provisioning can only be changed 4 times a day). Automation that drives provisioning based on prediction and that takes the various provisioning constraints into account is essential to achieving true resource optimization and cost reduction.
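As a rough illustration of why such constraints matter, the sketch below applies a list of desired hourly capacities while allowing a decrease only after a minimum number of hours since the previous one. This is a deliberately simplified model, not any provider's actual rule set: the function name and the single `min_hours_between_decreases` parameter are assumptions, and real limits differ by service.

```python
def apply_plan(targets, min_hours_between_decreases=6):
    """Return the capacity actually provisioned for each hour.

    targets: desired capacity per hour (e.g., predicted demand plus headroom).
    Increases are applied immediately; a decrease is applied only if enough
    hours have passed since the previous decrease (hypothetical constraint).
    """
    current = targets[0]
    last_decrease = None
    actual = []
    for hour, target in enumerate(targets):
        if target > current:
            current = target
        elif target < current:
            allowed = (
                last_decrease is None
                or hour - last_decrease >= min_hours_between_decreases
            )
            if allowed:
                current = target
                last_decrease = hour
        actual.append(current)
    return actual

print(apply_plan([10, 8, 6, 4, 12, 5, 5, 5, 5, 5, 5]))
# [10, 8, 8, 8, 12, 12, 12, 5, 5, 5, 5]
```

In this run the early decrease to 8 uses up the allowance, so capacity stays above the target for several hours. A scheduler that looks ahead at predicted demand could decide to skip the small decrease and save the allowance for a larger one.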

FittedCloud Cloud Cost Optimization Solutions
FittedCloud offers machine-learning-based cloud cost optimization solutions that help customers significantly reduce their AWS spend. Our current solution includes machine-learning-driven actionable advisories with click-through actions for EC2, EBS, RDS, DynamoDB, ElastiCache, ElasticSearch, AutoScale, Lambda, etc., and full (lights-out) automation for EC2, EBS, DynamoDB, and RDS. Our solution can typically save customers up to 50% of their AWS cost. For more details please visit