How to reduce AWS cost using machine learning driven DynamoDB provisioning


Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance with seamless scalability. There is no minimum fee, and customers pay a flat hourly rate based on the provisioned capacity. For example, in the US East region, one Write Capacity Unit costs $0.00065/hour and one Read Capacity Unit costs $0.00013/hour. See Amazon DynamoDB Pricing for more details. Because the bill depends on what is provisioned, optimizing provisioned throughput capacity in Amazon DynamoDB is of great importance to customers: it determines the total cost of the database service.
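To make the pricing concrete, here is a minimal sketch of the arithmetic using the two US East rates quoted above. The function name, the table size (1,000 read units and 500 write units), and the 30-day month are illustrative assumptions, not figures from a real account.

```python
# Hourly prices quoted in the text for the US East region (USD).
WCU_PRICE_PER_HOUR = 0.00065  # one Write Capacity Unit
RCU_PRICE_PER_HOUR = 0.00013  # one Read Capacity Unit


def provisioned_cost(read_units, write_units, hours=1.0):
    """Cost of keeping the given capacity provisioned for `hours` hours."""
    hourly = read_units * RCU_PRICE_PER_HOUR + write_units * WCU_PRICE_PER_HOUR
    return hours * hourly


# Hypothetical table: 1,000 read units and 500 write units for a 30-day month.
monthly_cost = provisioned_cost(1000, 500, hours=24 * 30)
print(f"{monthly_cost:.2f}")  # prints 327.60
```

The charge is the same whether or not the table actually serves that much traffic, which is why the provisioned numbers matter so much.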

Provisioned throughput capacity optimization can be classified into two categories. The first focuses on designing the application and its tables to make optimal use of a table’s provisioned throughput. See “Guidelines for Working with Tables” for more details. The second focuses on adjusting the provisioned throughput capacity to match what the customer actually needs. Because customers pay for how much they provision, not how much they consume, it is easy to waste a lot of money when the provisioned capacity is underutilized.
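The cost of underutilization can be estimated with a few lines of arithmetic. The sketch below is only an illustration: the function name and the hourly consumption figures are made up, and the price is the US East write rate quoted earlier.

```python
WCU_PRICE_PER_HOUR = 0.00065  # USD per Write Capacity Unit-hour (US East)


def unused_capacity_cost(provisioned, consumed_per_hour, price_per_unit_hour):
    """Money spent on capacity that was provisioned but not consumed."""
    unused_unit_hours = sum(max(provisioned - c, 0) for c in consumed_per_hour)
    return unused_unit_hours * price_per_unit_hour


# Hypothetical table: 100 write units provisioned, consumption varies by hour.
consumed = [20, 30, 80, 40]
wasted = unused_capacity_cost(100, consumed, WCU_PRICE_PER_HOUR)
paid = 100 * len(consumed) * WCU_PRICE_PER_HOUR
print(f"wasted {wasted:.4f} of {paid:.4f} USD")
```

In this made-up example more than half of the spend goes to capacity that was never used.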

This blog post falls into the second category. Next, we present how FittedCloud uses a machine learning driven approach to optimize provisioned throughput capacity.

Machine learning driven provisioned throughput optimization

There are two challenges in provisioned throughput optimization: 1) accurately predicting the consumed capacity, and 2) adjusting the provisioned capacity in a timely manner. While it is possible for someone to monitor the consumed capacity of a few tables and adjust the provisioned capacity accordingly, this solution does not scale. As the DynamoDB deployment grows, it is easy to lose track of the consumed capacity, and unnecessary cost adds up.

FittedCloud uses a machine learning/data-driven approach to solve the problem. It consists of two major parts:

  • Prediction of consumed capacity
  • Optimization of provisioned capacity

The first part, prediction of consumed capacity, focuses on accurately predicting consumed read/write capacity from historical data. We use machine learning based on a time series model to learn patterns from the historical consumed capacity and predict the consumed capacity for the next period, which tells us how much we need to provision.
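The post does not describe the model itself, so the following is only a stand-in to show the shape of the prediction step. It uses a simple seasonal average (the mean of past values at the same hour of day) plus a safety margin. The function name, the 24-hour season, and the 20% headroom are all assumptions and should not be read as FittedCloud's actual model.

```python
from statistics import mean


def forecast_next(history, season=24, headroom=1.2):
    """Predict the next value from past values at the same seasonal position.

    history  -- consumed capacity per hour, oldest first
    season   -- assumed cycle length in samples (24 = daily pattern)
    headroom -- assumed safety factor applied to the forecast
    """
    if not history:
        raise ValueError("history must not be empty")
    # Indices one season, two seasons, ... before the point being predicted.
    same_phase = [history[i] for i in range(len(history) - season, -1, -season)]
    if not same_phase:
        # Less than one full season of data: fall back to the last observation.
        same_phase = [history[-1]]
    return headroom * mean(same_phase)


# Two days of hypothetical hourly data: quiet mornings, busy afternoons.
day = [10] * 12 + [50] * 12
history = day + day
next_hour_capacity = forecast_next(history)  # forecast for hour 0 of day 3
```

A real time series model would also capture trend and irregular spikes; the point here is only that the forecast, not the current reading, is what drives the provisioning decision.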

Once we have the predicted consumed capacity, the next challenge is to determine when to decrease the provisioned capacity. Amazon allows the customer to decrease the read or write provisioned capacity no more than 4 times per table in a single UTC calendar day, while there is no limit on the number of increases. See Limits in DynamoDB for more details. This makes it an optimization problem: we need to choose carefully when to decrease the provisioned capacity so that the total cost is minimized. There are two conditions to consider: 1) the limit on the maximum number of decreases, and 2) a write capacity unit costs 5 times as much as a read capacity unit. We have developed a greedy algorithm that solves this optimization problem by finding the best times to decrease the provisioned capacity under these two conditions.
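The post does not spell out the greedy algorithm, so the sketch below is one possible greedy formulation rather than FittedCloud's implementation. It makes several assumptions:

  • Capacity follows predicted demand upward for free, since increases are unlimited.
  • A decrease sets both read and write capacity to the predicted demand for that hour.
  • The day starts with capacity equal to the first hour's demand.
  • All function names and the sample numbers are hypothetical.

At each step the code picks the single hour whose decrease saves the most money, weighting writes 5 times as heavily as reads through the prices.

```python
RCU_PRICE = 0.00013  # USD per read unit-hour (US East, from the text)
WCU_PRICE = 0.00065  # USD per write unit-hour (US East, from the text)
MAX_DECREASES_PER_DAY = 4  # limit quoted in the text


def schedule_cost(reads, writes, decrease_hours):
    """Daily cost when capacity only drops at the hours in decrease_hours."""
    cost = 0.0
    prov_r = prov_w = 0
    for hour, (r, w) in enumerate(zip(reads, writes)):
        if hour in decrease_hours:
            prov_r, prov_w = r, w  # drop to predicted demand
        else:
            prov_r, prov_w = max(prov_r, r), max(prov_w, w)  # free increases
        cost += prov_r * RCU_PRICE + prov_w * WCU_PRICE
    return cost


def greedy_decreases(reads, writes, max_decreases=MAX_DECREASES_PER_DAY):
    """Greedily pick up to max_decreases hours; returns (hours, cost)."""
    chosen = set()
    best = schedule_cost(reads, writes, chosen)
    for _ in range(max_decreases):
        candidates = [
            (schedule_cost(reads, writes, chosen | {h}), h)
            for h in range(len(reads))
            if h not in chosen
        ]
        if not candidates:
            break
        cost, hour = min(candidates)
        if best - cost <= 1e-12:  # no remaining decrease saves money
            break
        chosen.add(hour)
        best = cost
    return sorted(chosen), best


# Hypothetical predicted demand for six hours.
reads = [100, 20, 20, 80, 10, 10]
writes = [50, 10, 10, 40, 5, 5]
hours, cost = greedy_decreases(reads, writes)
no_decrease_cost = schedule_cost(reads, writes, set())
```

For this sample the greedy pass decreases at hours 1 and 4 and then stops, because no further decrease would lower the cost. A greedy choice like this is not guaranteed to be globally optimal; it trades exactness for simplicity.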

An example is shown below. We can see that the machine learning module predicts the consumed capacity accurately. Note that because of the limit on decreases, the provisioned capacity is lowered only at the few times that minimize the total cost.