How To Significantly Tame The Cost of Autoscaling Your Cloud Clusters

Hi everyone. My name is Heidi Carson and I’m a product manager here at Pepperdata. Today, I’m going to share a bit about how you can tame the cost of autoscaling your cloud clusters.


As you may well be aware, the incredible flexibility and scalability of the public cloud make it an appealing environment for modern software development. But, that same flexibility and scalability can lead to runaway costs when the cloud doesn't scale the way you might expect.

Let's start with the largest cloud provider out there, Amazon Web Services. AWS offers several options for autoscaling, so you can automatically increase or decrease cloud resource usage according to parameters you set when you start up a new cluster.

This allows you to better match load to resources and saves you money. Autoscaling your cloud clusters can offer significant advantages in that you don't have to pay when there's no work to do. It's great for solutions where demand ebbs and flows and yet is also somewhat predictable.
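As a concrete illustration of "parameters you set when you start up a new cluster," here is a minimal sketch of what defining scaling limits can look like with EMR managed scaling via boto3. The cluster ID and all capacity numbers are hypothetical; the sketch keeps the boto3 call optional so it can run without AWS credentials.

```python
# Sketch: setting autoscaling bounds for an EMR cluster.
# All ComputeLimits values below are hypothetical examples.
managed_scaling_policy = {
    "ComputeLimits": {
        "UnitType": "Instances",      # scale in whole instances
        "MinimumCapacityUnits": 2,    # floor: always keep 2 nodes
        "MaximumCapacityUnits": 20,   # ceiling: cap runaway scale-up
    }
}

def apply_policy(cluster_id, policy, emr_client=None):
    """Attach a managed scaling policy to a cluster.

    With a real boto3 EMR client this would call
    emr_client.put_managed_scaling_policy(...); the client is optional
    here so the sketch runs without an AWS account.
    """
    if emr_client is not None:
        emr_client.put_managed_scaling_policy(
            ClusterId=cluster_id, ManagedScalingPolicy=policy
        )
    return {"ClusterId": cluster_id, "ManagedScalingPolicy": policy}

request = apply_policy("j-EXAMPLE12345", managed_scaling_policy)
print(request["ManagedScalingPolicy"]["ComputeLimits"])
```

The point of the bounds is the trade-off the talk describes: the floor controls how far the cluster can shrink when there's no work, and the ceiling limits how much a chunky scale-up can cost you.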

Autoscaling matters most when workloads are wasteful. Looking at some of the customer workloads we have seen, YARN is relatively inefficient because it is allocation-based rather than usage-based: it reserves resources according to what applications request, not what they actually consume.

And Spark, which is of course an extremely popular analytics engine for big data on YARN, is also relatively inefficient. So, is autoscaling the ultimate answer to controlling costs and waste in the cloud?

Not necessarily. This diagram shows conceptually what is going on when you enable autoscaling on a public cloud provider. Consider an example where a 12-core node A is created.

Let's suppose that you then start up an application that consumes 10 cores of node A. What often happens is that as the application consumes more of node A's cores, the cloud provider spins up another 12-core node B in anticipation of all 12 cores of node A being used.

As a result, at the moment node B is created, 14 of the 24 cores are unused: two from the original node A and all 12 from the brand-new node B.
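The arithmetic behind that example can be sketched in a few lines. The node size comes from the example above; the exact trigger point (adding a node as usage nears capacity) is a simplifying assumption, since real providers use their own heuristics.

```python
NODE_CORES = 12          # each node in the example has 12 cores
SCALE_UP_THRESHOLD = 10  # assumed trigger: add a node as usage nears capacity

def cluster_state(used_cores):
    """Return (total_cores, unused_cores) after a chunky scale-up."""
    nodes = 1
    # The provider spins up a whole new node in anticipation of
    # node A filling up, whether or not node B is ever used.
    if used_cores >= SCALE_UP_THRESHOLD:
        nodes += 1
    total = nodes * NODE_CORES
    return total, total - used_cores

total, unused = cluster_state(10)
print(total, unused)  # 24 total cores, 14 of them unused
```

Capacity arrives in 12-core chunks, so an application that needs just two more cores ends up paying for twelve.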

Yes, capacity has been added as needed, but it can be added in chunks that are larger than what you really need. Here's a visualization of that same effect. This time from one of our Pepperdata customer environments.

The top graph shows a cluster growing to 100 nodes for the duration of an application's runtime. The reality of what is going on inside that cloud environment is a lot burstier and more dynamic than that, as you can see in the bottom graph.

Each node runs a single task and then sits at zero tasks for the rest of the runtime. This means that almost the whole red tabletop of node resources in the top graph is unused.

Of course, there's no way that even the most dedicated engineers are ever going to be able to stay on top of this and tune all of their applications to perfection.

The sheer volume, velocity, and variety of applications most people run in the cloud these days put them far beyond manual management. Our experience has been that manually tuning applications can even reduce overall ROI because of the time and effort wasted in doing so. And if you get it wrong, which is easy to do, then compute resources are wasted too.

Why can autoscaling fall short? One underlying cause is that YARN reservations are used as a leading indicator of utilization per server instance, much like the example I shared earlier, where a node was added in anticipation of being needed, whether or not it was actually used. A second cause we've seen among our customers is pending containers.

Again, that metric is tied to the percentage of YARN memory allocated, which may or may not reflect actual utilization. And even when these metrics are right, the scaling increments themselves can be too large and not responsive enough to the changing weather conditions on your cloud.
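To make the allocation-versus-usage gap concrete, here is a small sketch comparing the two possible triggers. All memory figures and the threshold are illustrative, not taken from any real scaler.

```python
def scale_signals(allocated_mb, used_mb, capacity_mb, threshold=0.8):
    """Compare an allocation-based scale-up trigger with a usage-based one.

    Returns (allocation_says_scale, usage_says_scale).
    """
    allocation_based = allocated_mb / capacity_mb >= threshold
    usage_based = used_mb / capacity_mb >= threshold
    return allocation_based, usage_based

# Containers have reserved 90% of YARN memory but are only
# touching 30% of it, the allocation-vs-usage gap from the talk.
alloc_trigger, usage_trigger = scale_signals(
    allocated_mb=92_160, used_mb=30_720, capacity_mb=102_400
)
print(alloc_trigger, usage_trigger)  # True False
```

An autoscaler keyed to the first signal adds nodes here; one keyed to actual usage would not. That divergence is exactly the waste described above.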

We have seen these challenges time and time again with our customers, and we're working on solutions. Capacity Optimizer is our algorithm that layers on top of native cloud autoscaling to let you run the same workload on fewer instances.

It can then optimize the number of containers and reduce the backlog. To learn more about how you can tame autoscaling costs on your cloud cluster, please contact us. Thank you.
