Leveraging Cloud Automation to Manage Costs for Highly Unpredictable Workloads
Businesses should adopt measures to automate their cloud infrastructure as much as possible while maintaining the flexibility needed to handle unpredictable workload demands.
March 6, 2024
At first glance, using cloud infrastructure automation to keep cloud spending in check may seem simple — and it is, if your cloud infrastructure requirements are straightforward. For example, if your primary goal is to ensure that cloud servers can scale up and down based on workload requirements so you don't overpay for server capacity, configuring autoscaling rules for that purpose is pretty easy.
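As a minimal sketch of what such a rule might look like, the snippet below uses Python and boto3 to attach a target-tracking scaling policy to an EC2 Auto Scaling group; the group name, policy name, and target value are illustrative assumptions rather than recommendations:

```python
import boto3

# Assumes AWS credentials are configured and an Auto Scaling group
# named "web-asg" already exists; names and values are illustrative.
autoscaling = boto3.client("autoscaling")

# Target-tracking policy: keep average CPU across the group near 50%,
# adding instances under load and removing them when demand drops.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-cpu-near-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```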
But sometimes, infrastructure automation needs are not so simple. If you're not sure exactly which types of cloud resources your workloads will require, for instance, setting up cloud automations becomes much trickier. As a result, you may find yourself struggling to determine when you're overspending on a workload and how to strike the right balance between cost and performance.
That's the challenge I'd like to address in this article. As I explain, there is no simple trick to enable effective cloud infrastructure automations when you're dealing with more complex workloads and needs. But there are steps businesses can take to ensure that they automate their cloud infrastructure to the extent possible while still providing the flexibility to accommodate unpredictable workload requirements.
Simple vs. Complex Cloud Infrastructure Automation
Let me begin by spelling out what I mean when I say that some cloud automation use cases are much less straightforward than others, and how this relates to cloud spending.
A simple cloud automation scenario is one where the following conditions are true:
You know exactly which types of cloud resources (such as cloud servers or databases) your workloads will require.
You have clear thresholds for the minimum and maximum infrastructure capacity that your workloads will require.
You expect workload requirements to fluctuate in a predictable and consistent way.
For example, imagine you're hosting an eCommerce app on a cloud virtual machine that experiences upticks in traffic during the holiday season. In this case, you can predict exactly how much cloud infrastructure capacity you'll need during peak demand based on historical data. It's also a safe bet that infrastructure requirements will return to their baseline level once the holidays are over. Based on this information, you can configure reliable autoscaling rules for your VM.
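For a predictable seasonal peak like this, one option is to layer a scheduled scaling action on top of the baseline policy. The sketch below assumes the same illustrative Auto Scaling group as the earlier example; the dates and capacity figures are placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Temporarily raise the capacity floor for the holiday peak, then let the
# group return to its normal baseline when the scheduled action expires.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",            # placeholder group name
    ScheduledActionName="holiday-peak",
    StartTime="2024-11-15T00:00:00Z",
    EndTime="2025-01-05T00:00:00Z",
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=6,
)
```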
Compare that with a more complex cloud automation scenario, where you don't know which cloud resources you'll be using, what your capacity requirements will be, or how long the workloads will need to remain active. In this case, you can't simply set up some autoscaling rules and call it a day because you wouldn't know which thresholds to configure or how long to keep the autoscaling policies in effect.
In my experience, companies run into this scenario when they deploy special-purpose cloud workloads that need to operate for an extended but unpredictable period of time. For instance, a business might purchase extra cloud infrastructure capacity to support an ad campaign but not know when it will end the campaign or exactly how many ads it will need to support during the campaign. Or a company might be set to undergo a compliance review that requires an increase in cloud infrastructure consumption to support auditing and reporting processes, without knowing how intensive those processes will be or how long the review will take to complete.
Under-automating Means Overspending
Faced with complex cloud computing scenarios like these, companies could simply choose not to automate their cloud infrastructure at all, and instead scale it up and down manually.
The risk that they'd face, however, is that their cloud engineers may forget to scale down resources when they're no longer needed — or, worse, the engineers may not even know that a resource is no longer necessary. After all, engineers typically have limited visibility into what is happening on the business front. An engineer won't necessarily be alerted if, for instance, an ad campaign is winding down and the cloud database that she spun up seven months ago to support it should now be turned off.
As a result, attempting to manage infrastructure manually leaves businesses prone to overspending because they keep cloud resources running longer than necessary.
Making matters even worse is the risk that over time, unnecessary, unnoticed resources can affect baseline spending patterns. If you run superfluous cloud servers or databases long enough, the money you spend on them becomes part of what you consider your standard cloud budget. This makes it even more challenging to recognize that you're wasting money on the resources because they cease to look like anomalies.
A Nuanced Approach to Complex Cloud Automation Needs
Rather than attempting to manage infrastructure for complex cloud requirements manually, businesses should adopt a nuanced strategy that allows them to take advantage of cloud automations to the extent possible, while still providing the flexibility necessary to support unpredictable workload requirements.
I recommend doing this by tagging unpredictable cloud resources with labels that include two key types of information, illustrated in the sketch that follows this list:
A date by which engineers should review the resource to determine whether it's still necessary. This gives teams an opportunity to assess infrastructure manually and figure out if they should terminate it. If they decide the resource needs to keep running, they can update the tag with a new "review-by" date.
A date by which the resource will be shut down automatically if no one has reviewed it. The "terminate-by" date should fall a certain amount of time — such as 30 days — after the review-by date, giving engineers a grace period to check in on the resource before it's automatically shut down.
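As a minimal sketch of the tagging step, the snippet below applies these two labels to an EC2 instance with boto3; the tag keys, dates, and instance ID are illustrative, and the same idea applies to any taggable resource type:

```python
import boto3

ec2 = boto3.client("ec2")

# Label a special-purpose resource with a review deadline and a hard
# termination backstop. Keys, dates, and the instance ID are illustrative.
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],
    Tags=[
        {"Key": "purpose", "Value": "ad-campaign-spring"},
        {"Key": "review-by", "Value": "2024-04-15"},
        {"Key": "terminate-by", "Value": "2024-05-15"},  # 30-day grace period
    ],
)
```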
When you adopt a tagging strategy like this and integrate it with cloud automation tools, you give your engineers the flexibility necessary to manage unpredictable workloads in bespoke ways. At the same time, the automated termination creates a backstop that prevents cloud overspending from happening beyond a reasonable period.
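One way to implement that backstop, sketched below under the same assumed tag names, is a small scheduled job (a cron task or serverless function, for example) that sweeps tagged instances, flags any past their review-by date, and stops any past their terminate-by date:

```python
import boto3
from datetime import date

ec2 = boto3.client("ec2")
today = date.today()

# Find running instances that carry a terminate-by tag.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag-key", "Values": ["terminate-by"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        review_by = date.fromisoformat(tags["review-by"]) if "review-by" in tags else None
        terminate_by = date.fromisoformat(tags["terminate-by"])

        if today > terminate_by:
            # Past the hard backstop: shut the instance down automatically.
            ec2.stop_instances(InstanceIds=[instance["InstanceId"]])
        elif review_by and today > review_by:
            # In the grace period: surface it for human review (placeholder action).
            print(f"{instance['InstanceId']} needs review (review-by {review_by})")
```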
Going further, you can leverage data based on tags to inform your team about workloads that are not optimized for cost so that they can take action. When your cost monitoring and reporting tools detect anomalies for complex workloads, they can factor in data stored in tags to determine whether an optimization is necessary, then suggest an appropriate one if so.
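For instance, here is a rough sketch of pulling monthly spend grouped by one of those tags through the AWS Cost Explorer API; the tag key and date range are assumptions carried over from the earlier examples:

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

# Spend for the month broken out by the "purpose" tag, so unexpected cost
# for a special-purpose workload can be traced to its owner and checked
# against its review-by and terminate-by dates.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-03-01", "End": "2024-03-31"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "purpose"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    tag_value = group["Keys"][0]           # e.g. "purpose$ad-campaign-spring"
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(f"{tag_value}: ${float(amount):.2f}")
```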
To be sure, this strategy is not perfect. There's a risk that workloads that are still active will automatically shut down on the terminate-by date because engineers failed to review them first — but if that happens, your bigger problem is that your engineers are ignoring governance rules. Likewise, this approach may not shut down or scale back unnecessary workloads immediately, so you could still waste some money. But again, the waste will be limited by the automatic termination backstop you set up.
In addition, it's important to emphasize that no matter which cloud automations you have in place, you should still be monitoring cloud spending on a regular basis and looking for anomalies or opportunities to cut back. Automation is a complement to cloud cost tracking and analysis, not a substitute for it.
Conclusion: Taking Cloud Automation to the Next Level
The bottom line: You don't need to settle for automating cloud infrastructure management only in scenarios where your workload needs are simple and predictable. You can leverage automations to prevent overspending on more complex workloads, too.
Doing so requires a next-level automation strategy that provides more flexibility and nuance than simple cloud autoscaling rules — and one that helps you actively optimize your workloads, not just detect spending anomalies. But the time and effort necessary to implement advanced automation controls will pay for themselves many times over if they help you keep spending in check for unpredictable workloads.
Jason Foster is the vice president of customer excellence at Vega Cloud.