
A guide to cloud capacity planning for unexpected spikes

Unpredictable traffic spikes can make or break applications. Make sure your apps can handle scaling demands with this breakdown of cloud capacity strategies.

One of the biggest advantages of the cloud is its near-infinite scale. Unlike on-premises infrastructure, cloud computing resources can expand to accommodate nearly every IT scenario imaginable.

That scalability makes cloud computing uniquely equipped to power applications and businesses that experience sudden, unexpected spikes. But scaling resources is a complex matter that requires proper cloud capacity planning so you can serve your end users without overspending.

IT teams need to architect applications to handle fluctuations in scale. They also must properly design the supporting infrastructure, so unused capacity isn't sitting idle -- but is still available when needed. Here are some ways to plan for capacity challenges so you're ready when these spikes occur. 

How cloud scaling works

You can scale an application -- and its underlying architecture -- in a few different ways to accommodate capacity fluctuations. To better grasp how to deal with unexpected scaling, we must first establish how cloud scaling works, in general.

Let's use an analogy. Think of your cloud environment as a bathtub. The tub basin represents the resources available for hosting an application, and the drain represents the app's ability to process incoming requests -- i.e., its performance. Now, let's say you turn on your faucet, and keep it on. Your drain needs to remove the water as fast as it comes in, or else you'll soon have a big mess on your hands.  

The same goes for your application. You want it to process data as quickly as it comes in to avoid any slowdowns or failures. This equilibrium represents the expected capacity of the application. When the load becomes too much to handle, it can be difficult to recover without turning off all incoming requests.

To handle increased flow, cloud admins can scale vertically by adding more capacity to their VM. Also known as "scaling up," vertical scaling is a simple yet heavy-handed approach to dealing with increased load. It's good for a predictable, consistent increase, but it doesn't address unexpected spikes that exceed that threshold. To go back to our tub analogy, a bigger basin doesn't necessarily solve the overflow problem; it just buys time.

Alternatively, you can scale your application horizontally. Also known as "scaling out," this approach responds to traffic increases by adding more instances, or nodes, to your resource pool. This design works well in distributed systems, with load balancers typically used to route traffic accordingly.

At some point, you will reach your limits with vertical scaling -- either financially or physically. Instead, use horizontal scaling to solve the overflow problem more effectively, with smaller, lower-cost resources used in larger quantities.
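As a sketch of that routing, here is a minimal round-robin distributor in Python. The node names are hypothetical stand-ins for real instance addresses; a production load balancer would also track node health, but the core idea of spreading requests across a pool is the same:

```python
from itertools import cycle

# Hypothetical node names -- stand-ins for real instance addresses.
nodes = ["node-a", "node-b", "node-c"]

# A load balancer commonly round-robins requests across the pool;
# appending another entry to `nodes` is "scaling out."
next_node = cycle(nodes)

def route(request_id: int) -> str:
    """Return the node that will serve this request."""
    return next(next_node)

# Six requests land evenly across the three nodes.
assignments = [route(i) for i in range(6)]
print(assignments)
```

Because each node handles only a share of the traffic, you can grow capacity in small, low-cost increments instead of resizing one large machine.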


Use cloud services, containers and serverless for adaptive scaling

Cloud application scaling is a 24/7 process that cannot be done manually. At a certain point, you won't be able to keep up with demand and your app will fail. And even if you do keep up for the most part, unexpected traffic spikes will always catch your IT team off guard.

This is where adaptive, automated scaling helps to handle spikes and save money. Adaptive scaling comes in a few different forms.

Built-in proprietary scaling

Most cloud providers offer some native auto scaling functionality, such as the vertical scaling of database storage or the horizontal scaling of compute resources. With adaptive scaling provided as a service, users can "set it and forget it," while they work on more targeted tasks. This is a good place to start, but there is a risk of vendor lock-in. If you rely on multi-cloud or hybrid infrastructure -- or intend to in the future -- proprietary scaling may not be the best option.

Containers

Containers are an excellent way to package an application -- dependencies and all. Containers are ephemeral in nature, so they operate on the assumption that the available resources might not always be there. This reduces the need for things like fixed storage and enables users to more easily scale out an application.

Cloud admins can use an orchestrator such as Kubernetes to add containers as load increases. When these orchestration capabilities are combined with a cloud provider's scaling features, both your application and the infrastructure it runs on can scale to accommodate just about any need.
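As an illustration, Kubernetes' Horizontal Pod Autoscaler picks a replica count with a simple proportional rule: scale the current count by the ratio of the observed metric to its target. A minimal Python sketch of that calculation, with an arbitrary 60% CPU target as the example:

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float) -> int:
    """HPA-style rule: scale the replica count in proportion to how
    far the observed metric is from its target, rounding up."""
    return math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)

# 4 containers running at 90% CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, 90, 60))
```

The same rule scales back in when load drops, which is what keeps unused capacity from sitting idle.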

Serverless functions

In this model, a developer uses functions to handle specific tasks, and the cloud provider manages the servers and resource allocation. You only pay for the resources consumed, but make sure your serverless functions are small and efficient. Otherwise, the cost of too many long-running functions could be exorbitant.
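A rough cost model makes the point. The sketch below uses illustrative per-GB-second and per-million-request rates, not a quote from any provider -- real bills include free tiers and other factors, so check your provider's current price list. The takeaway is that duration multiplies directly into cost:

```python
def function_cost(invocations: int,
                  avg_duration_s: float,
                  memory_gb: float,
                  price_per_gb_second: float = 0.0000166667,
                  price_per_million_requests: float = 0.20) -> float:
    """Estimate spend for one serverless function.
    The default rates are illustrative examples only."""
    compute = invocations * avg_duration_s * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests

# The same 10M requests: a 100 ms function vs. one that runs for 5 s.
print(round(function_cost(10_000_000, 0.1, 0.5), 2))  # small and efficient
print(round(function_cost(10_000_000, 5.0, 0.5), 2))  # long-running
```

A fifty-fold increase in duration produces roughly a fifty-fold increase in compute cost, which is why trimming function runtime pays off quickly at scale.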

Set up defensive scaling with queuing, circuit breakers and caching

Automated scaling is always a risk, no matter how cost-efficient an application is. Without any limits in place, a large enough traffic burst could break the bank. Going back to the tub analogy, sometimes you need to slow the flow to a trickle or stop it altogether. When that happens, you can use the following techniques to manage excessive spikes.

Queuing

Use queuing to apply back pressure against inbound requests. This makes dealing with spikes more manageable, because it offloads work that doesn't have to be performed within the timeframe of the original request.

With queuing, you can break an application into smaller, focused services. That way, when the load increases on the front end, you can control how much data can be processed on the back end.
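A minimal Python sketch of the pattern, using a bounded in-process queue as a stand-in for a real message broker. When the queue is full, producers block -- that blocking is the back pressure that protects the back end:

```python
import queue
import threading

# A bounded queue: when it fills up, producers block (back pressure)
# instead of overwhelming the worker.
jobs: "queue.Queue[int]" = queue.Queue(maxsize=100)
processed = []

def worker() -> None:
    """Back-end consumer: drains jobs at its own pace."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down
            break
        processed.append(job)    # stand-in for the real work

t = threading.Thread(target=worker)
t.start()

# The front end acknowledges each request quickly and enqueues the
# slow part for the back end to process later.
for request_id in range(10):
    jobs.put(request_id)         # blocks if the queue is full

jobs.put(None)
t.join()
print(len(processed))
```

In production, the queue would typically be an external service so that front end and back end can scale independently.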

Circuit breakers

Circuit breakers are smart checks that halt requests that would add too much load to the system. For example, you could limit user functionality at a certain scale or apply request limits on a per-user basis. These limits can be communicated to end users so they're aware of the restrictions.
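A minimal sketch of the pattern in Python -- the thresholds are arbitrary examples, and production code would usually reach for an existing resilience library rather than rolling its own:

```python
import time

class CircuitBreaker:
    """After `max_failures` consecutive errors the breaker opens and
    rejects calls outright until `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: shed load without touching the backend.
                raise RuntimeError("circuit open -- request rejected")
            self.failures = 0               # half-open: allow a retry
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                   # success resets the count
        return result
```

Rejecting requests immediately while the breaker is open gives an overloaded backend time to recover instead of burying it under retries.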

Caching

Caching is the fastest way to reduce load on an application. If you identify repeat requests and cache the responses, you reduce processing load on a per-request basis. This not only makes your application feel faster, it also keeps resource usage down.

No matter what methods you use to prepare for unexpected scale, you should always test your application and infrastructure against real-world scenarios. Whether you're looking to address database scaling, application performance or request latency, these sorts of checks can establish a culture of quality that's better suited to deal with the unexpected.
