OpenStack scalability still tests the patience of IT pros

The constant evolution of OpenStack components hinders the platform's ability to scale. Learn why OpenStack scalability challenges persist, and what you can do to overcome them.

Scalability remains a big challenge for OpenStack users as the open source cloud platform continues to evolve. OpenStack's purpose is to enable and operate large-scale cloud clusters, but it still falls short of that mission in some areas. Often, what works in a small test sandbox struggles to move beyond early production into large-scale operation.

Increasing OpenStack scalability is not a simple process. For starters, many vendor plug-ins and tools for OpenStack don't scale well, while other vendors offer proprietary tools that help with scalability but introduce lock-in risks.

Breaking down the OpenStack scalability challenge


Networking seems to be at the heart of the OpenStack scalability issue, with OpenStack's Neutron networking module scaling to just 30 or so nodes. With public cloud competitors already running millions of VMs on tens of thousands of servers, this remains a red flag for enterprise customers considering OpenStack.

Part of Neutron's problem lies in the network models that OpenStack's core Nova compute module supports. These models limit the size of a cluster and add delays to the build and teardown of VLANs. The result has been good enough for sandboxing, but it becomes a major issue when production scale-out is needed.
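One concrete ceiling behind VLAN-based network models is the 802.1Q tag itself: its VLAN ID field is 12 bits wide, and IDs 0 and 4095 are reserved, leaving 4,094 usable VLANs per Layer 2 domain. The hypothetical allocator below, which is not Neutron's implementation, makes that limit visible: it hands each tenant network the lowest free VLAN ID, returns IDs on teardown and fails once the range is exhausted. The class and method names are illustrative assumptions.

```python
import heapq


class VlanPool:
    """Hypothetical tenant VLAN allocator (illustration only, not Neutron's)."""

    # 802.1Q carries a 12-bit VLAN ID (0-4095); 0 and 4095 are reserved.
    MIN_ID = 1
    MAX_ID = 4094

    def __init__(self):
        # Min-heap of free IDs so the lowest free ID is handed out first.
        self._free = list(range(self.MIN_ID, self.MAX_ID + 1))
        heapq.heapify(self._free)
        self._by_tenant = {}  # tenant name -> allocated VLAN ID

    def capacity(self):
        return self.MAX_ID - self.MIN_ID + 1

    def allocate(self, tenant):
        # Idempotent: a tenant that already has a VLAN keeps it.
        if tenant in self._by_tenant:
            return self._by_tenant[tenant]
        if not self._free:
            raise RuntimeError("VLAN ID space exhausted for this L2 domain")
        vlan_id = heapq.heappop(self._free)
        self._by_tenant[tenant] = vlan_id
        return vlan_id

    def release(self, tenant):
        # Return the tenant's VLAN ID to the free pool (network teardown).
        heapq.heappush(self._free, self._by_tenant.pop(tenant))


if __name__ == "__main__":
    pool = VlanPool()
    print(pool.capacity())            # 4094
    print(pool.allocate("tenant-a"))  # 1
    print(pool.allocate("tenant-b"))  # 2
```

However many servers sit behind that Layer 2 domain, the 4,095th tenant network has nowhere to go without a different network model.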

A number of OpenStack partners, including Mirantis and Hewlett Packard Enterprise, have come up with potential solutions. Meanwhile, software-defined networking products, which promise to pair low-cost software with inexpensive whitebox switches built on commodity silicon, are just coming to market and offer some competition to Mirantis and other OpenStack partners.


If you are purchasing a more turnkey platform, such as a vendor-managed OpenStack distribution, make sure your vendor can demonstrate viable scaling of the network across many racks of servers. Ask to see a concrete description of the setup and tests.

Horizontal autoscaling with the OpenStack Heat orchestration module, that is, launching more VMs as load grows, is also problematic at scale. The Ceilometer telemetry module monitors workloads in VMs and triggers Heat to automatically expand or contract the VM count. Unfortunately, many OpenStack distributions add proprietary tools and, in many cases, omit mainstream modules. Ceilometer is often left out, which requires a custom monitoring agent. The broad spectrum of OpenStack use cases almost guarantees that this interoperation will be buggy. The only option here is patience as Ceilometer sees wider deployment.
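For orientation, the sketch below builds a minimal Heat Orchestration Template (HOT) in which Ceilometer alarms signal Heat scaling policies. It assumes a release where the `OS::Ceilometer::Alarm` resource type is still available (later releases moved alarming to Aodh, with `OS::Aodh::Alarm`). The image and flavor names, thresholds and group sizes are hypothetical. The template is emitted as JSON, which Heat can parse, and the property names follow Heat's classic autoscaling examples. Check them against your distribution's resource-type documentation before relying on them.

```python
import json


def autoscaling_template(image, flavor, min_size=1, max_size=5,
                         cpu_high=80.0, cpu_low=20.0):
    """Build a minimal HOT autoscaling template as a Python dict (sketch)."""

    def policy(adjustment):
        # Heat scaling policy: add (or remove) `adjustment` servers, then
        # ignore further signals for the cooldown period.
        return {
            "type": "OS::Heat::ScalingPolicy",
            "properties": {
                "adjustment_type": "change_in_capacity",
                "auto_scaling_group_id": {"get_resource": "group"},
                "cooldown": 60,
                "scaling_adjustment": adjustment,
            },
        }

    def alarm(policy_name, threshold, operator):
        # Ceilometer alarm on average CPU utilization; when it fires it
        # calls the policy's alarm_url to trigger scaling. Scoping the
        # alarm to this group's servers is omitted for brevity.
        return {
            "type": "OS::Ceilometer::Alarm",
            "properties": {
                "meter_name": "cpu_util",
                "statistic": "avg",
                "period": 60,
                "evaluation_periods": 1,
                "threshold": threshold,
                "comparison_operator": operator,
                "alarm_actions": [{"get_attr": [policy_name, "alarm_url"]}],
            },
        }

    return {
        "heat_template_version": "2014-10-16",
        "resources": {
            "group": {
                "type": "OS::Heat::AutoScalingGroup",
                "properties": {
                    "min_size": min_size,
                    "max_size": max_size,
                    "resource": {
                        "type": "OS::Nova::Server",
                        "properties": {"image": image, "flavor": flavor},
                    },
                },
            },
            "scale_up": policy(1),
            "scale_down": policy(-1),
            "cpu_high": alarm("scale_up", cpu_high, "gt"),
            "cpu_low": alarm("scale_down", cpu_low, "lt"),
        },
    }


if __name__ == "__main__":
    # Hypothetical image and flavor names; substitute ones from your cloud.
    print(json.dumps(autoscaling_template("ubuntu-image", "m1.small"), indent=2))
```

If a distribution ships without Ceilometer, the alarm resources have nothing to evaluate them. That is the point at which a custom monitoring agent would have to call the policies' alarm URLs itself.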

Load balancing has similar problems. Neutron offers load balancing as a service (LBaaS), which Heat fully supports. Some distributions, however, lack that feature, so it's necessary to look elsewhere. The open source HAProxy load balancer, available on GitHub, is one potential solution.
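Where a distribution lacks load balancing as a service, one way to use HAProxy is to regenerate its configuration whenever the VM pool changes. The sketch below renders a minimal frontend/backend pair with round-robin balancing and health checks for a list of backend VMs. The section names, port and addresses are hypothetical. A real deployment would also need HAProxy's `global` and `defaults` sections (timeouts, mode) and a reload step after each change.

```python
def render_haproxy(backends, listen_port=80, name="app"):
    """Render a minimal HAProxy frontend/backend pair (sketch).

    `backends` is a list of (server_name, ip, port) tuples, for example one
    per VM that Heat has launched. All names and addresses are hypothetical.
    """
    lines = [
        f"frontend {name}_front",
        f"    bind *:{listen_port}",
        f"    default_backend {name}_back",
        "",
        f"backend {name}_back",
        "    balance roundrobin",  # spread connections evenly across servers
    ]
    for server_name, ip, port in backends:
        # `check` enables health checks so unresponsive VMs leave the rotation.
        lines.append(f"    server {server_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    vms = [("vm1", "10.0.0.11", 8080), ("vm2", "10.0.0.12", 8080)]
    print(render_haproxy(vms), end="")
```

Generating the file from the VM list, rather than editing it by hand, keeps the load balancer in step with whatever the autoscaler has launched or removed.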

Solving OpenStack network problems

More advanced networking operations are also a struggle when it comes to OpenStack scalability. For example, connecting virtual network functions (VNFs) is a bit of a nightmare. Verifying the connections isn't easy, and there is a risk of missing a connection on an externally facing function, such as a firewall. It's also difficult to insert new services into service chains; IT teams need to tear down the service chain and rebuild it from scratch.
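The verification problem can be illustrated with a small, hypothetical model that is not an OpenStack API. The model describes the intended chain as an ordered list of functions, records the links actually configured and reports every hop that is missing. A second helper shows that inserting a service yields a new chain, which is exactly what has to be rebuilt from scratch in the situation described in this section. The function and VNF names are assumptions for illustration.

```python
def missing_links(chain, configured_links):
    """Return the hops in `chain` that have no configured link.

    chain: ordered list of VNF names, e.g. ["internet", "firewall", "lb", "web"].
    configured_links: iterable of (src, dst) pairs actually wired up.
    """
    wired = set(configured_links)
    expected = list(zip(chain, chain[1:]))  # each consecutive pair must connect
    return [hop for hop in expected if hop not in wired]


def insert_service(chain, new_service, after):
    """Return a new chain with `new_service` placed right after `after`.

    The original list is left untouched; the result describes a whole new
    chain, mirroring the teardown-and-rebuild workflow.
    """
    i = chain.index(after)
    return chain[: i + 1] + [new_service] + chain[i + 1:]


if __name__ == "__main__":
    chain = ["internet", "firewall", "lb", "web"]
    links = [("firewall", "lb"), ("lb", "web")]  # internet -> firewall forgotten
    print(missing_links(chain, links))  # [('internet', 'firewall')]
    print(insert_service(chain, "ids", after="firewall"))
    # ['internet', 'firewall', 'ids', 'lb', 'web']
```

The example's missing hop is the externally facing one, the kind of gap the section warns about.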

Startup storms, which are somewhat reminiscent of boot storms in virtual desktop infrastructure (VDI), can occur when a connection breaks and is then restored. Reconnecting tens of thousands of nodes through a slow encryption process and distributed agents is something to lose sleep over.
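Some back-of-the-envelope arithmetic shows why a reconnect storm hurts. Suppose N nodes reconnect at once and the control plane can complete at most C handshakes concurrently, each taking t seconds. The last node then waits roughly ceil(N / C) × t seconds. Spreading reconnect attempts with random jitter is a general mitigation, not something this article prescribes, that lowers the instantaneous load. The sketch below models both effects, and all figures in it are hypothetical.

```python
import math
import random


def drain_time(nodes, concurrency, handshake_s):
    """Seconds to reconnect every node when at most `concurrency`
    handshakes, each taking `handshake_s` seconds, run at once."""
    batches = math.ceil(nodes / concurrency)
    return batches * handshake_s


def peak_per_second(nodes, jitter_window_s, seed=0):
    """Peak reconnect attempts in any one-second bucket when each node
    waits a random whole-second delay in [0, jitter_window_s)."""
    if jitter_window_s <= 0:
        return nodes  # no jitter: every node retries in the same instant
    rng = random.Random(seed)  # seeded for repeatable results
    buckets = [0] * jitter_window_s
    for _ in range(nodes):
        buckets[rng.randrange(jitter_window_s)] += 1
    return max(buckets)


if __name__ == "__main__":
    print(drain_time(20_000, 200, 0.5))  # 50.0 seconds
    print(peak_per_second(20_000, 0))    # 20000 attempts in one second
    print(peak_per_second(20_000, 60))   # a few hundred per second
```

Jitter does not shorten the total drain time, but it keeps the control plane from facing every handshake in the same second.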


When it comes to OpenStack scalability, these network problems stem in part from bottlenecks in Nova and in the Keystone identity and security module. Careful tuning of Nova can substantially reduce this issue and speed up network setup operations. For example, IT teams can increase the number of Nova API and conductor workers to remove bottlenecks in link creation. You can also address some Neutron problems with OpenContrail, an efficient, low-footprint platform that distributes network services.
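As a sketch of the Nova tuning mentioned above, the snippet below writes worker settings into a nova.conf fragment using Python's standard configparser. The option names `osapi_compute_workers` (in `[DEFAULT]`) and `workers` (in `[conductor]`) come from Nova's configuration reference, where both default to the number of available CPUs. Confirm them for your release. The worker counts in the example are hypothetical and should be sized against the controller's CPU count and measured API latency.

```python
import configparser
import io


def nova_worker_overrides(api_workers, conductor_workers):
    """Return a nova.conf fragment setting API and conductor worker counts."""
    for value in (api_workers, conductor_workers):
        if not isinstance(value, int) or value < 1:
            raise ValueError("worker counts must be positive integers")

    conf = configparser.ConfigParser()
    # [DEFAULT] always exists in configparser; set the API worker count there.
    conf["DEFAULT"]["osapi_compute_workers"] = str(api_workers)
    conf["conductor"] = {"workers": str(conductor_workers)}

    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical counts for a controller; tune from measurements.
    print(nova_worker_overrides(api_workers=8, conductor_workers=8), end="")
```

Each worker is a separate process, so higher counts also cost memory and database connections. The nova-api and nova-conductor services must be restarted to pick up new values.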

OpenStack scalability and other challenges constantly shift as OpenStack components evolve. Read blogs and articles to stay up to date, and remember that, overall, the benefits of an open source cloud can be tremendous, but keeping up with its evolution is a major challenge.
