Published: 15 Jul 2014
The marketing pitch for cloud services includes a vision of a steady progression from in-house IT to the perfect, infinitely wise cloud that solves -- or at least outsources -- all of your IT problems. Every cloud vendor presentation includes some chart showing stages of customer evolution; higher levels of outsourcing invariably trend up and to the right. Buy infrastructure from the cloud (IaaS)? Good. Buy a full platform (PaaS)? Better! Buy your entire software stack from the cloud (SaaS)? Best!
Innovations routinely require evangelism to trigger mainstream user adoption. But while we admire the enthusiasm of cloud proselytizers, let's recognize that change doesn't always represent progress.
Now, I love cloud computing. I use it regularly, both personally and professionally. For developers, for startups establishing greenfield applications, for operators of services that need to scale very rapidly and without warning, for those requiring geographic reach for their data or apps and for many other use cases -- cloud has numerous, irreplaceable benefits.
However, many important IT attributes do not magically improve by outsourcing the operations. In fact, many attributes decline -- some, rather alarmingly.
Settling for cloud computing
Consider "elastic" cloud infrastructure. You can scale up to as many servers as you like, then back down, then up again -- varying your resources to your heart's content. But are any of those individual servers elastic? Can you vary their CPU counts, memory sizes or other resource allocations? Er…no. They're supposedly elastic, but they can't stretch -- not even in ways that systems running VMware, PowerVM or name-your-favorite-hypervisor find straightforward.
And then there are the guarantees you get from cloud providers and application programming interfaces (APIs) regarding uptime, latency and transactional consistency. They often lack a clear service-level objective, let alone a hard service-level agreement, and the ones that do exist tend to be incredibly weak. 99.9% uptime? That was a great goal a decade ago, but a toothless "three nines" objective in 2014? Well-run enterprise shops blow past that in a heartbeat. Those who have been working on offsite business continuity and virtual infrastructure often think in terms of four, five or more "nines."
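To put those "nines" in perspective, here is a minimal sketch of the arithmetic, assuming a 365-day year and treating the uptime percentage as the only input. The function name is purely illustrative.

```python
# Downtime per year permitted by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target still allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} minutes/year")
```

Three nines leaves room for roughly 8.8 hours of outage a year; four nines allows about 53 minutes, and five nines about 5 minutes.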
Or consider cloud APIs that record data. It's hard even getting solid target metrics. When you do, they often boil down to muddy generalizations like "most transactions post within seconds." Almost every commercial database system and data-handling middleware package you find on enterprise gear closes transactions within milliseconds, or sometimes microseconds or even nanoseconds, especially in scenarios tuned for high transaction volumes, like high-frequency trading (HFT).
Because it's high-scale distributed computing, let's cut it some slack and say "a few seconds" is both precise enough and fast enough. Even under that standard, one has to wonder about transactions that don't post within a few seconds. Most cloud services embrace "eventual consistency," which means you'll have no clear idea of when the transaction will actually post. Minutes? An hour? More? It's not specified. I know of real-world cloud services that do occasionally post updates an hour "late" -- and there are many brand-name services that post transactions out of order, minutes late.
With most of these APIs, there's no way to tell when late-breaking data arrives. You just have to keep polling, compare it to data you've already seen, and discern if anything new becomes visible. From the perspective of those used to solid, dependable two-phase commits, "eventual consistency" sounds a lot like "not consistent."
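The poll-and-compare chore looks roughly like the sketch below. Everything in it is an assumption for illustration: the record shape, the `id` field and the `fake_fetch` stand-in for a cloud API are hypothetical, and a real client would also sleep between polls and handle errors.

```python
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]


def new_records(snapshot: Iterable[Record], seen_ids: Set[str]) -> List[Record]:
    """Return records whose id has not been seen before, and remember them."""
    fresh = []
    for record in snapshot:
        record_id = str(record["id"])
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            fresh.append(record)
    return fresh


def poll(fetch: Callable[[], Iterable[Record]], rounds: int) -> List[List[Record]]:
    """Call fetch() `rounds` times and collect what was new in each round."""
    seen_ids: Set[str] = set()
    results = []
    for _ in range(rounds):
        results.append(new_records(fetch(), seen_ids))
    return results


# Simulated eventually consistent service (hypothetical): transaction "t2"
# only becomes visible on the third poll, after "t3" has already appeared.
_snapshots = iter([
    [{"id": "t1", "amount": 10}],
    [{"id": "t1", "amount": 10}, {"id": "t3", "amount": 30}],
    [{"id": "t1", "amount": 10}, {"id": "t2", "amount": 20}, {"id": "t3", "amount": 30}],
])


def fake_fetch() -> List[Record]:
    return next(_snapshots)


results = poll(fake_fetch, rounds=3)
for round_number, fresh in enumerate(results, start=1):
    print(round_number, [r["id"] for r in fresh])
```

Note what the client cannot know: after any given poll, it has no way to tell whether "t2" is still on its way or was never submitted at all.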
It's akin to the concurrency guarantees mainstream middleware offered 20 or 30 years back. But in the cloud world, that's not laughable; rather, it's state of the art. It's the price of admission for having a highly distributed, extremely scale-out-capable architecture.
I'm not saying these hyperscale designs are bad. Indeed, they're amazingly powerful and useful for a lot of interesting and valuable use cases -- building search engines and data heuristics, for example. They can sometimes process globally distributed updates within seconds, in a way that appears almost magical. For massive scale and minimal cost, you just can't beat them. But when you're transferring money, or want real control and predictability, or need guaranteed, low-variability performance, high-scale systems are often easily outstripped by much less distributed, "non-cloud" designs. Cloud and hyperscale do some awesome things you really can't do any other way. Along the way they make real tradeoffs -- some of which, rather surprisingly, turn back the clock on attributes we've come to take for granted.