Calculating cloud computing TCO

One of the major reasons that enterprises are showing such interest in cloud computing is the promise of cost savings and possible economic advantages. Before embarking on a cloud implementation, however, it's important to find out exactly how much switching to the cloud will save you.

In this webcast, Bernard Golden, CEO of cloud computing consulting firm HyperStratus, discusses the value that cloud services can provide. He also offers several cloud infrastructure options as examples to study when calculating cloud's total cost of ownership (TCO).

For more from Bernard on cloud TCO, check out his cheat sheet on cloud TCO calculations.

Read the full transcript from this video below:

Bernard Golden: Hello, and welcome to this webinar on Cloud TCO,
Figuring Out the Real Numbers. I'm Bernard Golden, and we'll be talking today
about one of the most controversial and misunderstood questions about cloud
computing, which is the TCO, how to understand TCO around cloud computing.
It's clearly a topic that's very important, because one of the reasons
that companies look to cloud computing is because they believe the numbers,
the finances might work out extremely well. So we'll talk about that today,
talk about some of the opportunities and the challenges for that.

As an introduction, again, I'm Bernard Golden, CEO of HyperStratus. We're a
cloud computing consulting firm located in Silicon Valley. We provide a
range of services to our clients, like strategy, helping them decide what
their approach to cloud computing should be. Should they use a private
cloud, public cloud, what applications they should look at using first,
moving into the cloud, so forth and so on. We do architecture design,
helping companies when they decide to put an application into the cloud
environment, how they can build it to make sure that it's scalable,
elastic, takes advantage of cloud computing characteristics, and so forth.
We actually implement systems on behalf of our clients, or help them
implement them, and we provide training around cloud computing as well.

I serve as the cloud computing adviser for CIO Magazine and do a blog there
that's read pretty widely throughout the world. I'm the author of two books
that are germane to cloud computing. One is "Virtualization for Dummies,"
the best-selling book on that subject ever published, and also a new book
called "Creating the Infrastructure for Cloud Computing," published by
Intel Press.

So, today we're going to talk about cloud computing and the TCO of cloud
computing environments and really, the challenge for TCO today is that
it's becoming harder to figure out TCO. And one of the reasons, if we
look at the next slide, is that in the past, the numbers were run on
applications assuming a very static environment. As a project was planned,
you'd say, "How much resource do we need? How much storage? Do we
need one server? Two servers?" Whatever it might be. "How many users will
we have? How big a box do we need?" And you'd be able to sort of scope
what the total cost structure was going to be.

That's very difficult now, because we live in a very different world today, in
that applications tend to be much more sensitive and experience much higher
variation of loads. A way that you can look at it here is, if you think
about a bell curve, in the past we sized and analyzed the finances of
systems on the assumption that they would be at the mean, at the very middle of the
normal distribution. And what's happened over the course of the last ten
years, as more systems become web-based, as they start generating data from
sensors, so forth, as we start reaching out to external populations and let
them interact directly with our applications, you can say that the standard
deviation of load changes quite a bit, which means that the TCO structures
have to be sound. You can no longer assume that you're going to have a
fixed set of resources, hardware, software, and so forth, that are applied
to a system, and that you can analyze it on that basis. And that creates
quite a challenge, and we'll be talking about some techniques on how that
can be addressed as we go forward in this webinar.

The main point of this slide is to understand that the very characteristics
of TCO are changing a lot, and that one of the key reasons that people
look at cloud computing is for exactly this reason here, this
increased standard deviation of load. Basically, as these applications
become much more unpredictable, as their loads vary quite a bit, people
say, "I've got to have an infrastructure that can support that. I need to
have resources that scale up and down as we go. So how can we accomplish that?"

Cloud computing is one of the things that people think of when they think
about, "How do I build more scalable, elastic applications?" And it makes
sense if we look, on the next slide, at the NIST definition of cloud
computing. NIST is the National Institute of Standards and Technology. It's a
US Government body that not only takes the lead within the government for
setting standards that are used by the government, but also is considered
kind of a bellwether for many institutions. Many companies, they look to
NIST. NIST has been very active in cloud computing, and has developed a
definition that is quite widely accepted about what cloud computing is.

Keeping in mind the question of the increased standard deviation, the
varying loads, the new challenges for applications, let's look at how they
define cloud computing, and why that might be relevant for understanding
why people are using the cloud, and then we'll use that as input into our
TCO analysis.

Basically, NIST defines five different characteristics of cloud computing.

On-demand self-service: A consumer can unilaterally provision computing
capabilities, such as server time and network storage, as needed,
automatically. One can easily understand that that would be very
applicable to applications that have very widely varying loads. It makes a
ton of sense to be able to say, "Oh, load is going up, I want to be able to
self-serve and have these resources automatically be added to my
application," without having to go through a long budget process or an
extended manual process of requesting resources and so forth. So, quite
germane to this question of varying loads and elasticity.

Broad network access: This really refers to the fact that applications are
now being accessed by a wide variety of devices, beyond a
desktop PC, probably internal to a company. Today, you've got people on
laptops, you've got people accessing things from phones, you've got
sensors, a whole range of devices. And again, that increases the
unpredictability of applications, and makes cloud computing make more sense.

Resource pooling: The provider's computing resources are pooled to serve
multiple consumers, with resources dynamically assigned and reassigned
according to consumer demand. This is exactly what we were just talking
about. As the load varies, according to usage patterns of an individual
application, the cloud provider will add or subtract resources from that
application as needed, and it draws from a central pool. And essentially,
this allows the cloud provider to multiplex demand across a number of
customers, ensuring that any one customer will have access to the
resources he or she needs as the load on that application changes.

Rapid elasticity: Capabilities can be rapidly and elastically provisioned,
in some cases automatically, to quickly scale out, and rapidly released to
quickly scale in. In a way, this reinforces that same message, which is,
it's easy to add and subtract resources to respond to varying demand. As
you can imagine, trying to keep track of those resources -- in particular,
trying to keep track of their costs -- is a challenge that we're going to
be talking about today.

And then, measured service: Resource usage can be monitored, controlled,
and reported, providing transparency for both the provider and the consumer
of the service. This is kind of "pay by the drink." I use resources as I
need them, as I access them, and I only pay for what I use. This is a huge
challenge for traditional TCO analysis, because it assumes static
resources. We have three servers. We start with three, we keep three. It
extends off for the next five years. So I could come up with a very static
analysis. It's much more difficult to assess TCO in this environment. We
have some techniques we'll be talking about.

The exciting part of this, of course, is that, as the application's load
varies and so forth, you only have to provision what it needs, and you only
pay for what it uses. And that's a big change from the past, the way things
were done, and it makes cloud computing very cost-effective. It also makes
TCO analysis a little more difficult, and so you have to sort of calculate
that in.
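
To make that contrast concrete, here is a minimal sketch in Python. The hourly
load profile, the per-server capacity, and the hourly rate are all made-up
illustrative values, not figures from the webinar. It compares paying around
the clock for enough servers to cover the peak with paying each hour only for
the servers that hour needs.

```python
import math
import statistics

# Hypothetical values for illustration only (not from the webinar).
HOURLY_RATE = 0.10          # assumed dollars per server-hour
REQUESTS_PER_SERVER = 100   # assumed capacity of one server per hour

# Assumed load for one day: quiet overnight, a spike in the evening.
hourly_load = [20] * 8 + [80] * 8 + [400] * 4 + [80] * 4

def servers_needed(load):
    """Smallest whole number of servers that can carry the load (at least one)."""
    return max(1, math.ceil(load / REQUESTS_PER_SERVER))

# Static sizing: buy for the peak and pay for it every hour.
peak_servers = max(servers_needed(load) for load in hourly_load)
static_cost = peak_servers * len(hourly_load) * HOURLY_RATE

# Metered sizing: pay each hour only for what that hour needs.
metered_server_hours = sum(servers_needed(load) for load in hourly_load)
metered_cost = metered_server_hours * HOURLY_RATE

print(f"mean load {statistics.mean(hourly_load):.1f}, "
      f"std dev {statistics.pstdev(hourly_load):.1f}")
print(f"static (peak-sized): ${static_cost:.2f} per day")
print(f"metered (pay per use): ${metered_cost:.2f} per day")
```

The wider the spread between the typical hour and the peak hour, the larger
the gap between the two totals, which is the standard deviation point made
earlier in the talk.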

Well, this is all fine. We've talked about the NIST definition. We've
talked about the challenge of the standard deviation. Really, the question
is, how do the numbers stack up, and how do you go about thinking about
these things? Because in the abstract, cloud computing sounds great. The
question might be, how do I pay for it, and what are the costs going to be
in the near term?

And this picture of the stack of dollars really indicates, how do the
numbers really add up in the real world? Let's talk about that.

Look on the next slide to understand the TCO components. It's important to
recognize that there are sort of three different cost streams that come
into a TCO analysis. If we look at those, first we have direct costs.
Those are the things associated directly with the application, that it uses
as part of its delivery: classically, server, storage, and some software.
You might be paying for the operating system. You might be paying for
software components that are used in that application. Those are all direct
costs for the application and they're relatively straightforward to

You have indirect costs. Network. So there's a network in place. How do you
assign a certain amount of that network to the particular applications, so
that you can provide TCO around that application? The same thing is true
for storage. You might have a certain amount of storage for that
application, but it resides in an array, or something like that. How do you
assign the proportion of cost that application makes up of that total
storage cost to that application?

Other kinds of software. You might have network management software in
place, network monitoring, security software. Things that reside on the
network. Those are investments that are used across IT, but some proportion
of them should be analyzed for the application to get a good idea of
whether the TCO stacks up, and so forth.

And then operations head count, of course. There are people who operate the
overall infrastructure. Some proportion of that is going to be assigned to
the resources, the direct resources, to run and manage the server, the
storage, the software, and so forth. How do you account for that?

And then you've got overhead. The application runs on servers, which reside
in the data center. There's overall bandwidth that the company or
organization has access to the outside world. How do you account for that?
Facilities. It's in a building. Those kinds of costs as well. And then
admin headcount. There are typically people in the organization who are
responsible for keeping track of the accounting and so forth.

This graphic shows all these streams coming in and being split out. You can
really think of TCO as being almost in the reverse. You want to have each
of these streams be joined together at their appropriate numbers to come
out with one overall number. As you can imagine, some of these costs are
not that easy to track down.

So, in summary, if we look at the next slide, to perform a really good TCO
analysis, you have to add direct costs plus assigned indirect costs. In
other words, what portion of an ops person or several ops persons are
assigned to this particular application. And then the amortized overhead
costs. You've got a data center, it's going to have a viable lifespan of
ten years, perhaps, and the application makes up half a percent of that
overall data center, let's say. The number of servers within the context of
the total number of servers, it's a half a percent, so you're going to want
to assign half a percent over the course of the next ten years of the life
cycle of the application to that application's overall cost.
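
As a rough sketch of that sum in Python: the half-percent share and the
ten-year lifespan come from the example above, while every dollar amount, the
other shares, and the function name `monthly_tco` are hypothetical
placeholders chosen for illustration.

```python
# All dollar figures are hypothetical placeholders, not data from the webinar.

def monthly_tco(direct, indirect_pools, overhead_assets):
    """Direct costs + assigned share of indirect costs + amortized overhead.

    direct:          monthly dollars spent only on this application
    indirect_pools:  list of (monthly pool cost, share used by this application)
    overhead_assets: list of (purchase cost, lifespan in years, share used)
    """
    assigned_indirect = sum(cost * share for cost, share in indirect_pools)
    amortized_overhead = sum(
        cost / (years * 12) * share for cost, years, share in overhead_assets
    )
    return direct + assigned_indirect + amortized_overhead

total = monthly_tco(
    direct=500.0,                      # assumed server, storage, software
    indirect_pools=[
        (8000.0, 0.25),                # assumed: a quarter of one ops person
        (4000.0, 0.01),                # assumed: 1% of the shared network
    ],
    overhead_assets=[
        (12_000_000.0, 10, 0.005),     # data center: 10 years, half a percent
    ],
)
print(f"monthly TCO: ${total:,.2f}")
```

The arithmetic is trivial once the shares are known; the hard part, as the
next slides discuss, is finding the pool costs and agreeing on the shares.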

If we look at how that's done internally, on the next slide, really, this
picture depicts how challenging it is for most organizations to figure out
what their cost structure is. The challenges to doing this internally often
come down to the fact that there's no integrated budget. One group pays for the
servers. A different group pays for the storage. A third group, maybe real
estate or facilities, part of the headquarters staff, pays for the
facilities, you know, owns the data center. Power may be paid for by a
different part of the organization. Very challenging. There are different
organizations involved. How do you assign the costs?

Perhaps surprisingly, or not surprisingly, it may not be easy to even
find out those costs. And then how do you assign them?

In some cases, there's user resistance to creating accurate TCO. As you
might imagine, given that there's so much opacity around understanding
these costs, oftentimes costs are assigned arbitrarily or
unfairly, and there are some people who are harmed by
that. They pay for more than they really are using. So they're making a
sacrifice. By the same token, there are probably people who are using more
than they're paying for. They're getting, so to speak, a free ride. As you
can imagine, those people who are getting kind of the free ride or the
cheaper rates aren't necessarily going to be enthusiastic about getting a
more accurate cost assignment.

So, the bottom line is, assessing internal TCO can be very challenging. It
can be very difficult. It's critical. It's important. It's becoming more
important. But it's quite challenging. There's been a lot of work done
around this, but for the average organization, taking the concrete steps to
figure out those costs can be a bit challenging.

By contrast, look at the next slide, cloud costs can be much more
straightforward to understand, because if you look at these three arrows,
as you can see, at the bottom it says, "cloud service provider aggregated
costs." Essentially, the cloud service provider has taken responsibility
for figuring out what all those costs are. They've figured out what is the
cost per server, what's my power input, how many people do I have operating
it. They've analyzed that as a business, and then they've come up with
their total cost structure because, if you think about it, for them to be
able to run a viable business, they have to understand their entire cost structure.

So they've already done that. They aggregate those costs, and then split
them out across these different kinds of offerings. And typically, this is
the way Amazon does it, and it's very typical for public providers, they'll
essentially, in the simplest case, charge for network traffic, and they charge
you $0.15 a gigabyte, roughly speaking. It can vary a little bit. We'll
talk about that in a moment. Then compute: how much you pay per processor
hour. And again, this is the least expensive tranche for Amazon. However,
that's for a small server. You can buy up to very large servers, and you
pay more per hour. And then storage, the same thing. You pay so much per
gigabyte per month.

So you can see these are very transparent costs. I mean, it's very clear
what they charge for these things. And, in the case of Amazon and its
fellow cloud service providers, often what you find is these have tranches,
meaning network traffic is sold at 15 cents per gigabyte up to a certain amount
of traffic, and then the cost drops. So basically, as you buy in
volume, your cost per gigabyte goes down.

But it's quite transparent, the way they deliver these costs, in that they
break it out, so they take on the responsibility of figuring out how to
aggregate all that. It relieves you of the headache of
having to go and figure out how much your power cost is and how much you should
assign to this application. You just figure out, what does my application
use in this environment, and then the costs are quite transparent. You can
basically do a mathematical exercise.

The challenge comes when you run into that thing about the increased
standard deviation of load. We'll have some techniques around that, but
at least the inputs are quite clear and relatively straightforward to figure out.

Let's take a look at an actual TCO example. This is one that we created for
a client of ours. As you can see, it's basically got EC2 instances,
which are the virtual machines, the servers, the compute capability. We
have data transfer, which is the network traffic in and out. And then we
have elastic block storage, which is storage capability. So, again,
these are the numbers, these are the different mechanisms that, in this
case, Amazon breaks out as its pricing. It's those three different types
that we discussed.

And you can see that, in the yellow, there is the opportunity to put in how
many instances you're going to use. So, for example, there's "small," and
at the time that we created this for this client, the small instances were $0.10
an hour. The prices have since dropped to about $0.085 an hour, which is what I
described in the earlier slide. And you can put in, this application will
have two small instances, or three large instances.

When it comes to data transfer, how much data am I going to be moving in
and out of this application? Is it going to be 30 gigabytes, 300 gigabytes?
As you can see, there's tiered pricing. If you look at the "out" section,
there's the first 10 terabytes, the next 40 terabytes, the next 100 terabytes, and
over 150 terabytes, and those all have different prices. They drop as the
number goes up. So depending on the amount of traffic, there'll be a
certain number, and it'll be the sum, across the tiers, of the amount of
traffic in each tier times the price for that tier.
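
A tiered calculation like this can be sketched in a few lines of Python. The
tier sizes and the $0.15 first-tier rate are the ones mentioned in the talk;
the rates for the higher tiers are not stated, so the values below are
hypothetical placeholders, as is the choice of 1000 GB per terabyte.

```python
# Tier sizes follow the talk: first 10 TB, next 40 TB, next 100 TB, over 150 TB.
# Only the first-tier rate ($0.15/GB) is quoted in the talk; the lower rates
# are hypothetical placeholders.
GB_PER_TB = 1000  # assumed; a provider may count 1024 GB per TB instead

TIERS = [
    (10 * GB_PER_TB, 0.15),    # first 10 TB
    (40 * GB_PER_TB, 0.11),    # next 40 TB (hypothetical rate)
    (100 * GB_PER_TB, 0.09),   # next 100 TB (hypothetical rate)
    (None, 0.08),              # over 150 TB (hypothetical rate, no size cap)
]

def transfer_cost(gigabytes):
    """Sum, tier by tier, the traffic that falls in each tier times its rate."""
    remaining = gigabytes
    total = 0.0
    for size, rate in TIERS:
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total

for gb in (30, 300, 12_000):
    print(f"{gb:>6} GB out -> ${transfer_cost(gb):,.2f}")
```

Only the traffic above a tier boundary gets the cheaper rate; the first
10 terabytes are always charged at the first-tier price.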

Same thing with elastic block storage. The number of volumes, the number of
snapshots that you take from that elastic block storage, and so forth. And
really, this is a spreadsheet that you can plug your expectations into,
and it will generate a number. If you look down at the bottom here, in
the bottom right-hand corner, you can see that it's labeled EC2
totals. It should be Amazon Web Services totals, because this is the total
cost for all of those resources across the assessment of what the load is
going to be. You can see that monthly it's $146, annually it's $1752, and
then when added with the rest of the cost, it runs $167.52 per month (nothing
like exact numbers here) and $2010.24 per year.
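
The spreadsheet arithmetic can be sketched in Python as follows. The $0.10
hourly rate and the $146 and $167.52 monthly totals come from the talk. The
instance count and the 730-hour month are assumptions chosen for illustration;
they happen to reproduce the $146 figure, but the talk does not say how that
figure was built up.

```python
HOURS_PER_MONTH = 730    # assumed average month (8760 hours / 12)
SMALL_RATE = 0.10        # dollars per hour, the "small" rate quoted in the talk
instance_count = 2       # assumed number of small instances

# Instance cost: count x hourly rate x hours in the month.
ec2_monthly = instance_count * SMALL_RATE * HOURS_PER_MONTH
ec2_annual = ec2_monthly * 12

# Total including data transfer and storage, as quoted in the talk.
total_monthly = 167.52
total_annual = total_monthly * 12
other_monthly = total_monthly - ec2_monthly   # transfer + storage remainder

print(f"instances: ${ec2_monthly:.2f}/month, ${ec2_annual:.2f}/year")
print(f"transfer and storage: ${other_monthly:.2f}/month")
print(f"total: ${total_monthly:.2f}/month, ${total_annual:.2f}/year")
```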

What's really beneficial about this is, it's fairly
straightforward to get a very good idea of what the cost structure is going
to be for your application. As you figure out how many instances, how much
traffic, and so forth, you can come up with a pretty accurate number -- I
mean, here it's down to the pennies -- of what this is going to run. So
really, you avoid many of the challenges that you face in creating
the TCO for an internal system.

Now, let's talk about a case study, an actual real-world example of this.
The Silicon Valley Education Foundation, we see here on Slide 11. And this
is an application put up by a client of ours we worked with, and we did a
pretty extensive TCO analysis on this as part of the project. This was an
application that was originally hosted, and then they gave it to us. I'll
talk about that. We did the migration, but we also did the analysis, and
that's what I want to talk about.

So, first let's understand a little bit more about Silicon Valley Education
Foundation. Silicon Valley Education Foundation, as we look on Slide 12, is
a non-profit located here in Silicon Valley. Its purpose, or its charter,
is better education in Silicon Valley. Its job is to help the educational
institutions and students perform better, to learn more. It has a STEM
focus. STEM stands for science, technology, engineering, and math. And the
key issue that they identified, as they moved forward, was, how could
they help teachers be better teachers? How could they help them to
collaborate so that they could teach better?

And to that end, we see around this circle here, ‘Lessonopoly’, and that's
the application that we were brought in to help analyze. And turning to the
next slide, Slide 13, this is a screenshot of ‘Lessonopoly’. And it's
basically a way for teachers to share lesson plans, which are kind of the
foundational document, or the foundational artifact for teaching particular
topics. A teacher creates a lesson plan that says, "These are the
objectives, this is what we'll use for resources, these are common
questions we'll ask," and so forth. It's kind of a recipe to teach a
particular topic.

The great idea is, let's let teachers collaborate, so they can help improve
these things, so they can share peer knowledge. And so ‘Lessonopoly’
was built with that in mind.

And if you look inside the circle here, a very interesting thing is this
Science of the Olympic Winter Games, which was something that came about
because NBC, when it broadcast the Winter Olympics in 2010, developed a set
of lesson plans around science, technology, engineering, and math, and the
Olympics, to help teachers focus on this current event, this athletic
competition, which would garner a lot of interest from students, and then
tie it to topics around STEM subjects. So, for example, figuring out what
the physics means about the load on a skier's leg as he or she goes around
a slalom gate. Lesson plans built around that.

And NBC came and offered that to Silicon Valley Education Foundation and
its constituent school districts, and that posed a challenge for SVEF.
Obviously very desirable, made a ton of sense, was going to be very
interesting and compelling to students. But the question was, how could
they support that increased load? Because they figured that it would
increase the load a lot. And we actually addressed that, both in the design
of the application and in our TCO. And we'll be talking about that.

The genesis of this overall project was that SVEF came to us and said,
"We've got this application. It's crucially important. It's used by all
these teachers. Teachers want to be able to
collaborate, typically outside of school hours, 24 hours a day, on
weekends. And we have it running on a single server, which means we're
exposed to hardware failures. If something goes bad with the hardware,
we can't do anything until it gets fixed. Nobody can collaborate until it
gets fixed. We're running on a single server. We're concerned about our

So they asked us to help them understand and analyze what their options
might be. And looking at Slide 14, what we can see is, we gave them
basically three options. One was the external hosting provider, which had a
single machine subject to outages. And the challenge was, if they faced a
situation like this NBC Olympics type arrangement, to grow to handle that
additional traffic, they would require additional equipment. And they still
faced the problem of, if any of that equipment went out, the application
would be crippled until they could get it repaired.

One attractive thing about that was, they already knew how to run these
machines, so they probably wouldn't have to develop a lot of additional
skills. Which is good, because they don't really have a very large IT
organization, and they're stretched to the max, and posing a solution to
them that requires a significant learning curve is a challenge, because
they're always busy fighting fires.

We looked at adding virtualization, which would provide a software layer in
between the application and the hardware, such that if some hardware
failed, the application itself could be migrated to a different piece of
hardware. So, adding redundant hardware, you could migrate the virtual
machines, which would be very attractive. It would protect you from
hardware failure. However, if you grew, you'd still require additional
equipment, and you'd still have single-point failure. You'd probably move
to shared storage, and you'd have a problem with that shared storage being,
then, the bottleneck and a single point of failure. If something happened on
that shared storage, even though the application was contained in virtual
machines, if those virtual machines couldn't get at the shared storage, the
application was down.

And, really, to run a virtualized environment is not a slam-dunk. There are
new skills that people have to develop. And so, they were reluctant to go
this route because they were going to have to make an investment that was
pretty challenging to find the time and money for, and also it still left
them vulnerable to hardware failure.

We looked at cloud computing, and one attractive thing is, it removes
hardware from the equation. Amazon takes care of that. Your application
runs in a virtual machine in an EC2 instance. You don't have to worry about
the hardware. Amazon figures out how to migrate the stuff around, and
there's always hardware available. So even if a particular machine within
the infrastructure of Amazon breaks, you can always relocate your
application onto a new piece of hardware. So, you're insulated from
hardware failure.

Any kinds of growth can be accommodated by additional EC2 instances. You
can just scale your application out. And that's very attractive, because it
doesn't require a significant capital investment, nor does it require you
to obtain hardware, provision it, install it, rack it, stack it,
and so forth. So that's nice.

And actually, there's a fairly modest skill development required. It's not
nearly as big a leap to learn how to use Amazon as it is to learn how to
run a virtualization environment. So that was attractive as well.

The question still remained: what would the numbers look like, though?
Because if the choice for them was, "Oh, you'll get better resiliency, but
it's going to cost you three times as much," they probably would look at
that and say, "Well, we'd love to go there, but we can't. We're going to
stick with our old solution, and just face the risk of hardware failure."

So we ran a number of scenarios, and I would like to go through those
examples. Turning to the first one, which is on Slide 15. You can see that
this is the simplest case. This was an Apache web server, Drupal, MySQL, so
all open source, which is attractive from a software licensing perspective.
And as originally delivered, it was running on a single server in their
colo. And so this was kind of the apples-to-apples direct comparison.

And looking at the hosted cost, it was relatively inexpensive. The machine
was already purchased, so they only had to pay the hosting fee, which is
$200 a month. Not that expensive, but again, still left them vulnerable to
hardware failure.

We looked at the Amazon costs, and it turned out that it was going to run
about $90 per month, based on the numbers that we had at that point. And
you can see that the total savings for this Option 1, the simplest case,
ran about 55 percent, which is pretty darn attractive. Being a non-profit,
every penny counts, and this is very positive for them. Plus, again, this
addressed their main concern, which was their hardware failure

Turning to the next slide, Slide 16, this was the case of, how about if
load increased enough such that we needed to increase the number of
servers? So, partition the application, put the database on a separate
server so that it doesn't contend for resources with the web server, put
the rest of the application on a separate EC2 instance, and then run that,
and that gives you additional performance capabilities. And again,
remember, one of the attractive things about Amazon was that this is very
easily accomplished. Resources are available, no upfront investment, and
really no delay between the choice to go to this and the ability to do it,
unlike having to get a physical server installed and so forth.

In this case, if they were to do it on the hosting environment, they would
have to not only pay twice as much for the hosting fee, but they would
actually have to purchase some hardware. And so, you see that we've got an
amortized monthly cost of hardware in there. Total monthly cost came to
$455 if they wanted to add a second machine into the hosted environment.

Looking at Amazon, of course they're going to be paying for two EC2
instances. There's going to be some storage associated with each of those
instances, data transfer for each of those instances, and so forth, but the
numbers came out to $192 for the total monthly cost. So the Amazon
savings ran about 57 percent. So, pretty consistent with the first
case, and this gave them more headroom.

And in fact, this is something that they did when that Olympics opportunity
of the lesson plans for NBC came along. They actually split their
application process across two servers to ensure that they had sufficient
headroom on the performance, and they could do that with the certainty that
they would know what their numbers would run out to, and it would not be
too expensive.

We also wanted to understand, what if this application really takes off?
What if this becomes a super heavily used application, and teachers from
outside Silicon Valley begin using it? Because there's nothing that limits
this to only Silicon Valley-based teachers. So it could potentially be a
very large user base coming in at unpredictable times of the day. So we
looked at, what about if we horizontally scaled this and ended up running
three application instances against an individual database system?

This would be five server instances. And we came to, that's going to be
$1000 for the hosting. There's going to be additional hardware. With amortized
costs, it's about $1222 for the hosting environment. On the Amazon side, we
came to an analysis that said the monthly cost would be $774, for a total
savings of 36 percent. And I'll talk about this a little further in the
next slide, because this doesn't look as attractive, and you might
consider, why wouldn't they think about that, or does that have some

We went back and refigured the numbers, and concluded that we could
actually run that load balancer software on the web server instances, so we
wouldn't need a separate instance to do the load balancing. And we ran that
after we'd done this analysis, and concluded that dropping one of those
EC2 instances brought the cost, when we're horizontally scaled,
in at 55 percent savings as well. So, very consistent, and really,
that was because we figured out a better way to architect the application
such that we did not need a load balancer in front of these instances.
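
The savings percentages for the three scenarios can be reproduced from the
monthly figures quoted in the talk with a short Python sketch. The function
name and scenario labels are just illustrative. The results land within a
point of the quoted 55, 57, and 36 percent. The cost after dropping the load
balancer instance is not given, so it is left out.

```python
def savings_percent(hosted_monthly, cloud_monthly):
    """Percentage saved by the cloud option relative to the hosted option."""
    return (hosted_monthly - cloud_monthly) / hosted_monthly * 100

# (hosted monthly cost, Amazon monthly cost) as quoted in the talk.
scenarios = {
    "single server": (200, 90),
    "two servers": (455, 192),
    "five servers, horizontally scaled": (1222, 774),
}

for name, (hosted, cloud) in scenarios.items():
    print(f"{name}: {savings_percent(hosted, cloud):.1f}% saved")
```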

So we've looked at this, and we've understood this, but you might say, this
is really, again, still looking at kind of static stuff. Because what if
you're going to run the application on a single server on Amazon for some
period of time, and then you're going to horizontally scale it for, let's
say one week a month is the week that lesson plans get done. How can you
account for that? In other words, looking at Slide 19, how do you address
load variation? Remember, that was one of the first things that I talked
about. Standard deviation of loads is increasing in this new world. How do
you address that?

So really, the first thing to understand is it's a dynamic world. App loads
are dynamic, too. This is just a fact of life, and it's going to become
more and more prominent. How do you account for variation? How do you
account for cost variation? Again, looking at this.

And a good way to approach this is using a technique out of statistics
called Monte Carlo simulation. And essentially, Monte Carlo simulation is
running numerous analyses of things, changing the input assumptions. And so
what you can do is actually say, "Well, I'm going to run three instances 60
percent of the time, but 40 percent of the time, I might go to five." And
you plug those assumptions in and generate out numbers. And you can
actually run multiple simulations to understand potential TCO variation. So
you might say, "Well, it's not 60 percent and 40 percent variation. It's
really more like 10 and 90." And you can run that.
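
As a rough illustration of the idea just described, here is a minimal Python
sketch, not the spreadsheet plug-in approach the talk goes on to mention. It
assumes each month independently runs either three or five instances, and it
assumes a flat per-instance monthly cost obtained by dividing the $774
five-instance figure evenly. That per-instance number, the function names,
and the trial count are all assumptions made for illustration only.

```python
import random
import statistics

# Hypothetical flat monthly cost per instance, derived only for illustration
# by splitting the $774 five-instance figure evenly. Real pricing varies by
# instance type and usage.
COST_PER_INSTANCE = 774 / 5


def simulate_annual_cost(p_low, low=3, high=5, months=12, rng=random):
    """Draw one possible year: each month runs `low` instances with
    probability p_low, otherwise `high` instances."""
    total = 0.0
    for _ in range(months):
        instances = low if rng.random() < p_low else high
        total += instances * COST_PER_INSTANCE
    return total


def run_trials(p_low, trials=10_000, seed=42):
    """Repeat the simulation many times and summarize the spread of costs."""
    rng = random.Random(seed)
    costs = sorted(simulate_annual_cost(p_low, rng=rng) for _ in range(trials))
    return {
        "mean": statistics.mean(costs),
        "p5": costs[int(0.05 * trials)],
        "p95": costs[int(0.95 * trials)],
    }


# The two input assumptions from the talk: a 60/40 mix and a 10/90 mix.
for p_low in (0.6, 0.1):
    summary = run_trials(p_low)
    print(f"p(3 instances)={p_low:.0%}: "
          f"mean ${summary['mean']:,.0f}, "
          f"5th-95th percentile ${summary['p5']:,.0f} to ${summary['p95']:,.0f}")
```

Changing p_low and rerunning is the "run multiple simulations" step: the
output is a range of annual costs rather than a single number.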

And the attractive thing is, as we look at that spreadsheet, there are
actually plug-ins available for Excel that support Monte Carlo simulation.
So you plug those in, and you can put inputs in there, and it'll run
numbers out and give you analyses based on these Monte Carlo simulations.
And that's what we recommend: if you want to understand the cost of
variation or load variability, run Monte Carlo, so that it gives you a
range of costs.

And let me say that, again, within the context of cloud computing, given
those five characteristics, it's relatively simple to do that. It's quite
challenging in an internal environment, where it's hard to figure out what
the costs are, generally, and then trying to figure out if there is even
any way to address an environment where the load is so unpredictable, or
that it has predictable variability, but significant variability. So at
least with cloud computing, you can figure that out pretty

Look next at Slide 20. This is kind of, what do you need to keep in mind?
What TCO factors to consider. Well, horizontal versus vertical scaling.
Vertical scaling refers to using bigger machines. Bigger, in the case of
Amazon, bigger instances. In the case of internal systems, it would be
using a four-socket server instead of a two-socket. Consider those things,
because a common design pattern is to use smaller instances, smaller
machines, but horizontally scale. Use three web server machines instead of
one big web server machine.

Understand the temporal patterns of use load. How, across time, does the
use load for your application vary, or how is it likely to vary? Does it
get a lot of load once a year, during the December holidays? Or, once a
month, do you have heavy use to run month-end reports? Something like
that. Try and make some assessment of the mix of compute, storage, and
traffic, because each of those is a different cost basis. Each of them is a
different cost input.
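
To make the "different cost input" point concrete, here is a small Python
sketch that prices compute, storage, and traffic separately and compares a
quiet month with a peak month. The unit prices, the usage figures, and the
names MonthlyUsage, RATES, and monthly_cost are all hypothetical, chosen only
to make the arithmetic visible. They are not taken from the talk or from any
provider's price list.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    instance_hours: float  # compute
    storage_gb: float      # storage held for the month
    traffic_gb: float      # data transferred out


# Hypothetical unit prices in dollars, for illustration only.
RATES = {"instance_hour": 0.10, "storage_gb": 0.10, "traffic_gb": 0.15}


def monthly_cost(usage: MonthlyUsage) -> dict:
    """Return the cost of each input separately, plus the total."""
    parts = {
        "compute": usage.instance_hours * RATES["instance_hour"],
        "storage": usage.storage_gb * RATES["storage_gb"],
        "traffic": usage.traffic_gb * RATES["traffic_gb"],
    }
    parts["total"] = sum(parts.values())
    return parts


# One instance all month versus three instances all month with more traffic.
quiet = MonthlyUsage(instance_hours=720, storage_gb=50, traffic_gb=100)
peak = MonthlyUsage(instance_hours=3 * 720, storage_gb=50, traffic_gb=400)

for label, usage in (("quiet", quiet), ("peak", peak)):
    print(label, {k: round(v, 2) for k, v in monthly_cost(usage).items()})
```

Keeping the three inputs separate shows which one drives the bill when the
load pattern changes.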

Don't ignore variability. You can't afford to ignore variability. That is
the future, going forward. So we talked about the technique of using the
Monte Carlo simulation.

Account for the application life cycle. One of the things to keep in mind
is, very few applications just have a single version running. You're
probably going to have a development environment, a test environment, a
staging environment, and a production environment. So you might have four
different environments, each of which is going to require resources. That
would be the case if you were doing it internally. It'll also be the case
if you're doing it with a cloud provider. So keep that in mind.

And don't underestimate change over time. Applications are notorious for
being rolled out with the assumption of, "Oh, ten people will be using
this, so we only need this size machine." And then you find that those ten
people start talking to their counterparts in a different department, or in
a different company, and all of a sudden that application that was scoped
and sized and the financials were calculated with a certain size user
population and load is all of a sudden three times as much, or ten times as
much, or a hundred times as much. So don't underestimate change over time.
What that would imply is, as you're doing your Monte Carlo simulation,
maybe you need to plug in some really big numbers to understand the

So, in conclusion, looking at Slide 21, what should you take away from this
analysis? What should you take away from this approach to cloud TCO? Well,
one, despite what that spreadsheet said, don't look for penny accuracy.
Trying to come out to that level of accuracy, I think, isn't really the
right approach. The right approach is to get a sense of kind of what the
overall cost is likely to be. Our recommendation is, unless you're
convinced that you're going to have at least a 20 percent savings going
with a cloud environment, it's not worth it, because there will be change.
There will be process challenges. There's overhead to using an external
cloud provider, and so forth and so on. Unless there's a big win, it's
probably not worth it. But once you get above 20 percent, and certainly
when you get to the kind of numbers that the Silicon Valley Education
Foundation had, when you're saving half of the cost of running an
application, it makes a ton of sense to be considering running it in a
cloud environment.
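
The 20 percent rule of thumb can be written down as a short check. This is a
minimal Python sketch using the $1222 and $774 monthly figures quoted
earlier in the talk. The function names and the default threshold parameter
are assumptions for illustration.

```python
def savings_percent(current_monthly, cloud_monthly):
    """Percentage saved by moving from the current cost to the cloud cost."""
    return (current_monthly - cloud_monthly) / current_monthly * 100


def worth_moving(current_monthly, cloud_monthly, threshold=20.0):
    """Apply the rule of thumb: move only when savings reach the threshold."""
    return savings_percent(current_monthly, cloud_monthly) >= threshold


# Figures quoted earlier in the talk for the horizontally scaled case.
hosted, cloud = 1222, 774
pct = savings_percent(hosted, cloud)
print(f"{pct:.1f}% savings, worth moving: {worth_moving(hosted, cloud)}")
```

Because the aim is a general sense of cost rather than penny accuracy, the
comparison against the threshold matters more than the exact percentage.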

Include all the factors. Sometimes people do a rather unsophisticated TCO
analysis and just count the cost of an internal server, and stack that up
against all these costs on the cloud provider, and go, "See, it's much
cheaper to do it internally." And really, that just indicates that it's not
a complex enough or sophisticated enough analysis. You have to account for
all those indirect costs and those amortized costs, as well, to really
understand what the cost of the internal alternative is going to be.

Understand the challenges. It's not necessarily easy to understand all
those characteristics of the indirect costs and so forth, so it's important
to understand those upfront. And if you would like a TCO factors checklist
-- in other words, a paper that you can work from that'll account for all
these factors here -- we have that available at our website. You're welcome
to download it. Just go to this URL and download the document, and you'll
have a document that you can work from as you do these kinds of
walk-throughs and analyses.

So I hope that this has been helpful for you in terms of understanding the
TCO of cloud computing. And, again, if you want to take a look at a
document that will help you walk through that, you're more than welcome to.

And thank you so much. I really appreciate the time to have been speaking
to you about this very important topic of cloud computing TCO.
