Greg Arnette, CTO and founder of Sonian, designed and built his Software as a Service (SaaS) application to live in the cloud from square one. Sonian provides messaging and email archiving services for enterprise customers. In the first part of this interview, Arnette discusses the fundamental changes in software application stacks designed for the cloud versus on-premise enterprise applications. Check out the second part of our interview, where Arnette discusses the economics of running an application in the cloud.
What is the major design difference for programming for the cloud versus traditional, on-premise enterprise applications?
Greg Arnette: This is not software that is written for an on-premise kind of deployment. This is a brand new kind of software stack that, from day one, was designed to live on cloud computing. What that means is that the algorithms we use to control CPU and storage are all based on the goal of seeking the best value or best return on investment for our infrastructure expense.
In the cloud, you think about CPU hours, you think about storage by the gigabyte, or bandwidth by the gigabyte. We're incentivized around processing the most data against the fewest CPU hours, to really take advantage of what cloud computing offers with that scale up, scale down approach to data processing.
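To make the metered-cost idea concrete, here is a minimal sketch of how a bill built from CPU hours, stored gigabytes and transferred gigabytes might be computed, along with the "data processed per CPU hour" figure Arnette describes optimizing for. The class names, function names and rates are hypothetical illustrations, not Sonian's code or any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cpu_hours: float
    storage_gb: float
    bandwidth_gb: float


@dataclass
class Rates:
    # Hypothetical unit prices; real cloud pricing varies by provider and tier.
    per_cpu_hour: float
    per_storage_gb: float
    per_bandwidth_gb: float


def total_cost(usage: Usage, rates: Rates) -> float:
    """Sum the three metered line items into one infrastructure cost."""
    return (
        usage.cpu_hours * rates.per_cpu_hour
        + usage.storage_gb * rates.per_storage_gb
        + usage.bandwidth_gb * rates.per_bandwidth_gb
    )


def gb_processed_per_cpu_hour(gb_processed: float, usage: Usage) -> float:
    """Efficiency metric: more data handled per CPU hour means better value."""
    if usage.cpu_hours <= 0:
        raise ValueError("cpu_hours must be positive")
    return gb_processed / usage.cpu_hours


if __name__ == "__main__":
    usage = Usage(cpu_hours=10, storage_gb=100, bandwidth_gb=20)
    rates = Rates(per_cpu_hour=0.5, per_storage_gb=0.25, per_bandwidth_gb=0.125)
    print(total_cost(usage, rates))
    print(gb_processed_per_cpu_hour(50, usage))
```

Tracking a ratio like this over time is one plausible way to tell whether a change to the processing pipeline actually improved the return on infrastructure spend.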
What are some of the architectural challenges in designing for a cloud?
GA: I would say it is more challenging than doing it for a typical on-premise deployment where your software might live on one or two dedicated servers in an IT room. The challenges for the cloud (and they're both the benefits and the areas you have to program around) are around the nature of CPU. It's ephemeral: it's there, it's not there, it's [delivered on] application programming interface (API) calls.
We use the concept of an enterprise service bus (ESB). An ESB allows our system to have this comprehensive control of all the different activities that need to be accomplished in a certain amount of time. An activity is collecting data from a customer, or indexing data or delivering search results to the customer logged into our Web user interface (UI). All those activities are translated into jobs that are then processed by an available CPU.
So we have this constantly in-flux footprint of CPUs that are running and available to service the jobs that are running through the system at any given time.
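The pattern described here, activities turned into jobs that are handed to whichever CPUs happen to be available, can be sketched roughly as follows. This is a single-process illustration only: `JobBus`, `Worker` and the round-robin dispatch are assumptions made for the example, not a description of Sonian's ESB, which would dispatch to separate machines.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    activity: str  # e.g. "collect", "index" or "search"
    payload: str


@dataclass
class Worker:
    name: str
    done: list = field(default_factory=list)

    def process(self, job: Job) -> None:
        # Stand-in for real work; here we only record what was handled.
        self.done.append(job)


class JobBus:
    """Hypothetical bus: queues jobs and hands them to the current workers."""

    def __init__(self) -> None:
        self.pending = deque()
        self.workers = []

    def submit(self, activity: str, payload: str) -> None:
        self.pending.append(Job(activity, payload))

    def add_worker(self, worker: Worker) -> None:
        self.workers.append(worker)

    def remove_worker(self, name: str) -> None:
        self.workers = [w for w in self.workers if w.name != name]

    def dispatch(self) -> int:
        """Drain the queue round-robin over the workers; return jobs dispatched."""
        if not self.workers:
            return 0  # nothing available right now; jobs stay queued
        count = 0
        while self.pending:
            worker = self.workers[count % len(self.workers)]
            worker.process(self.pending.popleft())
            count += 1
        return count
```

Because jobs wait in the queue when no worker is present, the set of workers can grow and shrink between calls to `dispatch` without losing work, which is the property the in-flux footprint depends on.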
So the customer isn't directly controlling or influencing capacity. You frontload where you expect use to be and spin up servers as that footprint (of available servers) fills up?
GA: Right. It's no different than me going to Google.com and doing a search. We don't know how many CPUs had to be involved to deliver your search result back; that's Google maintaining the infrastructure. We're actually very much the same way, using cloud-scale concepts like MapReduce to do a search across a very large dataset and deliver the results back to the UI in sub-second response times.
That's why it is so different from writing software that is only going to live on a couple of CPUs.
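As a rough illustration of the map-and-reduce style of search mentioned above, the sketch below splits a dataset into shards, searches each shard independently (the map step) and merges the partial results (the reduce step). It is a deliberately simplified, single-process example; the shard layout and function names are assumptions, and it is not Google's MapReduce or Sonian's implementation.

```python
from functools import reduce


def map_shard(shard: dict, term: str) -> list:
    """Map step: return ids of documents in one shard that contain the term."""
    needle = term.lower()
    return [doc_id for doc_id, text in shard.items() if needle in text.lower()]


def reduce_hits(left: list, right: list) -> list:
    """Reduce step: merge two partial hit lists into one."""
    return left + right


def search(shards: list, term: str) -> list:
    # In a real deployment each shard would be searched on a separate machine
    # in parallel; here the shards are processed one after another.
    partial_results = (map_shard(shard, term) for shard in shards)
    return sorted(reduce(reduce_hits, partial_results, []))


if __name__ == "__main__":
    shards = [
        {"m1": "Quarterly budget attached", "m2": "Lunch on Friday?"},
        {"m3": "Re: budget review", "m4": "Server maintenance window"},
    ]
    print(search(shards, "budget"))
```

Since each shard is searched independently, adding machines lets the map step run wider rather than longer, which is what keeps response time roughly flat as the dataset grows.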
Cloud brings this whole new way of thinking about architecture and design, since we don't control the infrastructure directly, so we're one step removed. The software is meant from day one to scale horizontally and be very cost-effective to operate, because you can scale up and scale down and control your CPU expense very granularly.
Then, always program for failure. Instead of expecting everything to be fine, like you do with enterprise software running on a couple of Dell servers in a server room, in the cloud, you always have to anticipate something might go wrong. Not that the cloud's unreliable, but you just don't have that control of hardware that you normally do in a regular software stack.
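One common way to "program for failure" is to assume any call to a remote resource may fail and to retry it a bounded number of times with a growing delay. The helper below is a minimal sketch of that idea; the function name, its defaults and the exponential backoff policy are illustrative assumptions, not something described in the interview.

```python
import time


def run_with_retries(task, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call task(); on an exception, wait and retry up to `attempts` times.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. In practice a caller would likely retry only errors known to be transient, and make the task safe to run more than once.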
How do you deal with transactional cost? Doesn't the application become slower because you have more hoops to jump your application through to get it to the customer?
GA: No, not really. That's the positive aspect of deploying your software on a scale-up cloud environment. You have a menu of choices of the type of CPU you want. You can turn on a single-CPU Web server box -- that's the low end -- or you can turn on a 64 GB powerhouse box you might use for an Oracle database server. And that configuration and that CPU is literally available to you by making an API call, saying "I need a medium CPU," and within a minute it's up and running. Our enterprise service bus can say we're getting maxed out on indexing instances, so it fires up some more instances to deal with the increased load, and when those instances start to go idle it turns them off.
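The scale-up and scale-down decision described here can be reduced to a small calculation: estimate how many instances the current backlog needs, clamp that to sensible limits, and compare it with what is running. The sketch below assumes a hypothetical "jobs per instance" capacity figure and instance limits; the actual call that starts or stops machines is provider-specific and is left out.

```python
import math


def desired_instances(queued_jobs, jobs_per_instance,
                      min_instances=1, max_instances=20):
    """Instances needed for the backlog, clamped to [min_instances, max_instances]."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, queued_jobs, jobs_per_instance,
                   min_instances=1, max_instances=20):
    """Positive result: instances to start. Negative: idle instances to stop."""
    target = desired_instances(queued_jobs, jobs_per_instance,
                               min_instances, max_instances)
    return target - current


if __name__ == "__main__":
    print(scaling_action(current=2, queued_jobs=45, jobs_per_instance=10))
    print(scaling_action(current=5, queued_jobs=0, jobs_per_instance=10))
```

A production controller would usually add a cooldown or smoothing step so that brief spikes do not cause instances to be started and stopped in quick succession.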
It's a baseline assumption that your resources are both dynamic and finite in scope, they're here, and they're gone. That's very different than assuming you're on a platform that never goes off unless the power goes off.
GA: It's forcing you to solve some of the thornier problems up front, but once you've gotten past those issues of failover and redundancy and horizontal scale, then you're truly never going to be caught off guard by not having enough capacity available. Our CFO is happy because we're not overextending; we don't have extra CPUs sitting around idle, not doing valuable customer work.
What about maintenance, patching and upgrades or changes to your application?
GA: In SaaS, especially powered by the cloud, your infrastructure is constantly in flux. You're constantly tuning and tweaking and replacing and modifying. You learn by trying something out, you get a sense of how things work and you can optimize. As long as your front end portal application never has any down time, the customers don't know about all the constant innovation going on behind the scenes.
That's no different than at Salesforce.com or Google or even Amazon! They are constantly making improvements behind the scenes. That's one of the positive aspects of being powered by the cloud. You can be in this constant state of improvement; different from enterprise software where you have to ship out CDs or download [patches and upgrades].
How does this impact sysadmins and developers?
GA: Our sysadmins and our developers work very closely, hand-in-hand. Oftentimes, the developers are sysadmins (in a way that's positive). To develop something in a vacuum is just not possible with cloud compute. The environment, the constant state of flux, the fluidity affect how software runs, and it's always a work in progress in a positive way: you're always making positive improvements. Think of running a cable or television network. There are different components involved, the product is the content of the Web service, but there's a lot of plumbing going on to make sure there's no static on the screen.
It sounds like a meta-operating system. There's your front-end that customers see and all these little projects and servers in the background, like an operating system running services.
GA: Right. At some level, there's just one big CPU that's just the cloud that we live on and there's these sub-components that are constantly cycling in and out. We're making an assumption that the underlying cloud infrastructure that we live on is going to have the capacity there when we need it. It's sort of like how you can always count on the electricity being on at the outlet, so it's a dramatically different way of thinking about stuff in terms of software design and technology design.
GREG ARNETTE'S BIO:
Greg Arnette is the founder and CTO of Sonian. The trusted cloud-powered universal data management company, Sonian delivers archiving and storage, e-discovery and compliance services. Sonian's mission is to archive the world's electronic communications and files and make them universally accessible and useful via its cloud-powered universal data management service, which is secure, reliable and affordable.
This was first published in February 2010