By Carl Brooks, Technology Writer
In part one of our interview with Greg Arnette, CTO and founder of Sonian, he discussed how developing software application stacks designed for the cloud differs from developing on-premises enterprise applications. In part two, he discusses the economics of cloud computing application development.
Should developers plan on using the most capacity for an instance for any given job or do you spread it out into as many of the smaller instances as you can? The more expensive instances are orders of magnitude more powerful than the smallest ones, but can cost 10 times as much per hour.
Greg Arnette: We think about and study this a lot; it's all tied to our operational efficiency requirements and our focus on that area. If you break down our entire system into sub-components, some components are CPU-intensive, some processes are disk I/O-intensive, and some are memory-intensive. We have different flavors of instances available to us on demand. Some are designed to deliver multi-core CPUs for CPU-intensive processes but have very little disk I/O.
That might be a dataset that's already imported into memory, memcache stuff that's highly queried that lives on a high CPU, high RAM instance. We're able to tailor in a pretty accurate ratio the job that needs to run, the process that needs to run and the best CPU to bring up so we get the maximum efficiency.
We're incentivized to be as efficient as possible. There's a base footprint of CPU that always needs to be running that isn't fluid, because we always need a core element. In that case, we're able to purchase a year of services in advance and get a reduced cost by doing that as well.
Horizontal scale and distributed architecture allow us to discretely segment different processes in our instance mix.
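The matching Arnette describes, pairing each job's resource profile with the instance flavor that fits it most cheaply, can be sketched roughly as follows. This is a minimal illustration and not Sonian's actual scheduler: the flavor names, the relative capacity scores and the hourly prices are all invented for the example and do not correspond to real EC2 instance types or rates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstanceFlavor:
    name: str           # hypothetical label, not a real EC2 instance type
    cpu: int            # relative CPU capacity (made-up scale)
    memory: int         # relative memory capacity (made-up scale)
    disk_io: int        # relative disk I/O capacity (made-up scale)
    hourly_cost: float  # invented price, for illustration only


@dataclass(frozen=True)
class JobProfile:
    name: str
    cpu: int      # CPU the job needs, on the same relative scale
    memory: int   # memory the job needs
    disk_io: int  # disk I/O the job needs


# Hypothetical catalogue of instance flavors.
FLAVORS: List[InstanceFlavor] = [
    InstanceFlavor("small-general", cpu=1, memory=1, disk_io=1, hourly_cost=0.10),
    InstanceFlavor("high-cpu", cpu=8, memory=2, disk_io=1, hourly_cost=0.40),
    InstanceFlavor("high-io", cpu=2, memory=2, disk_io=8, hourly_cost=0.50),
    InstanceFlavor("high-memory", cpu=4, memory=8, disk_io=2, hourly_cost=0.60),
]


def cheapest_fit(job: JobProfile, flavors: List[InstanceFlavor] = FLAVORS) -> InstanceFlavor:
    """Return the lowest-cost flavor that meets every resource need of the job."""
    candidates = [
        f for f in flavors
        if f.cpu >= job.cpu and f.memory >= job.memory and f.disk_io >= job.disk_io
    ]
    if not candidates:
        raise ValueError(f"no single flavor fits job {job.name!r}")
    return min(candidates, key=lambda f: f.hourly_cost)


if __name__ == "__main__":
    for job in (
        JobProfile("text-extraction", cpu=6, memory=1, disk_io=1),
        JobProfile("cache", cpu=4, memory=6, disk_io=1),
        JobProfile("indexing", cpu=2, memory=2, disk_io=6),
    ):
        print(job.name, "->", cheapest_fit(job).name)
```

A job that fits no single flavor raises an error here; in a horizontally scaled system it would more likely be split across several smaller instances.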
What do you prioritize in putting together instances to run each job type? Are they very highly specialized instances or general server OSes with your application on top?
GA: No, they're more specialized; we have a dozen Amazon Machine Images (AMIs) that are focused on certain processes, like a database AMI versus an indexing AMI versus a text-extraction AMI. That's what the enterprise service bus will fire up when one of these processes needs to scale out horizontally, along with Web server and application AMIs.
We have a requirement to operate a highly secure environment, so we're not using the publicly available AMIs, which are great to get you started, but we start from core building blocks and add only the necessary components for the specialized function of each AMI. As the AMIs boot up, they also self-configure and download the latest updates for each process and patches for each instance. That's part of our software distribution scheme on top of the cloud.
How do you manage the application consistently across a dozen types of servers that may or may not even be running at any given time?
GA: You get these complexities: how to update a live system, how do you update dormant AMIs that need to be accurate for the next time they're fired up?
The strategy is to push out a software deployment that doesn't break a currently running process. Part of the way we distribute the application across the different nodes is through a job on the enterprise service bus. The job itself contains the byte-level code that's going to be executed. For example, let's go out on a scheduled basis and collect some data from a customer's network over, say, the IMAP protocol. The program that talks to IMAP is actually part of the job's code that is coming from central source code.
A newer version can actually operate in parallel with an older version that may still be running after the update because it's a long job. So we have the ability to move something forward in a 24-by-7 system; you never actually stop everything at once and do an update.
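The idea that a job carries its own code, so that an old version can finish while new jobs pick up the update, might look something like the sketch below. It is an assumption-laden illustration, not Sonian's implementation: the CentralSource class, the Job structure and the two collect functions are hypothetical names, and a plain Python callable stands in for the code shipped with the job over the enterprise service bus.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Job:
    job_type: str
    version: int
    code: Callable[[str], str]  # the code travels with the job
    payload: str

    def run(self) -> str:
        # Executes whatever code was attached when the job was created.
        return self.code(self.payload)


class CentralSource:
    """Hypothetical stand-in for the central source that jobs are built from."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[int, Callable[[str], str]]] = {}

    def publish(self, job_type: str, version: int, code: Callable[[str], str]) -> None:
        # Publishing replaces what future jobs receive; existing jobs are untouched.
        self._latest[job_type] = (version, code)

    def create_job(self, job_type: str, payload: str) -> Job:
        version, code = self._latest[job_type]
        return Job(job_type, version, code, payload)


def collect_v1(mailbox: str) -> str:
    return f"v1 collected {mailbox}"


def collect_v2(mailbox: str) -> str:
    return f"v2 collected {mailbox} (incremental)"


if __name__ == "__main__":
    source = CentralSource()
    source.publish("imap_collect", 1, collect_v1)
    long_running = source.create_job("imap_collect", "inbox-a")  # pinned to v1

    source.publish("imap_collect", 2, collect_v2)                # update rolls out
    fresh = source.create_job("imap_collect", "inbox-b")         # picks up v2

    print(long_running.run())
    print(fresh.run())
```

Because each job captures its code at creation time, publishing version 2 never interrupts a version 1 job that is already in flight, which is the property that lets a 24-by-7 system be updated without stopping everything at once.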
Sonian has been operating since the beginning of EC2's availability. Did you look at the platform and have a light bulb moment or was this something you'd been thinking about before and EC2 happened to be the platform of choice?
GA: We definitely had been thinking about the concept, and we started off in 2004 looking at trying to do something similar on what Sun was offering, the Sun Grid. But their pricing model made it uneconomical. It was a dollar per CPU hour and a dollar per gigabyte per month for storage.
By 2006, early 2007, with early betas of compute and storage from EC2 and S3 coming together, if there was a light bulb moment it was us saying "finally this is something we can work with". Fast forward to 2010, four years later, it's definitely acknowledged that there's comfort around the cloud (for the most part) in mid-sized companies.
What does it save you to run in EC2 in time and investment as opposed to having to rent a data center to do this?
GA: We haven't actually modeled, physically, what it would be like if we didn't use the cloud, because, probably, we wouldn't be doing this. It just wouldn't have made sense. If we didn't have access to the cloud we'd be doing something different.
Our business would be dramatically different in all aspects, from the capital required to get started to where we are in establishing customer traction. It's so "apples and oranges," it's difficult to say. We made an early bet that cloud computing environments (on day one, Amazon, and in the future, other clouds) could be harnessed for an enterprise audience. It seemed like a big bet in 2006; now it seems like a more secure bet.
GREG ARNETTE'S BIO:
Greg Arnette is the founder and CTO of Sonian. The trusted cloud-powered universal data management company, Sonian delivers archiving and storage, e-discovery and compliance services. Sonian's mission is to archive the world's electronic communications and files and make them universally accessible and useful via its powered universal data management service which is secure, reliable and affordable.