CAMBRIDGE, Mass. -- Cloud monitoring and cloud automation are critical in some large-scale environments, but they're not something everyone looks to Amazon for.
"Homebuilt tools are the ones you can't live without," said Craig Tracey, DevOps lead for marketing software startup HubSpot Inc. in Cambridge, Mass.
HubSpot is one of those IT shops that "monitors everything," Tracey said. It runs 1,400 AWS instances, as well as deployments in Rackspace Hosting Inc., to perform big data analytics and host more than two dozen applications.
"We monitor things that a lot of people overlook. … For example, I'll get an alert if someone sets up an instance that's not striped across [availability zones]," Tracey said.
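As a rough illustration of the kind of check Tracey describes, the following is a minimal sketch and not HubSpot's actual tooling. It assumes instance records have already been fetched from the provider and reduced to plain dictionaries; the `group` and `zone` keys, the sample data, and the `find_unstriped_groups` name are all hypothetical.

```python
from collections import defaultdict


def find_unstriped_groups(instances, min_zones=2):
    """Return {group: [zones]} for groups running in fewer than min_zones zones."""
    zones_by_group = defaultdict(set)
    for inst in instances:
        zones_by_group[inst["group"]].add(inst["zone"])
    return {
        group: sorted(zones)
        for group, zones in zones_by_group.items()
        if len(zones) < min_zones
    }


# Hypothetical inventory; in practice this would come from the cloud provider's API.
instances = [
    {"id": "i-001", "group": "web", "zone": "us-east-1a"},
    {"id": "i-002", "group": "web", "zone": "us-east-1b"},
    {"id": "i-003", "group": "queue", "zone": "us-east-1a"},
    {"id": "i-004", "group": "queue", "zone": "us-east-1a"},
]

for group, zones in find_unstriped_groups(instances).items():
    print(f"ALERT: group '{group}' runs only in {zones}")
```

Here the "queue" group would be flagged because both of its instances sit in a single availability zone.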
Meanwhile, IT pros who rely on pre-built tools such as Amazon's CloudWatch complained of quirks.
Brian Tarbox, a software engineer at Cabot Research, a financial data analysis firm based in Boston with 1,000 instances on AWS, said he tried to create a metric in CloudWatch to monitor the size of the work queue in his environment, but CloudWatch returned a message saying there were too many metrics in use.
"Then it comes back saying I have all these other basic metrics that I don't care about for each of my 1,000 instances and I can't find my metric," Tarbox said. "I've got a bug report I sent in to them where they sent back a direct link to the graph of my metric in the same window where it said the metric didn't exist."
Others agreed that CloudWatch has its share of issues.
"CloudWatch is very superficial and it's been very challenging to work with," said Joey Imbasciano, cloud platform engineer for Boston-based Stackdriver. "It's something you either put up with or you try to build your own internal metrics, maybe around some open source stuff like Graphite, StatsD, or another hosted service."
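For readers curious what building "your own internal metrics" around StatsD might look like, here is a minimal sketch and not any panelist's actual setup. It reports a work-queue size, like the metric Tarbox wanted, as a StatsD gauge over UDP. The metric name `workqueue.size`, the local host address, and the helper names are assumptions; port 8125 is the usual StatsD default.

```python
import socket


def format_gauge(name, value):
    """Build a StatsD gauge datagram, e.g. b'workqueue.size:42|g'."""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(name, value, host="127.0.0.1", port=8125):
    """Send one gauge reading to a StatsD daemon over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_gauge(name, value), (host, port))


if __name__ == "__main__":
    # Hypothetical reading; a real agent would measure the queue on a schedule.
    send_gauge("workqueue.size", 42)
```

Because UDP is connectionless, the sender does not wait on the metrics server, which is one reason this style of reporting is popular for homebuilt monitoring.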
Chef, Puppet and cloud automation performance
Cloud automation is a key feature of the panelists' architectures, and they praised configuration automation tools such as Chef and Puppet. But attendees acknowledged that these tools only go so far when it comes to speedy provisioning on AWS.
Every month, HubSpot spins between 200 and 300 instances up or down, using a very "vanilla" golden operating system image. With Puppet, it can take 10 or 15 minutes to spin up an instance in his environment, Tracey said.
However, panelists said there are approaches to speed up cloud automation.
It can take 20 to 25 minutes with Puppet to spin up an instance, according to Barry Jaspan, senior architect for Acquia Inc., an open source software company based in Woburn, Mass., that provides support for Drupal. The company might spin up as many as 100 instances per day.
To take this provisioning time down to four minutes per instance, Acquia snapshots a base OS image "bundled" with various utilities each day, then uses that bundle as the basis for launching instances the next day.
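The daily-bundle approach can be sketched as a simple selection rule: launch from the newest bundle that was built before today. This is only an illustration under stated assumptions, not Acquia's implementation; the bundle records, their `name` and `built` fields, and the `pick_base_bundle` helper are hypothetical.

```python
from datetime import date


def pick_base_bundle(bundles, today):
    """Return the newest bundle built before `today`, or None if there is none."""
    candidates = [b for b in bundles if b["built"] < today]
    return max(candidates, key=lambda b: b["built"], default=None)


# Hypothetical catalog of daily bundles (base OS image plus utilities).
bundles = [
    {"name": "base-day1", "built": date(2020, 1, 1)},
    {"name": "base-day2", "built": date(2020, 1, 2)},
    {"name": "base-day3", "built": date(2020, 1, 3)},
]

chosen = pick_base_bundle(bundles, today=date(2020, 1, 3))
print(chosen["name"] if chosen else "no bundle available")
```

Keeping a single dated lineage of bundles, rather than many hand-modified variants, means the launch logic never has to guess which image is current.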
Jaspan cautioned against creating multiple bundles with different configurations because "there's no way you're ever going to remember what you did, so when you need to start over or change something, you've got this evolved image you can't really modify."
Opscode, the maker of Chef, said it can take a long time to configure an instance with any tool if there's a substantial amount of software to install.
Puppet Labs declined to comment. Amazon did not respond to requests for comment.