One valuable SaaS tool has allowed IT pros to detect cloud outages before the vendors even notice.
An Azure user detected a service disruption 15 hours before Microsoft announced it, and an Amazon Elastic Compute Cloud (EC2) user tracked issues that Amazon support had not noticed. Both relied on a Software as a Service-based application performance management tool launched in April.
The SaaS tool is an eponymous offering from Boundary Inc. It requires an agent to be installed on each server instance, and it works in any public cloud that grants access to the operating system, such as Amazon EC2, Rackspace's Cloud Servers and Microsoft Windows Azure.
Once installed, the agent listens to network traffic running across virtual network interfaces and sends that information to Boundary's data centers, where it is processed and displayed for the user through a customizable Web portal.
"Before we installed Boundary, we had no idea how our solution behaved from a network perspective," said Fredrik Lindstrom, systems architect with QBranch, a European cloud service provider that uses the Windows Azure Service Bus to allow clients to access its data centers through the QNET self-service portal.
On October 30 this year, QBranch was able to alert its customers to a network problem with the Service Bus ahead of a Microsoft announcement about the outage.
While Boundary didn't identify the exact cause of the outage, it showed that many packets were being transmitted out of order or dropped between QBranch's location and Microsoft's virtual data centers in Europe. Ultimately, two faulty network switches were identified as the root cause for the intermittent network connectivity.
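To make the symptoms above concrete, here is a minimal sketch of how out-of-order and dropped packets could be counted from the sequence numbers seen on a single flow. It is an illustration only, not Boundary's method: the `FlowStats` type and `summarize_sequence` function are hypothetical names, and the sketch assumes packets carry simple increasing integer sequence numbers.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    received: int = 0      # distinct sequence numbers that arrived
    out_of_order: int = 0  # arrivals with a number lower than one already seen
    missing: int = 0       # numbers inside the observed range that never arrived


def summarize_sequence(seqs):
    """Summarize one flow from the order in which sequence numbers arrived."""
    stats = FlowStats()
    highest = None
    seen = set()
    for seq in seqs:
        if seq in seen:
            continue  # ignore retransmitted duplicates
        seen.add(seq)
        stats.received += 1
        if highest is not None and seq < highest:
            stats.out_of_order += 1  # arrived after a later packet
        else:
            highest = seq
    if seen:
        expected = max(seen) - min(seen) + 1
        stats.missing = expected - len(seen)
    return stats


if __name__ == "__main__":
    print(summarize_sequence([1, 2, 4, 3, 7]))
```

A rising count in either field on many flows at once points toward the network path rather than a single host, which is the kind of evidence Lindstrom describes.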
At first, it was assumed that the fault was on the QBranch side, Lindstrom said, but once the Boundary report was analyzed, it became clear that the company should open a ticket with Microsoft.
Okta shares Boundary information with Amazon support
Identity and access management service provider Okta is based entirely on Amazon Web Services (AWS) with some 200 instances deployed to run its test, development, staging and production workloads. It runs Boundary agents on some 70 of those instances, the ones used in production.
Okta has used the visibility into the network that Boundary provides to communicate with AWS support when it has a problem that's otherwise too small for the cloud computing behemoth to notice.
"Amazon is running such a big operation they don't necessarily notice things at the scale that we do," said Adam D'Amico, director of Okta's technical operations. "Something has to be a much bigger problem to move their needle than ours."
Specifically, Boundary allowed Okta to detect partitions in the network between availability zones: cases in which no packets cross the network between hosts in different zones.
"They're much more likely to believe me if I've got that kind of monitoring where I can say, 'Look, I'm not making this up,'" said D'Amico. "Look, it's not just one or two machines; it's dozens in two different zones."
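The partition check D'Amico describes can be sketched roughly as follows. This is a hedged illustration, not how Boundary or Okta implement it: the function name, the `min_pairs` parameter and the zone labels are assumptions, and the input is assumed to be a per-host-pair packet count collected over some recent interval.

```python
from collections import defaultdict


def find_partitioned_zone_pairs(host_zone, packet_counts, min_pairs=2):
    """Return zone pairs whose observed cross-zone host pairs all carried zero packets.

    host_zone:     maps host name -> availability zone
    packet_counts: maps (src_host, dst_host) -> packets seen in the interval
    min_pairs:     require this many host pairs before flagging a zone pair,
                   so one quiet machine is not reported as a partition
    """
    totals = defaultdict(lambda: [0, 0])  # zone pair -> [host pairs, packets]
    for (src, dst), packets in packet_counts.items():
        zone_a, zone_b = host_zone[src], host_zone[dst]
        if zone_a == zone_b:
            continue  # only traffic that crosses zones matters here
        key = tuple(sorted((zone_a, zone_b)))
        totals[key][0] += 1
        totals[key][1] += packets
    return sorted(
        key
        for key, (pairs, packets) in totals.items()
        if pairs >= min_pairs and packets == 0
    )
```

Requiring several silent host pairs before raising a flag mirrors the point in the quote: the case is persuasive because it involves many machines across two zones, not one or two.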
Both users said Boundary has the potential to become an even more effective tool. Automated thresholding, for example, was on both users' wish lists -- the ability to scale monitoring thresholds automatically along with the environment, as well as the ability to automatically warn users when traffic is in an abnormal state. Otherwise, it can take some time to learn what's normal and what's not, according to Lindstrom.
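One common way to approximate the automated thresholding the users asked for is a rolling baseline that flags values far from the recent average. The sketch below is an assumption about how such a feature might work, not a description of anything Boundary offers; the class name, window size and three-sigma cutoff are all illustrative choices.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flag traffic samples that deviate sharply from a recent window."""

    def __init__(self, window=60, sigmas=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.sigmas = sigmas
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if value looks abnormal, then add it to the window."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sd = pstdev(self.samples)
            # Guard against a zero deviation when traffic has been perfectly flat.
            abnormal = abs(value - mu) > self.sigmas * max(sd, 1e-9)
        self.samples.append(value)
        return abnormal
```

Because the window moves with the data, the threshold adapts as the environment grows, though it still needs `min_samples` observations before it can say anything about what is normal.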
D'Amico said Boundary has the potential to monitor for security issues as well as performance on the network, alerting the user when a server talks over a port it hasn't used before, for example.
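The security idea D'Amico raises, alerting when a server talks over a port it has not used before, could be sketched like this. The `PortWatcher` class and its learning phase are hypothetical and serve only to illustrate the idea; the source does not say how or whether Boundary would implement it.

```python
from collections import defaultdict


class PortWatcher:
    """Remember which ports each host has used and flag first-time ports."""

    def __init__(self):
        self.known = defaultdict(set)  # host -> ports seen so far
        self.learning = True           # no alerts while the baseline is built

    def end_learning(self):
        self.learning = False

    def observe(self, host, port):
        """Record the port; return True if it is new for this host after learning."""
        is_new = port not in self.known[host]
        self.known[host].add(port)
        return is_new and not self.learning


if __name__ == "__main__":
    watcher = PortWatcher()
    for host, port in [("web-1", 443), ("web-1", 80), ("db-1", 5432)]:
        watcher.observe(host, port)
    watcher.end_learning()
    print(watcher.observe("web-1", 443))   # port already known for web-1
    print(watcher.observe("web-1", 6667))  # first time web-1 uses this port
```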
Boundary competes with a plethora of application performance monitoring tool providers, including AppDynamics, CA Inc., Circonus, ExtraHop, Librato and New Relic. There are also other cloud infrastructure monitoring tools that use big data analytics delivered via SaaS to help users get a handle on cloud management, including CloudPhysics, Sumo Logic, Splunk, AppFirst and ScaleXtreme.
Boundary's tool is free up to 2 GB of monitoring data sent to Boundary's data centers. QBranch pays $400 a month for 5 GB of data.