
Improve hybrid cloud monitoring through automation, alerts

To effectively monitor hybrid cloud infrastructure -- without being overloaded with data and alerts -- IT teams need to rethink some of their existing processes. Use these five best practices to get started.

As applications and data increasingly span private and public clouds, enterprises face a host of new challenges -- and monitoring is one of them.

IT teams need to build a robust hybrid cloud monitoring strategy that ensures solid application performance, high availability and low costs across different infrastructures. In addition, they require a monitoring tool that can aggregate data about different infrastructure components, such as compute, storage and network. Automation should play a key role in any hybrid cloud monitoring strategy, as it improves data collection, reduces both false positives and false negatives, and can dynamically scale infrastructure.

Here's a look at some tools and best practices to monitor a hybrid cloud deployment.

Standardize back-end monitoring

Every public cloud platform and private tool generates different kinds of data. Application performance monitoring, logging and tracing tools can complement this data. This can lead enterprises to store multiple data sets in separate monitoring tools, which makes it difficult to make effective decisions, identify problems and automatically scale cloud infrastructure.

In addition, different roles in an organization have an interest in different kinds of information. Application developers, for example, are more interested in debugging code, while operations teams will want to know how to respond to incidents. To address this, some organizations create separate applications for these tasks that integrate directly with the various monitoring tools -- but this only increases complexity.

A good practice is to aggregate a useful subset of monitoring data from the various cloud platforms in use into a single monitoring tier. This enables you to present the most appropriate alerts or reporting data to the right people, with the most appropriate tool and without new integrations.
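As a rough illustration of a single monitoring tier, the Python sketch below maps events from two sources into one common record and then decides which teams should see each event. The input formats, field names, category labels and role mapping are all hypothetical; real tools expose their own schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MonitoringEvent:
    """Common record for the single monitoring tier (hypothetical schema)."""
    source: str         # e.g. "on-prem", "public-cloud"
    resource: str       # host, VM or service identifier
    metric: str         # normalized metric name
    value: float
    timestamp: datetime
    category: str       # "performance", "error", "cost", ...


def from_private_agent(raw: dict) -> MonitoringEvent:
    # Hypothetical on-prem agent format: {"host": ..., "cpu_pct": ..., "ts": epoch seconds}
    return MonitoringEvent(
        source="on-prem",
        resource=raw["host"],
        metric="cpu_utilization_pct",
        value=float(raw["cpu_pct"]),
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        category="performance",
    )


def from_public_cloud(raw: dict) -> MonitoringEvent:
    # Hypothetical cloud export: {"instanceId", "metricName", "value", "time" (ISO 8601), "category"}
    return MonitoringEvent(
        source="public-cloud",
        resource=raw["instanceId"],
        metric=raw["metricName"],
        value=float(raw["value"]),
        timestamp=datetime.fromisoformat(raw["time"]),
        category=raw.get("category", "performance"),
    )


# Which roles care about which categories (an assumption; adjust per organization).
ROUTING = {
    "performance": {"operations"},
    "error": {"developers", "operations"},
    "cost": {"finance"},
}


def route(event: MonitoringEvent) -> set[str]:
    """Return the teams that should see this event; unknown categories go to operations."""
    return ROUTING.get(event.category, {"operations"})


events = [
    from_private_agent({"host": "db01", "cpu_pct": 91.5, "ts": 1535803200}),
    from_public_cloud({"instanceId": "i-123", "metricName": "http_5xx_count", "value": "7",
                       "time": "2018-09-01T12:00:00+00:00", "category": "error"}),
]
for e in events:
    print(e.source, e.metric, e.value, sorted(route(e)))
```

Because every source is converted once at the edge, dashboards, alert rules and reports only ever deal with one record type.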

Identify important KPIs

Enterprises need to identify and define key performance indicators (KPIs) to measure success. But there is also a risk of KPI fatigue -- where users are presented with so many different KPIs that each one loses its importance.

A hybrid cloud monitoring infrastructure generates an enormous amount of useful data, so enterprises need to focus not only on which metrics they should monitor, but also on how those metrics relate to specific organizational or departmental goals. These could include page load times, cloud cost optimization or conversion rates.

Evaluate where your company encounters the most challenges, and focus on the KPIs that track them. Then, continuously reevaluate KPIs to determine whether others should take precedence.

Figure: Key steps in the KPI process
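One way to keep the KPI list short is to record the goal each KPI serves and rank KPIs by how far they miss their targets. The sketch below uses hypothetical KPI names, targets and measured values; rerunning it on fresh measurements is a simple form of continuous reevaluation.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    goal: str               # organizational or departmental goal the KPI serves
    target: float
    actual: float
    higher_is_better: bool

    def shortfall(self) -> float:
        """Relative miss against the target; 0.0 means the target is met."""
        if self.higher_is_better:
            gap = (self.target - self.actual) / self.target
        else:
            gap = (self.actual - self.target) / self.target
        return max(gap, 0.0)


def focus_list(kpis: list[KPI], limit: int = 3) -> list[KPI]:
    """KPIs that miss their targets, worst first, capped at `limit` to avoid KPI fatigue."""
    missing = [k for k in kpis if k.shortfall() > 0]
    return sorted(missing, key=lambda k: k.shortfall(), reverse=True)[:limit]


# Hypothetical KPIs and measurements.
kpis = [
    KPI("page_load_p95_s", "customer experience", target=2.0, actual=3.0, higher_is_better=False),
    KPI("conversion_rate", "revenue", target=0.03, actual=0.027, higher_is_better=True),
    KPI("monthly_cloud_spend_usd", "cost optimization", target=50_000, actual=48_000, higher_is_better=False),
]

print([k.name for k in focus_list(kpis)])  # ['page_load_p95_s', 'conversion_rate']
```

The cloud spend KPI drops out because it is under budget; it would reappear on the list as soon as a later measurement misses the target.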

Establish a useful alert threshold

Hybrid cloud monitoring can also generate a large number of alerts. If the alert threshold is too sensitive (set too low) and produces many false positives, engineers can experience alert fatigue and overlook pressing problems. However, if the threshold is set too high and produces false negatives, engineers will not receive notifications in time to act on an important issue.
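Before tuning thresholds automatically, it helps to see this trade-off on past data. The sketch below counts false positives and false negatives for candidate thresholds against a hypothetical labeled history of CPU readings, then picks the threshold with the lowest weighted cost. The history and the weight of 5 for a missed incident are assumptions for illustration, not recommendations.

```python
# Hypothetical history: (peak CPU %, whether a real incident followed).
HISTORY = [(55, False), (62, False), (71, False), (78, True),
           (83, False), (88, True), (95, True)]


def score_threshold(samples, threshold):
    """Return (false_positives, false_negatives) for an alert that fires at value >= threshold."""
    false_pos = sum(1 for value, incident in samples if value >= threshold and not incident)
    false_neg = sum(1 for value, incident in samples if value < threshold and incident)
    return false_pos, false_neg


def best_threshold(samples, candidates, miss_cost=5.0):
    """Pick the candidate with the lowest cost; each missed incident counts miss_cost times a false alarm."""
    def cost(threshold):
        fp, fn = score_threshold(samples, threshold)
        return fp + miss_cost * fn
    return min(candidates, key=cost)


# prints: 60 (3, 0) / 70 (2, 0) / 80 (1, 1) / 90 (0, 2)
for t in (60, 70, 80, 90):
    print(t, score_threshold(HISTORY, t))
print(best_threshold(HISTORY, [60, 70, 80, 90]))  # 70
```

Lowering the threshold trades missed incidents for noise, and raising it does the reverse; the cost weight makes that trade explicit instead of leaving it to intuition.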

To avoid this, automate the scoring and delivery of alerts so they arrive while there is still time to act. For example, some hybrid cloud monitoring tools now use AI to identify and tune alert thresholds. The combination of AI and automation can correlate related alerts and highlight the underlying issue that engineers need to address.
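AI-based tuning is beyond a short example, but a plainly statistical stand-in shows the idea. The sketch below derives a threshold from recent history (mean plus k standard deviations) and groups alerts that start within a few minutes of each other, so engineers see one incident rather than several pages. The window size, the value of k and the "earliest alert is the likely root" heuristic are assumptions, not how any particular product works.

```python
from statistics import mean, stdev


def dynamic_threshold(recent_values, k=3.0):
    """Baseline threshold: mean of recent values plus k sample standard deviations (needs >= 2 values)."""
    return mean(recent_values) + k * stdev(recent_values)


def correlate(alerts, window_s=300):
    """Group alerts that start within window_s seconds of the first alert in the group.

    alerts: iterable of (epoch_seconds, resource, message) tuples.
    """
    groups = []
    for alert in sorted(alerts):
        if groups and alert[0] - groups[-1][0][0] <= window_s:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def likely_root(group):
    # Heuristic assumption: the earliest alert in a burst is the best place to start.
    return group[0]


print(round(dynamic_threshold([40, 42, 38, 41, 39]), 2))  # 44.74

alerts = [
    (1000, "db01", "high CPU"),
    (1060, "app01", "slow responses"),
    (1120, "lb01", "HTTP 5xx spike"),
    (5000, "db02", "disk nearly full"),
]
for group in correlate(alerts):
    print(len(group), "alert(s); start with", likely_root(group)[1])
```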

Connect monitoring and management

As IT teams advance their hybrid cloud monitoring practices, they may start to address the same problems repeatedly. It is time-consuming to identify the root cause of a problem and then fix it, especially when you do it over and over again.

Integrate your monitoring system with communication and incident management tools, such as Slack or PagerDuty. This enables you to automatically capture data about how a team responds to a particular type of problem. If the same issue resurfaces, review the prior communications about that alert and quickly apply the same fix.
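A minimal sketch of that loop, assuming PagerDuty's Events API v2: each alert gets a stable fingerprint that doubles as the `dedup_key`, and any fixes previously recorded for that fingerprint are attached to the new incident in `custom_details`. The fingerprint rule, the in-memory notes store and the sample alert are hypothetical; the request is built but not sent.

```python
import hashlib
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Hypothetical store of how the team resolved each kind of alert before,
# e.g. notes captured from the incident's Slack thread or PagerDuty timeline.
resolution_notes: dict[str, list[str]] = {}


def fingerprint(alert: dict) -> str:
    """Stable key for 'the same kind of problem': source plus check name (an assumption)."""
    raw = f"{alert['source']}|{alert['check']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def record_resolution(alert: dict, note: str) -> None:
    resolution_notes.setdefault(fingerprint(alert), []).append(note)


def pagerduty_trigger(alert: dict, routing_key: str) -> urllib.request.Request:
    """Build (but do not send) an Events API v2 trigger that carries earlier fixes."""
    fp = fingerprint(alert)
    body = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": fp,
        "payload": {
            "summary": alert["summary"],
            "source": alert["source"],
            "severity": alert.get("severity", "error"),
            "custom_details": {"previous_fixes": resolution_notes.get(fp, [])},
        },
    }
    return urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


alert = {"source": "db01", "check": "disk_full", "summary": "Disk 95% full on db01"}
record_resolution(alert, "Rotated application logs and expanded the volume")
req = pagerduty_trigger(alert, "YOUR_ROUTING_KEY")
print(json.loads(req.data)["payload"]["custom_details"])
```

To actually send the event, pass the request to `urllib.request.urlopen` with a real integration routing key.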

Look for ways to integrate your monitoring tool directly with your cloud management tool. This will enable alerts to automatically kick off tasks, like service restarts, resource scaling or rollbacks of a new deployment.
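A small dispatch table captures the idea of alerts kicking off tasks. The alert types and remediation functions below are hypothetical placeholders; in practice each would call your cloud management tool's API. Alert types without a mapped action fall through to a human rather than triggering a guessed action.

```python
from typing import Callable


# Hypothetical remediation hooks; real versions would call a cloud management API.
def restart_service(alert: dict) -> str:
    return f"restart requested for {alert['resource']}"


def scale_out(alert: dict) -> str:
    return f"scale-out requested for {alert['resource']}"


def roll_back(alert: dict) -> str:
    return f"rollback of {alert.get('deployment', 'latest deployment')} requested"


PLAYBOOK: dict[str, Callable[[dict], str]] = {
    "service_unresponsive": restart_service,
    "sustained_high_load": scale_out,
    "error_rate_after_deploy": roll_back,
}


def handle(alert: dict) -> str:
    """Run the automated action mapped to this alert type, or escalate to on-call."""
    action = PLAYBOOK.get(alert["type"])
    if action is None:
        return f"no automated action for {alert['type']}; paging on-call"
    return action(alert)


print(handle({"type": "sustained_high_load", "resource": "web-asg"}))
print(handle({"type": "certificate_expiring", "resource": "lb01"}))
```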

Explore vendor tool options

Many enterprises turn to their primary public cloud provider's native monitoring tools. AWS, Microsoft and Google all offer them, though not all have native support for hybrid infrastructures.

Amazon CloudWatch monitors applications, resources and CPU usage on the AWS platform. While it does not directly support hybrid monitoring, there are third-party tools -- such as those from BMC, CA, Datadog, Dynatrace, New Relic and Stackify -- that can aggregate CloudWatch data into their alerts and reports.
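Aggregating CloudWatch data into another tool usually starts with reshaping its responses. The sketch below flattens a response shaped like the one boto3's `get_metric_statistics` returns (a `Label` plus a list of `Datapoints`) into generic records. The sample data and instance ID are made up, and the call to AWS itself is omitted so the example runs offline.

```python
from datetime import datetime, timezone


def normalize_cloudwatch(response: dict, resource: str, statistic: str = "Average") -> list[dict]:
    """Flatten a GetMetricStatistics-style response into generic, time-ordered records.

    Datapoints may not come back in time order, so they are sorted by Timestamp here.
    """
    records = []
    for dp in sorted(response["Datapoints"], key=lambda d: d["Timestamp"]):
        records.append({
            "source": "aws-cloudwatch",
            "resource": resource,
            "metric": response["Label"],
            "value": float(dp[statistic]),
            "unit": dp.get("Unit", "None"),
            "timestamp": dp["Timestamp"],
        })
    return records


# Made-up sample in the shape of a boto3 response.
sample = {
    "Label": "CPUUtilization",
    "Datapoints": [
        {"Timestamp": datetime(2018, 9, 1, 12, 5, tzinfo=timezone.utc), "Average": 61.2, "Unit": "Percent"},
        {"Timestamp": datetime(2018, 9, 1, 12, 0, tzinfo=timezone.utc), "Average": 48.7, "Unit": "Percent"},
    ],
}

for r in normalize_cloudwatch(sample, "i-0123456789abcdef0"):
    print(r["timestamp"].isoformat(), r["metric"], r["value"], r["unit"])
```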

Azure Monitor provides an end-to-end view of all private and public cloud resources that run on Windows and Linux servers, as well as VMs. The tool captures, analyzes and produces alerts. It is best for companies that require unified monitoring between private infrastructure and Azure. Azure Monitor also supports tight integration into Microsoft Operations Management Suite Automation to dynamically scale private and public cloud infrastructure.

Google Stackdriver offers monitoring, logging and metadata for private infrastructure, as well as services that run on Google and AWS clouds, for availability and performance optimization. It integrates with Google analytics tools, like BigQuery and Cloud Datalab. Additionally, Stackdriver can automate alerts sent to apps, like PagerDuty and Slack.

This was last published in September 2018
