With AI-powered cloud management platforms, context is king

While it's still early days for adoption, IT pros say AI-powered cloud management tools reduce a lot of the grunt work associated with performance and root-cause analysis.

Administrators who struggle to get deeper insight into cloud infrastructure and application performance have a new ally: artificial intelligence.

Some emerging and legacy IT vendors have infused AI technology into their cloud management platforms. While their feature sets -- such as the ability to analyze host performance, optimize costs and set up alerts -- sound similar to those found in more traditional third-party management tools, these AI-powered platforms reach a new level of sophistication, providing greater granularity and broader context, according to IT pros.

Travis Perkins PLC, a retail provider for the home improvement and construction markets based in the U.K., uses Dynatrace's AI-powered performance monitoring platform for its on-premises and Amazon Web Services (AWS) environments. Rather than focus on higher-level metrics related to host servers or instances, the tool reports more granularly on aspects like Java runtime code and errors, said Abdul Rahman Al-Tayib, e-commerce DevOps team leader at the company. This enables his team to perform faster and more precise root-cause analysis when something goes wrong, and better assess the overall impact any issues might have on the business.

"When it comes down to investigating or looking into specific elements of performance where we have had challenges, rather than having to do the investigation manually, [Dynatrace combines] it all into one report," Al-Tayib said. "So, it tells you, 'This service here failed to fire, and, therefore, it caused this series of events, which was then related back to [a disruption at your] customer.' You can immediately see where the challenge is."

To initiate this root-cause analysis, users install a Dynatrace agent on their host machine to identify the various dependencies between resources and help correlate certain events with any issues that arise, explained Alois Reitbauer, chief technology strategist at the company, based in Waltham, Mass.
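As a rough illustration of that kind of dependency-based correlation, the Python sketch below groups raw events under the host they trace back to, for instance a host short on CPU and a service on it that responds slowly. It is not Dynatrace's implementation: the `RUNS_ON` map, the entity names and the `correlate` function are all hypothetical, and a real agent would discover dependencies automatically rather than read them from a hand-written table.

```python
from collections import defaultdict

# Hypothetical dependency map: each service points to the host it runs on.
RUNS_ON = {"checkout-service": "host-a", "batch-reports": "host-b"}


def correlate(events):
    """Group raw events by host so related symptoms land in one problem."""
    problems = defaultdict(list)
    for entity, symptom in events:
        # A service event is attributed to the host it depends on;
        # a host event is already keyed by the host itself.
        host = RUNS_ON.get(entity, entity)
        problems[host].append((entity, symptom))
    return dict(problems)


events = [
    ("host-a", "cpu saturation"),
    ("checkout-service", "slow response time"),
    ("batch-reports", "slow response time"),
]
grouped = correlate(events)
```

Here the CPU event on `host-a` and the slow response from `checkout-service` end up in the same group, while the unrelated `batch-reports` event stays separate.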

"If you have a host that is running out of CPU, and the service running on that host has a response-time problem, [the tool can tell] these are related to each other," Reitbauer said.

More sophisticated anomaly detection, or identifying when an IT service is performing in an abnormal way, is another feature that makes AI-powered management tools stand out. To do this, the Dynatrace tool performs auto-baselining -- an automatic process that assesses baseline, or standard, system performance by applying different algorithms for metrics such as response time, failure rate and throughput.

After the tool extrapolates what normal performance looks like, it alerts IT teams to any deviations from that behavior. To avoid being bombarded with alerts, users can further specify performance thresholds, and the tool also applies algorithms to assess criticality.
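The article does not say which algorithms Dynatrace applies, so the Python sketch below substitutes a deliberately simple stand-in: it treats the mean and standard deviation of past response times as the baseline, flags values more than three standard deviations away, and then orders the resulting alerts by a business weight. The service names, sample numbers, the `CRITICALITY` weights and the three-sigma threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical business weights; a higher number means more customer impact.
CRITICALITY = {"checkout": 10, "batch-processing": 2}


def build_baseline(samples):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomaly(value, baseline, sigmas=3.0):
    """Flag a value more than `sigmas` standard deviations from the mean."""
    avg, spread = baseline
    return abs(value - avg) > sigmas * spread


def alerts_by_criticality(latest, baselines):
    """Return anomalous services, most business-critical first."""
    anomalous = [name for name, value in latest.items()
                 if is_anomaly(value, baselines[name])]
    return sorted(anomalous, key=lambda name: CRITICALITY.get(name, 1),
                  reverse=True)


# Made-up response-time history in milliseconds.
history_ms = {
    "checkout": [120, 118, 125, 122, 119, 121, 123, 120],
    "batch-processing": [900, 950, 920, 910, 940, 930, 915, 935],
    "search": [80, 82, 79, 81, 83, 80, 78, 81],
}
baselines = {name: build_baseline(s) for name, s in history_ms.items()}

latest_ms = {"checkout": 400, "batch-processing": 2000, "search": 82}
alerts = alerts_by_criticality(latest_ms, baselines)
```

With these numbers, `search` stays within its baseline and raises nothing, while `checkout` is listed ahead of `batch-processing` because of its higher weight.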

"If I have two hosts that have infrastructure problems ... I obviously care more about the problem that might be with a checkout function for a cart in an e-commerce application than the other one that maybe does some background batch processing," Reitbauer said. "[That] user context, from an infrastructure case, is of main importance."

The ability of AI-powered cloud management tools to weed out noncritical alerts has been a boon to other users as well. According to a network and infrastructure capacity planner at a cloud storage provider that uses AWS for its back-end infrastructure, that capability was one of the main reasons his company adopted an AI-powered cloud management tool called YotaScale.

The capacity planner, who asked to remain anonymous, conducted evaluations on several third-party cloud management tools, but found that YotaScale allowed him to "suppress a lot of the noise" that can come with those tools' alerts and recommendations.

For example, a company might spin up some AWS instances for a new research and development project, and those instances tend to have low utilization as the project ramps up, he said. Third-party cloud management tools might recommend right-sizing those instances or reserving them as AWS Reserved Instances, but in this case, those suggestions are irrelevant.

"That's not how we would really do things in a bootstrapping scenario, where we are trying to bring up a new test or project, and so I'm going to ignore those," he said.

The benefit of the AI layer in tools such as YotaScale is that it analyzes IT infrastructure through the lens of various business departments or units, according to the Menlo Park, Calif., company's CEO, Asim Razzaq. In the example above, that's through the lens of a research and development team.

"We map that enterprise, organizational way of looking at things to the infrastructure," Razzaq said. "And then, within that context, deliver optimization [recommendations] and anomaly detection."

The YotaScale tool achieves this business context via user input. Users adjust certain parameters and dismiss recommendations that don't fit, teaching the tool to detect what's most relevant over time.
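How YotaScale learns from that input is not described, so the following Python sketch only illustrates the general idea with a simple counting rule: once users in a business unit have dismissed a type of recommendation a set number of times, it is no longer shown to that unit. The `RecommendationFilter` class, the `dismiss_limit` parameter and the unit and recommendation names are hypothetical.

```python
from collections import Counter


class RecommendationFilter:
    """Hide recommendation types a business unit has repeatedly dismissed."""

    def __init__(self, dismiss_limit=2):
        self.dismiss_limit = dismiss_limit
        self.dismissals = Counter()

    def dismiss(self, unit, kind):
        # Record that a user in this unit rejected this kind of advice.
        self.dismissals[(unit, kind)] += 1

    def relevant(self, recommendations):
        # Keep only recommendations dismissed fewer times than the limit.
        return [r for r in recommendations
                if self.dismissals[(r["unit"], r["kind"])] < self.dismiss_limit]


filt = RecommendationFilter()
filt.dismiss("research-and-development", "right-size")
filt.dismiss("research-and-development", "right-size")

recommendations = [
    {"unit": "research-and-development", "kind": "right-size"},
    {"unit": "production", "kind": "right-size"},
]
shown = filt.relevant(recommendations)
```

After two dismissals, the right-sizing advice disappears for the research and development unit but still reaches production, where low utilization is more likely to signal real waste.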

AI replacing humans? Not so fast

One overarching benefit of these AI-powered cloud management platforms is that they reduce the need for humans to perform a lot of this analysis on their own. But even the most sophisticated tools won't provide the same level of insight -- at least not yet -- as an IT professional with 20 years of industry experience, said Chris Wilder, analyst at Moor Insights & Strategy.

"These algorithms will be smarter and smarter based on the anomalies they find, but they still don't have the experience a person would," Wilder said. "Data, in my opinion, is not a replacement for human expertise. It's just something to augment it."

These AI capabilities are still in their early phases, agreed Jay Lyman, analyst at 451 Research. But they will eventually become a must-have for infrastructure management tool vendors.

"We'll get to a point before too long where every provider is going to have to have some sort of machine learning and AI in their automation," Lyman said. "I think it will become pretty much a check-box item."
