When one of the world's largest publishing houses unleashes 70,000 new digital titles annually compared with just 15,000 in print, it's clear that digital transformation and cloud computing have won, even in the most ancient of industries. For Penguin Random House, that transformation means keeping digital interactive apps and services available and performing at peak levels. Doing so has become a major thrust of the company's application development efforts.
"We have a lot of applications. We have web server tiers and back-end [Red Hat] JBoss [Middleware] tiers, as well as APIs," said Brian Uckert, a systems design engineer at New York-based Penguin Random House (PRH) who specializes in DevOps and continuous deployment. "It's important to be able to see if we're having a spike that's affecting the database and where that's coming from." A spike of that nature could be an elevated number of I/O calls, unusually long response times or the former causing the latter.
To monitor its applications during development and test, and after deployment to production, PRH is using New Relic application performance monitoring (APM) tools.
APM tools help provide deeper insight into the software that developers are building and how that software performs when it's under load, said Greg Unrein, vice president of New Relic, an APM toolmaker based in San Francisco.
APM tools, which have existed for decades to assist in the optimization of on-premises mainframe applications, are changing to keep pace with the cloud computing revolution. In December 2016, research firm Gartner broadened its definition of APM to encompass three distinct areas.
Digital experience monitoring helps developers optimize the interactions of people, machines and digital agents. Application discovery, tracing and diagnostics map transactions across multiple servers with a goal of finding and remediating problems. The third component, application analytics, uses machine learning technology and statistics to help detect transaction anomalies on servers running Java or .NET. Gartner predicted that, by 2020, 70% of those purchasing APM technology will reside outside the bounds of traditional IT operations -- a significant increase from 40% in 2016.
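To make the statistical side of application analytics concrete, here is a minimal sketch of how a spike in response times might be flagged. It is not how New Relic or any other APM product works internally; the function name find_spikes, the sample data and the 3.5 threshold are all assumptions chosen for illustration. The sketch uses a modified z-score based on the median, so that one extreme value does not distort the baseline it is measured against.

```python
from statistics import median

def find_spikes(response_ms, threshold=3.5):
    """Return the indices of unusually slow samples.

    Hypothetical helper: scores each sample with a modified z-score
    (distance from the median, scaled by the median absolute deviation)
    and flags only samples that are slower than normal.
    """
    med = median(response_ms)
    # Median absolute deviation: the typical distance from the median.
    mad = median([abs(x - med) for x in response_ms])
    if mad == 0:
        return []  # every sample is effectively identical; nothing to flag
    return [
        i for i, x in enumerate(response_ms)
        if 0.6745 * (x - med) / mad > threshold
    ]

# Made-up response times in milliseconds, with one obvious spike.
samples = [120, 118, 125, 122, 119, 121, 980, 123]
spikes = find_spikes(samples)
```

With the sample data above, only the 980 ms reading is flagged. A real tool would work on rolling windows of live traffic and correlate the spike with the transactions that caused it, which this sketch does not attempt.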
With a portfolio of mobile and browser-based cloud apps for bookselling, science-fiction exploration, cooking and child rearing, along with a membership-based personalized recommendation engine for suggesting book titles, PRH has dozens of services operating on its hybrid on-premises and public cloud infrastructure. All of those apps and services interact with PRH's global database.
Anomaly hunting with APM tools
As entire IT application systems fragment into hundreds of containers, microservices and cloud services from multiple providers, the challenge of locating a bottleneck grows increasingly daunting. In Uckert's example, suboptimal database behavior is likely a symptom of a problem occurring elsewhere, and it's not necessarily the result of a problem within the database.
It can be the manifestation of issues that occur anywhere throughout the entire system chain -- a database API call in a mobile device app that asks for more data than needed, network latency or inefficient program code. Of course, it is possible that a slowdown might be due to the database itself, such as a corrupted index or a delay due to switching to a backup database located in a distant data center.
Using New Relic APM tools, Uckert's team works to identify problems before apps are deployed to production. "If we find it in test, that's great. We can make modifications in dev, and from dev, we can push it to test," Uckert said. "We know that, in dev, there are going to be anomalies, because it's very unstable; it's constantly being iterated."
After an app is deemed sufficiently stable in testing, it then moves to staging and, ultimately, to production, where "it should maintain its stability," Uckert said.
PRH is not alone in using APM tools during the development phase to find problems, rather than waiting until the testing phase. As the release frequency of cloud and mobile application updates continues to climb in response to competitive pressures, APM becomes a crucial tool in the app development and management lifecycle. The market for APM software is flourishing as a result. According to a forecast from MarketsAndMarkets, annual global APM software revenue of $2.7 billion in 2014 is expected to reach $4.98 billion by 2019 -- a compound annual growth rate of 12.86%.
Getting developers to see anomalies faster, or before an application reaches production, is a nearly universal challenge, said Stephen Elliot, program vice president for DevOps, multi-cloud management and IT service management at IDC.
API health check
A challenge for application developers and operations staffers is that a modern mobile and cloud computing experience, though it appears as a single entity to the end user, actually consists of dozens of islands -- services, containers and data sources from multiple providers running on separate clouds -- all connected through a combination of homegrown and third-party APIs.
Not surprisingly, with systems built up from dozens of services, monitoring the performance of the APIs that connect those services through an APM tool is vital. "API management and monitoring are a big piece to this puzzle. They are the glue for the vast array of components to drive integration of bidirectional data streams," Elliot said. "They can't be ignored; they need to be monitored, managed and secured."
Uckert agreed that monitoring APIs is essential for achieving optimal performance. "You don't realize [how] your API calls [are] being affected unless you are actually monitoring them," he said. That database slowdown may be caused by a poorly crafted API call or by a call to a third-party service that sits idle, waiting for data to be returned.
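As a rough illustration of what monitoring an API call involves at the code level, the sketch below wraps a function so that every call's duration is recorded, including calls that fail. It does not reflect PRH's setup or New Relic's agent; the names timed, timings, slow_calls and fetch_titles are hypothetical, and time.sleep stands in for a real database or third-party request.

```python
import time
from functools import wraps

# Hypothetical in-memory store: call name -> list of durations in seconds.
timings = {}

def timed(name):
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised an exception.
                elapsed = time.perf_counter() - start
                timings.setdefault(name, []).append(elapsed)
        return wrapper
    return decorator

@timed("fetch_titles")
def fetch_titles(delay_s):
    """Stand-in for an API call that waits on a database or external service."""
    time.sleep(delay_s)
    return ["example title"]

def slow_calls(limit_s):
    """Return each call name whose slowest recorded duration exceeded limit_s."""
    return {name: max(d) for name, d in timings.items() if max(d) > limit_s}
```

Calling fetch_titles(0.05) and then slow_calls(0.01) would report fetch_titles as slow. A production APM agent collects this kind of timing automatically and ties it to the full transaction trace, so developers rarely write such wrappers by hand.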
Though not able to provide details about PRH's hybrid cloud infrastructure due to security concerns, Uckert said, as in any other IT operation, it's important for developers to see problems or inefficiencies early and make corrections. "That's why iterating over and over again is really important," he said.
To manage the process, PRH uses the GitLab repository management platform for source code management and release automation. Once developers commit their code, it is deployed automatically to dev, test or production servers.
"We know there are anomalies," Uckert said. That could be a subscription event that brings in a lot of traffic, or it could be abnormal activity, such as a DDoS [distributed denial-of-service] attack. What is especially important, he said, is to see how APIs react when a query is taking an unusually long time to complete.
A boon for developers
Despite the vast assortment of app development tools available, it might come as something of a surprise to learn that not all developers are familiar with APM tools. "One of the most fun things I get to do is see people who haven't experienced this kind of monitoring before try it for the first time," Unrein said. "If somebody hasn't seen it before, which is getting rare, they're pretty amazed at how much visibility it gives them."
While developers will appreciate the ability to find and fix problems quickly, speeding the process of rooting out problems has a direct bearing on the economics of IT systems, according to Unrein. Having happier, less-stressed-out developers and operations people leads to faster cycle times and better outcomes, he said.
As for developers at PRH, using APM tools has improved the process of building applications, according to Uckert. "It's what they've been looking for. Our job is to be enablers and not blockers," he said. "It helps pinpoint problems before they really become critical."