AWS analytics tools help make sense of big data
For years, IT organizations have generated, collected and stored vast amounts of data. Now, IT is being asked not just to store the data but to provide the infrastructure to perform analytics on it. The trouble is that the task is a resource-intensive proposition. For organizations that don't have idle servers to throw at big data workloads, could tapping into the public cloud be a valid alternative to maintaining costly internal infrastructure?
When it comes to processing big data, the public cloud has a lot going for it. Public clouds are pay-per-use, so they are a good fit with finite big data workloads. Further, many big data analytics jobs can be easily "parallelized," or chopped up into smaller, more discrete tasks, which maps nicely to public cloud. And many public cloud providers offer templates for popular big data platforms such as Hadoop, making it easier for administrators to set up the requisite infrastructure.
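To make "parallelized" concrete, here is a minimal Python sketch of the idea: one large tally is chopped into independent slices, each slice is counted on its own, and the partial results are merged. It is an illustration only, not how Hadoop or any vendor mentioned here implements it. The function names and the sample events are hypothetical, and local threads stand in for the separate cloud machines a real job would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Chop one large job into roughly equal, independent slices."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_events(chunk):
    """The per-slice task: tally only the records in this slice."""
    return Counter(chunk)


def parallel_count(records, n_workers=4):
    """Run the per-slice tasks concurrently, then merge the partial tallies."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_events, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical event stream, invented for illustration.
    events = ["click", "view", "click", "purchase", "view", "click"] * 1000
    print(parallel_count(events))
```

Because no slice depends on another, adding workers shortens the job without changing the answer, which is the property that lets such workloads spread across rented machines.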
For some big data analytics workloads, going to the cloud is the only valid scalable solution. Medio, a real-time analytics firm, rearchitected its software for multi-tenancy, tapping into the scale of Amazon Web Services (AWS) to complement its data center.
"One billion data events hit our data borg every day; we have 15 million monthly unique visitors," said Rob Lilleness, Medio's chief executive officer. In Hadoop running on AWS, Medio encountered a platform that could handle the scale that the company sought.
Even for modest workloads, public cloud's pay-per-use model is appealing. "The cloud is good at spin-up and spin-down," said Frances Guida, Hewlett-Packard's manager for cloud in its enterprise group. That's a nice fit with big data. "A lot of analytics aren't predictable, and when you answer the question, you don't necessarily need the infrastructure again," she said.
Nor does big data in the cloud need to be an all-or-nothing proposition; some organizations see value in taking a hybrid approach. Archimedes Inc., a medical simulation firm in San Francisco, manages a private Hadoop cluster for data processing with the help of Univa Grid Engine software, but runs its front end on the AWS cloud. "We could have run [Hadoop] on AWS as well, but when we calculated the cost, we figured out that if we could keep the hardware busy 30% to 40% of the time, it was cheaper to run it in-house," said Katrina Montinola, vice president of engineering.
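The reasoning behind a utilization threshold like the one Montinola describes can be sketched in a few lines of Python. The sketch assumes a deliberately simplified model that is not Archimedes' actual calculation: in-house hardware costs the same every hour whether busy or idle, while cloud capacity is billed only for busy hours. The function names and dollar figures are hypothetical.

```python
def break_even_utilization(in_house_cost_per_hour, cloud_rate_per_hour):
    """Fraction of time hardware must be busy for in-house to match cloud cost.

    Simplified model (an assumption): in-house cost accrues every hour,
    cloud cost accrues only during busy hours.
    """
    if cloud_rate_per_hour <= 0:
        raise ValueError("cloud_rate_per_hour must be positive")
    return in_house_cost_per_hour / cloud_rate_per_hour


def cheaper_option(utilization, in_house_cost_per_hour, cloud_rate_per_hour):
    """Compare the effective hourly cost of each option at a given utilization."""
    cloud_cost = utilization * cloud_rate_per_hour
    return "in-house" if in_house_cost_per_hour < cloud_cost else "cloud"


if __name__ == "__main__":
    # Hypothetical prices, chosen only so the threshold lands at 35%.
    in_house, cloud = 0.35, 1.00
    print(break_even_utilization(in_house, cloud))
    print(cheaper_option(0.50, in_house, cloud))
    print(cheaper_option(0.20, in_house, cloud))
```

Under this model, any utilization above the break-even fraction favors owning the hardware, and anything below it favors paying per use.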
But while using the cloud for big data problems may seem like an obvious answer, there are a lot of caveats: security, for one, but also physical constraints concerning data movement and latency. Even more daunting is the lack of trained professionals who know how to pose business questions of the data in a meaningful way. While the former problems can be addressed with time, money and technology, data science skills are harder to come by -- and are certainly not the domain of your average IT administrator.
Analytics as a Service creates hunches, bridges IT gaps
Arguably, the cloud's biggest contribution to solving the big data problem is the number of analytics vendors that have adopted the Software as a Service (SaaS) model. With SaaS, IT departments need neither buy infrastructure nor set anything up.
That's been a powerful selling point for Emcien Corp., which offers its pattern detection software as a service running on Amazon Elastic Compute Cloud (EC2) and counts large retailers, telecommunications providers and intelligence agencies among its customers. "The IT user doesn't need to buy all that hardware. All you need is a Web browser and you're in business," said Radhika Subramanian, Emcien CEO.
Big data platforms are notoriously complex, said Ryan Sousa, senior vice president of engineering at Medio. "To get to the scale and cost-effectiveness from analytics, you need Hadoop and Cassandra and other foundational components to get to the necessary size, throughput and performance," he said. "Building out that framework can be really challenging, and it's rarely in-house IT's core business."
Even longtime business analytics users are considering a possible switch from on-premises to cloud. Janet Grimsley, vice president and information specialist at The Fauquier Bank in Warrenton, Va., uses on-premises information optimization and visualization tools from Datawatch Corp., which recently began offering its tools as a service. "We would entertain using it as a service," she said. "It's just a cost issue for us."
As such, the emergence of the SaaS model could close a longstanding rift between business users and IT, said Rod Smith, vice president of emerging Internet technologies at IBM, which now offers a SaaS version of its Social Media Analytics tool. Big data analytics allows line-of-business users to look for insight and follow hunches, but historically, "IT can't plan for a hunch," Smith said.
Now, with SaaS-based analytics tools, "line-of-business [users can say] 'I can move as quickly as I'm willing to spend money.'" IT, in turn, can "help line-of-business follow its hunches."
Sometimes big data can mean big problems
But SaaS-based analytics can take you only so far. Emcien, for instance, used to run its software exclusively in the cloud but recently started offering a virtual appliance version that customers can run in-house.
"The cloud is a fantastic resource for us because of its scalability and because it is extremely economical," said Subramanian. However, as data sets grow larger, some customers have balked at the prospect of moving those sets to and from the cloud. The company routinely does evaluations on sample data sets of 60 TB, and production data sets soar into the hundreds of terabytes or even petabytes, she said.
Even if the data sets aren't that large, convincing organizations to put them in a cloud can be tough, conceded Medio's Lilleness. The first two generations of Medio's product ran on-premises because "people wanted their customer data inside their own data center -- they had a greater degree of comfort with that." And when Medio relaunched in the cloud, "people were really resistant to moving their data there."
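A rough calculation shows why data sets of the size Subramanian cites are hard to move. The Python sketch below estimates how long a transfer takes over a network link. The link speed is a hypothetical figure, not one from Emcien or its customers. The sketch also assumes decimal terabytes and a link that sustains a fixed share of its rated speed.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, link_gbps, efficiency=1.0):
    """Estimate days needed to move a data set over a network link.

    Assumptions: 1 TB = 10**12 bytes, and the link sustains
    `efficiency` (0 to 1) of its rated speed for the whole transfer.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 60 TB sample set from the article; the 1 Gbps link is hypothetical.
    print(f"{transfer_days(60, 1.0):.1f} days")
```

Under these assumptions, a 60 TB sample set takes more than five days to move over a fully used 1 Gbps link, and the time grows in proportion to the size of the data set.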
But even traditional on-premises technologies are being retooled for the cloud. HP's CloudSystem converged infrastructure platform, for example, features the same cloud management layer as the company's public cloud offerings, making it simpler for organizations to "burst" to the cloud to meet peak demand, said Margaret Dawson, HP vice president of product marketing and cloud evangelist.
Even on the security front, resistance is softening, said Lilleness, and traditional offerings from business intelligence and data warehousing vendors like IBM Netezza, Oracle and Teradata are seeing declining adoption, he claimed.
"It's too expensive. Customers can't afford to have enough capacity to analyze the explosion of data that we're generating," he said. "The cloud is winning out."
About the Author:
Alex Barrett is editor in chief of Modern Infrastructure. Write to her at firstname.lastname@example.org.