
Does Hadoop in the cloud give big data a boost?

Hadoop is an option for handling the vast and diverse data sets that come with big data. But how exactly can it be used in an organization?

Why is Hadoop in the cloud important for big data analytics?

To be meaningful, data must be processed, cleaned, filtered, transformed and modeled. When data sets are small or limited in scope, organizations can analyze and model the data on a single server using conventional processing techniques and tools such as the R programming language.
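
To make that single-server workflow concrete, here is a minimal sketch in Python with pandas, standing in for the R-style analysis mentioned above. The input file and column names are hypothetical, used only to illustrate the clean/filter/transform/model steps.

    import numpy as np
    import pandas as pd

    # Hypothetical input file and columns, for illustration only.
    df = pd.read_csv("sales.csv")                       # ingest raw data
    df = df.dropna(subset=["region", "revenue"])        # clean: drop incomplete rows
    df = df[df["revenue"] > 0]                          # filter: keep valid records
    df["log_revenue"] = np.log(df["revenue"])           # transform: rescale the values
    model = df.groupby("region")["log_revenue"].mean()  # model: simple per-region summary
    print(model)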

However, when data sets grow, processing must be distributed across multiple servers, and that requires an additional software layer to coordinate the distributed storage and processing tasks. Apache Hadoop provides this layer as an open source software framework that supports distributed storage and processing of big data sets across clusters of computers. It has emerged as the standard framework for big data analytics projects, and version 2.6.0 was released in November 2014.

Hadoop is not a single tool, but a collection of related modules. These include Hadoop Common, with the core libraries and utilities; the Hadoop Distributed File System (HDFS), which stores data across distributed machines; Hadoop YARN, which manages and schedules cluster computing resources; and Hadoop MapReduce, the programming model for processing big data sets in parallel. Enterprises can deploy Hadoop in an on-premises data center or on public clouds such as Microsoft Azure, Google Cloud Platform and Amazon Web Services, for example through the Amazon Elastic MapReduce (EMR) service.
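
As an illustration of the MapReduce model, the following sketch implements the classic word count as a Hadoop Streaming job in Python. Hadoop Streaming pipes text through a script's standard input and output, so one file can serve as both mapper and reducer; the file name here is hypothetical.

    #!/usr/bin/env python3
    # wordcount.py -- word count via Hadoop Streaming (illustrative sketch).
    # Run "python3 wordcount.py map" as the mapper and
    # "python3 wordcount.py reduce" as the reducer.
    import sys
    from itertools import groupby

    def mapper():
        # Emit a tab-separated (word, 1) pair per token, as Streaming expects.
        for line in sys.stdin:
            for word in line.split():
                print(word + "\t1")

    def reducer():
        # Hadoop sorts mapper output by key, so equal words arrive adjacent.
        pairs = (line.rstrip("\n").split("\t", 1)
                 for line in sys.stdin if line.strip())
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(word + "\t" + str(sum(int(n) for _, n in group)))

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

The job would be submitted with the hadoop-streaming JAR that ships with Hadoop, pointing its -mapper and -reducer options at this script and its -input and -output options at HDFS paths.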

Hadoop, while the most common, is not the only big data storage and processing platform. Alternatives include Hydra, Zillabyte, Stratosphere and Spark. Because these platforms are still emerging, it's important to evaluate each one's potential benefits and test it before deployment. For example, the Zillabyte platform caters to Ruby and Python, so it might be a solid option for big data developers who are proficient in those languages. Once an enterprise selects a platform, it should arrange a test deployment in a limited cloud environment that duplicates its established Hadoop environment as closely as possible.
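
As one way to kick the tires on an alternative, here is a minimal word count written in PySpark, Spark's Python API, assuming a local Spark installation; the input path is hypothetical. Running the same small job on a candidate platform and on the existing Hadoop cluster gives a like-for-like comparison.

    from pyspark import SparkContext

    # Run Spark locally on all cores; "input.txt" is a hypothetical sample file.
    sc = SparkContext("local[*]", "platform-evaluation")
    counts = (sc.textFile("input.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.take(10):  # inspect a sample of the results
        print(word, count)
    sc.stop()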

About the author:
Stephen J. Bigelow is the senior technology editor of the Data Center and Virtualization Media Group. He can be reached at sbigelow@techtarget.com.

Next Steps

Getting to know Hadoop cloud tools

Hadoop as a service competition heats up

Following customer trends with cloud predictive analytics

This was last published in February 2015
