MapReduce

MapReduce is a software framework for writing programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. Google developed it for indexing Web pages, and in 2004 it replaced the company's original indexing algorithms and heuristics.

The framework is divided into two parts:

  • Map, a function that parcels out work to different nodes in the distributed cluster, where each node turns its share of the input records into key/value pairs.
  • Reduce, another function that collates the work, combining all of the values that share a key into a single result.

The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
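
The re-assignment behavior can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not how any particular MapReduce implementation does it: the Master class, its method names, the node and split names, and the 10-second timeout are all hypothetical, and time is passed in explicitly as a plain number so that the example is deterministic.

```python
class Master:
    """Toy master node (hypothetical): tracks status reports and
    re-assigns work from nodes that have been silent for too long."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of its last report
        self.tasks = {}      # node name -> list of assigned task names

    def assign(self, node, task, now):
        self.tasks.setdefault(node, []).append(task)
        self.last_seen.setdefault(node, now)

    def heartbeat(self, node, now):
        # A node reports back; remember when we last heard from it.
        self.last_seen[node] = now

    def check(self, now):
        """Move tasks away from silent nodes; return (task, from, to) tuples."""
        silent = [n for n, seen in self.last_seen.items()
                  if now - seen > self.timeout]
        live = [n for n in self.last_seen if n not in silent]
        if silent and not live:
            raise RuntimeError("no live nodes left to take over the work")
        moved = []
        for node in silent:
            del self.last_seen[node]
            for task in self.tasks.pop(node, []):
                # Give the task to the live node with the fewest tasks.
                target = min(live, key=lambda n: len(self.tasks.get(n, [])))
                self.tasks.setdefault(target, []).append(task)
                moved.append((task, node, target))
        return moved


master = Master(timeout=10)
master.assign("node-a", "split-0", now=0)
master.assign("node-b", "split-1", now=0)
master.assign("node-b", "split-2", now=0)
master.heartbeat("node-a", now=8)   # node-a reports in; node-b stays silent
moved = master.check(now=12)        # node-b has now been silent for 12 > 10
print(moved)
print(master.tasks)
```

In this run only node-b exceeds the timeout, so both of its splits end up on node-a. A real framework would also have to deal with partially completed work and with nodes that come back late, which this sketch ignores.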

    According to software engineer Mark C. Chu-Carroll:

    "The key to how MapReduce works is to take input as, conceptually, a list of records. The records are split among the different computers in the cluster by Map. The result of the Map computation is a list of key/value pairs. Reduce then takes each set of values that has the same key and combines them into a single value. So Map takes a set of data chunks and produces key/value pairs and Reduce merges things, so that instead of a set of key/value pair sets, you get one result. You can't tell whether the job was split into 100 pieces or 2 pieces...MapReduce isn't intended to replace relational databases: it's intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines."

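The flow described in the quote (records in, key/value pairs out of Map, one value per key out of Reduce) can be sketched as a word count in plain Python. This is a minimal single-machine illustration, not a distributed implementation: the function names and the sample records are assumptions made for the example, and the "pieces" are just list slices processed one after another.

```python
from collections import defaultdict


def map_words(record):
    """Map: turn one record (a line of text) into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_counts(key, values):
    """Reduce: combine every value that shares a key into a single value."""
    return sum(values)


def split_records(records, pieces):
    """Deal the records into `pieces` chunks (some may be empty)."""
    return [records[i::pieces] for i in range(pieces)]


def map_reduce(records, pieces):
    grouped = defaultdict(list)  # key -> all values emitted for that key
    for chunk in split_records(records, pieces):
        for record in chunk:
            for key, value in map_words(record):
                grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


records = ["the quick brown fox", "the lazy dog", "the fox"]
two_pieces = map_reduce(records, pieces=2)
hundred_pieces = map_reduce(records, pieces=100)
print(two_pieces)
print(two_pieces == hundred_pieces)  # the split count does not change the result
```

The final comparison mirrors the point that the result does not reveal whether the job was split into 100 pieces or 2.
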
    MapReduce is important because it allows ordinary developers to use MapReduce library routines to create parallel programs without having to worry about programming for intra-cluster communication, task monitoring or failure handling. It is useful for tasks such as data mining, log file analysis, financial analysis and scientific simulations. Several implementations of MapReduce are available in a variety of programming languages, including Java, C++, Python, Perl, Ruby, and C.
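
To show the division of labor, the sketch below separates a stand-in "library routine" from the code a developer would write for a log file analysis task. Everything here is an assumption made for illustration: run_job is a hypothetical helper rather than a function from any real MapReduce library, it uses a local thread pool in place of a cluster, and the three-field log format is invented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(map_fn, reduce_fn, records, workers=4):
    """Hypothetical library routine: applies map_fn to the records in
    parallel, groups the emitted pairs by key, then applies reduce_fn."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Developer-written code: only the two functions below are task-specific.
def status_of(line):
    # Assumes an invented log format: "<method> <path> <status>".
    method, path, status = line.split()
    return [(status, 1)]


def total(key, values):
    return sum(values)


log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /about 200",
]
counts = run_job(status_of, total, log_lines)
print(counts)
```

The developer's functions contain no threading, communication, or retry logic; in this sketch all of that would live behind run_job.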

    See also: Hadoop, cluster computing, distributed computing, cloud computing

    Learn more:

    Eugene Ciurana asks the question, Why should you care about MapReduce?

    John Willis provides an overview of Amazon's Elastic Map Reduce.

    Rich Seeley explains why MapReduce moves from secret Google goo to enterprise architecture.

    Learn what MapReduce and in-database technology means for data warehouses.

    Hadoop has a MapReduce tutorial.

    Contributor: Mark C. Chu-Carroll

This was last updated in February 2010
Posted by: Margaret Rouse