MapReduce
SearchCloudComputing.com Definitions (Powered by WhatIs.com)

DEFINITION - MapReduce is a software framework that lets developers write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google for indexing Web pages and, in 2004, replaced the company's original indexing algorithms and heuristics.

The framework is divided into two parts:

  • Map, a function that is applied to chunks of the input on different nodes in the distributed cluster and emits key/value pairs.
  • Reduce, another function that collates the work, combining the values that share a key into a single value.

The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
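
The re-assignment behavior described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Master` class, its method names, and the 10-second timeout are hypothetical and do not come from any real MapReduce implementation, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy master node: tracks reports and reassigns work from silent workers."""
    timeout: float                                    # hypothetical "expected interval" in seconds
    last_seen: dict = field(default_factory=dict)     # worker -> time of last report
    assignments: dict = field(default_factory=dict)   # task -> worker

    def heartbeat(self, worker, now):
        """Record that `worker` reported in at time `now`."""
        self.last_seen[worker] = now

    def assign(self, task, worker):
        self.assignments[task] = worker

    def reassign_from_silent_workers(self, now):
        """Move tasks off any worker that has been silent longer than `timeout`."""
        alive = [w for w, t in self.last_seen.items() if now - t <= self.timeout]
        if not alive:
            return []  # nobody can take over; leave the assignments untouched
        moved = []
        for task, worker in sorted(self.assignments.items()):
            if worker not in alive:
                # Spread the orphaned tasks round-robin over the live workers.
                self.assignments[task] = alive[len(moved) % len(alive)]
                moved.append(task)
        return moved


master = Master(timeout=10.0)
master.heartbeat("node-a", now=0.0)
master.heartbeat("node-b", now=0.0)
master.assign("chunk-1", "node-a")
master.assign("chunk-2", "node-b")

master.heartbeat("node-a", now=12.0)                   # node-b never reports again
print(master.reassign_from_silent_workers(now=15.0))   # tasks taken away from node-b
print(master.assignments)
```

A production framework would also have to deal with partially completed work and duplicate results, which this sketch leaves out.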

    According to software engineer Mark C. Chu-Carroll:

      "The key to how MapReduce works is to take input as, conceptually, a list of records. The records are split among the different computers in the cluster by Map. The result of the Map computation is a list of key/value pairs. Reduce then takes each set of values that has the same key and combines them into a single value. So Map takes a set of data chunks and produces key/value pairs and Reduce merges things, so that instead of a set of key/value pair sets, you get one result. You can't tell whether the job was split into 100 pieces or 2 pieces...MapReduce isn't intended to replace relational databases: it's intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines."
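
The flow in the quotation (records in, key/value pairs out of Map, one value per key out of Reduce) can be sketched as a single-machine word count in Python. The function names and the sample records are illustrative assumptions; the chunks here are processed in a plain loop, whereas a real cluster would hand each chunk to a different machine.

```python
from collections import defaultdict


def map_words(record):
    """Map: turn one input record (a line of text) into (word, 1) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_counts(key, values):
    """Reduce: combine all values that share a key into a single value."""
    return sum(values)


def split(records, pieces):
    """Split the record list into at most `pieces` chunks."""
    size = max(1, -(-len(records) // pieces))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_reduce(records, pieces):
    grouped = defaultdict(list)
    for chunk in split(records, pieces):        # each chunk could run on its own node
        for record in chunk:
            for key, value in map_words(record):
                grouped[key].append(value)      # gather values under their key
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


records = ["the quick brown fox", "the lazy dog", "The fox"]
# The result does not depend on how many pieces the job was split into.
print(map_reduce(records, pieces=2) == map_reduce(records, pieces=100))  # True
```

Because addition is associative and commutative, the counts come out the same however the records are divided, which is the point of the "100 pieces or 2 pieces" remark.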

    MapReduce is important because it allows ordinary developers to use MapReduce library routines to create parallel programs without having to worry about programming for intra-cluster communication, task monitoring or failure handling. It is useful for tasks such as data mining, log file analysis, financial analysis and scientific simulations. Several implementations of MapReduce are available in a variety of programming languages, including Java, C++, Python, Perl, Ruby, and C.
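
As a rough illustration of that division of labor, the Python sketch below separates a stand-in "library" routine from the two small functions a developer would write for a log file analysis job. Everything here is an assumption made for the example: `run_job` is a hypothetical helper, not the API of Hadoop or any other implementation, threads stand in for cluster nodes, and the log format is made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(map_fn, reduce_fn, records, workers=4):
    """Hypothetical stand-in for a MapReduce library routine.

    Threads play the role of cluster nodes. A real framework would also handle
    communication, monitoring, and retries, which this sketch omits.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))   # map phase, run concurrently
    grouped = defaultdict(list)
    for pairs in mapped:                           # group values by key
        for key, value in pairs:
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# --- The only code the developer writes for this job ---

def status_of(line):
    """Map: emit (status code, 1) for one access-log line (made-up format)."""
    return [(line.split()[-1], 1)]


def total(key, values):
    """Reduce: count how many lines carried this status code."""
    return sum(values)


log_lines = ["GET /index.html 200", "GET /missing 404", "POST /login 200"]
print(run_job(status_of, total, log_lines))
```

Swapping in a different pair of map and reduce functions changes the analysis without touching `run_job`, which is the separation the paragraph above describes.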

    See also: Hadoop, cluster computing, distributed computing, cloud computing

    Learn more:

    Eugene Ciurana asks the question, Why should you care about MapReduce?

    John Willis provides an overview of Amazon's Elastic Map Reduce.

    Rich Seeley explains why MapReduce moves from secret Google goo to enterprise architecture.

    Learn what MapReduce and in-database technology means for data warehouses.

    Hadoop has a MapReduce tutorial.

    Contributor: Mark C. Chu-Carroll

    LAST UPDATED: 20 Aug 2009

    All Rights Reserved, Copyright 2009, TechTarget