MapReduce
From Wikipedia, the free encyclopedia, by MultiMedia

MapReduce is a programming tool developed by Google in C++ for performing parallel computations over very large data sets (more than 1 terabyte). The terms "map" and "reduce", and the general idea behind them, are borrowed from the map and reduce constructs of functional programming languages and from features of array programming languages.

Map and reduce
In simpler terms, a map function goes over a conceptual list of independent elements (for example, a list of test scores) and performs a specified operation on each element. Suppose, in that example, a flaw in the test gave each student a score one point too high: a map function of "minus 1" would subtract one from each score, correcting them. Because each element is operated on independently, and because the original list is not modified (a new list is created to hold the results), a map operation is very easy to make highly parallel, which makes it useful in high-performance applications and in parallel programming. A reduce function, in turn, combines the elements of a list into a single result; for example, it could add up the corrected scores to get the class total.
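
The test-score example can be sketched in Python as follows. This is a minimal illustration, not Google's C++ implementation: the score values are made up, `minus_one` is a hypothetical helper, and a standard-library thread pool stands in for the many machines a real MapReduce job would use. `functools.reduce` plays the role of the reduce step by folding the corrected list into one total.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Made-up example data: each score is one point too high.
scores = [71, 86, 94, 58, 100]


def minus_one(score: int) -> int:
    """Map step: correct one score without looking at any other element."""
    return score - 1


# Because the elements are independent, the map calls can be handed to a pool
# of workers; the thread pool stands in for the machines of a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    corrected = list(pool.map(minus_one, scores))  # results keep input order

# Reduce step: fold the corrected list into a single value (the total).
total = reduce(lambda acc, score: acc + score, corrected, 0)

print(scores)     # [71, 86, 94, 58, 100]  (original list unchanged)
print(corrected)  # [70, 85, 93, 57, 99]
print(total)      # 404
```

Because `minus_one` reads only its own argument, the pool is free to run the calls in any order; `pool.map` still returns the results in input order, and `scores` itself is never changed.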

Distribution and reliability
MapReduce achieves reliability by parceling out a number of operations on the data set to each node in the network; each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than a set interval, the master node (similar to the master server in the Google File System) records the node as dead and sends the node's assigned data out to other nodes. Individual operations use atomic renames when naming their output files, as a double check: even if conflicting copies of the same task run in parallel, the final file holds the output of only one of them. When files are renamed, they can also be copied to another name in addition to the name of the task, which allows tasks to have side effects.
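
The failure handling and atomic renaming described in this section can be sketched as below. Everything here is hypothetical and simplified: the `Master` class, its `heartbeat`, `assign`, and `check_workers` methods, the 10-second timeout, the round-robin reassignment, and the `commit_output` helper are illustrative choices, not Google's design. Time is passed in explicitly so the behavior is easy to follow, and the rename step assumes a POSIX-style file system, where `os.replace` within one file system is atomic.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master node that tracks worker heartbeats (illustrative only)."""

    timeout: float = 10.0  # assumed: seconds of silence before a worker is presumed dead
    last_seen: dict[str, float] = field(default_factory=dict)
    assignments: dict[str, list[str]] = field(default_factory=dict)
    dead: set[str] = field(default_factory=set)
    pending: list[str] = field(default_factory=list)  # splits waiting for a live worker

    def heartbeat(self, worker: str, now: float) -> None:
        """Record a worker's periodic report of completed work and status."""
        self.last_seen[worker] = now
        self.assignments.setdefault(worker, [])

    def assign(self, worker: str, split: str) -> None:
        """Hand one piece ("split") of the input data to a known worker."""
        self.assignments[worker].append(split)

    def check_workers(self, now: float) -> None:
        """Mark silent workers dead and send their splits to live workers."""
        for worker, seen in list(self.last_seen.items()):
            if worker in self.dead or now - seen <= self.timeout:
                continue
            self.dead.add(worker)
            orphaned = self.assignments.pop(worker, [])
            live = [w for w in self.last_seen if w not in self.dead]
            if not live:
                self.pending.extend(orphaned)
                continue
            # Spread the orphaned splits round-robin over the live workers.
            for i, split in enumerate(orphaned):
                self.assignments[live[i % len(live)]].append(split)


def commit_output(data: str, final_path: str) -> None:
    """Write output to a temporary file, then atomically rename it into place.

    If two copies of the same task both commit, each rename swaps in one
    complete file, so the final file never mixes the two outputs.
    """
    directory = os.path.dirname(final_path) or "."
    # Same directory, so the rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp_path, final_path)  # atomic on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Usage: worker-a stops reporting, so its split moves to worker-b.
master = Master(timeout=10.0)
master.heartbeat("worker-a", now=0.0)
master.heartbeat("worker-b", now=0.0)
master.assign("worker-a", "split-0")
master.assign("worker-b", "split-1")
master.heartbeat("worker-b", now=12.0)  # worker-a stays silent
master.check_workers(now=12.0)
print(master.dead)         # {'worker-a'}
print(master.assignments)  # {'worker-b': ['split-1', 'split-0']}

# Usage: two runs of the same task commit the same output file.
with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "part-00000")
    commit_output("first attempt\n", out)
    commit_output("backup attempt\n", out)
    with open(out) as f:
        result = f.read()
    listing = sorted(os.listdir(d))
print(repr(result))  # 'backup attempt\n'
```

Passing `now` explicitly rather than reading a clock keeps the sketch deterministic; a real master would more likely use something like `time.monotonic()`. Writing to a temporary file in the same directory before renaming is what lets the final step be a single atomic rename rather than a partial overwrite.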

Uses
According to Google, MapReduce is used in a wide range of applications, including "distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..." Most significantly, once MapReduce was finished, it was used to completely regenerate Google's index of the Internet, replacing the old ad hoc programs that had updated the index.

Other implementations
The Nutch project has developed an experimental implementation of MapReduce.

References
"Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." From Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004.

Google Guide made by MultiMedia | Free content and software
This guide is licensed under the GNU Free Documentation License. It uses material from Wikipedia.