“WordCount”, a “Hello World” for MapReduce
Definition: count how often each word appears within a collection
of text documents.
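The definition above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not the API of any particular MapReduce framework; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the simple regex tokenizer are assumptions made for the example.

```python
import re
from collections import defaultdict

def map_words(doc_text):
    """Map phase: emit a (word, 1) pair for every word in one document."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", doc_text.lower()):
        yield (word, 1)

def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(word, counts):
    """Reduce phase: sum the partial counts for one word."""
    return (word, sum(counts))

def word_count(docs):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for text in docs for pair in map_words(text))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(docs)
```

In a real framework the map calls would run on separate workers and the shuffle would move data across the network; here everything runs in one process so that the data flow is easy to follow.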
WordCount is a simple program that makes a good test case for what
MapReduce can do, since it incorporates:
• a minimal amount of code
• document feature extraction (where words are “terms”)
• symbolic and numeric values
• potential use of a combiner
• bipartite graph of (doc, term) tuples
• not so many steps away from useful indexing…
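Two of the points above, the combiner and the (doc, term) tuples, can be illustrated with a short Python sketch. It is a simplified, single-process illustration under stated assumptions: the helper names (`combine_local`, `doc_term_edges`, `inverted_index`), the whitespace tokenizer, and the sample documents are all hypothetical and not taken from any framework.

```python
from collections import Counter, defaultdict

def combine_local(text):
    """Map plus combiner: pre-aggregate counts within one document,
    so each distinct term is emitted once with a partial count."""
    # Assumed tokenizer: lowercase, split on whitespace.
    return list(Counter(text.lower().split()).items())

def doc_term_edges(doc_id, text):
    """Emit the distinct (doc, term) tuples for one document.
    Together these form the edges of a bipartite doc/term graph."""
    return {(doc_id, term) for term in text.lower().split()}

def inverted_index(edges):
    """Group (doc, term) edges by term: a basic inverted index."""
    index = defaultdict(set)
    for doc_id, term in edges:
        index[term].add(doc_id)
    return index

# Hypothetical sample input.
docs = {"d1": "a rose is a rose", "d2": "a daisy"}

# With the combiner, d1 emits 3 pairs instead of 5 (one per distinct term).
partials = {doc_id: combine_local(text) for doc_id, text in docs.items()}

# The reducer still sums partial counts across documents.
totals = Counter()
for pairs in partials.values():
    for term, n in pairs:
        totals[term] += n

edges = set()
for doc_id, text in docs.items():
    edges |= doc_term_edges(doc_id, text)
index = inverted_index(edges)
```

The combiner only reduces how many pairs leave each mapper; the final totals are the same as without it. Grouping the same tuples by term instead of summing them yields the index, which is why WordCount sits close to useful indexing.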
When a framework can run “WordCount” in parallel at scale, then it
can handle much larger, more interesting compute problems as well.