How Does MapReduce Work?
map(k1, v1) → list(k2, v2)
reduce(k2, list(v2)) → list(v3)
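A minimal sketch of these two signatures, assuming a word-count job. The names `map_fn` and `reduce_fn` are illustrative and not taken from any particular framework:

```python
from typing import Iterable, Iterator, Tuple

def map_fn(k1: int, v1: str) -> Iterator[Tuple[str, int]]:
    """map(k1, v1) -> list(k2, v2): k1 is a line offset, v1 is the line text."""
    for word in v1.split():
        yield (word.lower(), 1)

def reduce_fn(k2: str, values: Iterable[int]) -> Iterator[int]:
    """reduce(k2, list(v2)) -> list(v3): sum the counts seen for one word."""
    yield sum(values)

# One map call on one input record, and one reduce call on one key.
pairs = list(map_fn(0, "the quick the lazy"))
total = list(reduce_fn("the", [1, 1]))
```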
Several phases, which partition a problem into many tasks:
• load data into DFS…
• map phase: input split → (key, value) pairs, with optional combiner
• shuffle phase: sort on keys to group pairs… load-test your network!
• reduce phase: each reduce call receives all the values for one key
• pull data from DFS…
NB: “map” phase is required, the rest are optional.
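The phases can be sketched as a toy, single-process driver. This is only an illustration under simplifying assumptions: the input splits are in-memory lists rather than DFS files, the mapper takes a bare record instead of a (k1, v1) pair, and `run_job`, `group_and_apply`, `wc_map`, and `wc_sum` are hypothetical names:

```python
from collections import defaultdict
from itertools import chain

def group_and_apply(pairs, fn):
    """Group (key, value) pairs by key, in sorted key order, then call fn once per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for key in sorted(groups):
        for out in fn(key, groups[key]):
            yield (key, out)

def run_job(splits, mapper, combiner=None, reducer=None):
    """Run map, optional combine, shuffle, and optional reduce over in-memory splits."""
    map_outputs = []
    for split in splits:
        # Map phase: one map task per input split.
        pairs = [kv for record in split for kv in mapper(record)]
        if combiner is not None:
            # Optional combiner: pre-aggregate per split to cut shuffle traffic.
            pairs = list(group_and_apply(pairs, combiner))
        map_outputs.append(pairs)
    all_pairs = list(chain.from_iterable(map_outputs))
    if reducer is None:
        return all_pairs  # map-only job: no shuffle, no reduce
    # Shuffle and reduce: group across all splits, one reduce call per key.
    return list(group_and_apply(all_pairs, reducer))

def wc_map(line):
    for word in line.split():
        yield (word, 1)

def wc_sum(key, values):
    yield sum(values)

splits = [["a b a"], ["b c"]]
result = run_job(splits, wc_map, combiner=wc_sum, reducer=wc_sum)
map_only = run_job(splits, wc_map)
```

Here `result` is `[("a", 2), ("b", 2), ("c", 1)]`. Reusing the reducer as the combiner works only because summation is associative and commutative; in general a combiner needs its own logic.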
Think of set operations on tuples (and check out Cascading.org).
Meanwhile, given all those (key, value) pairs listed above, it’s no
wonder that key/value stores have become such a popular topic of