Presentation by Ken Krugler at the SDForum SAM SIG (Software Architecture & Modeling) meeting on Sept 22nd, 2010. This talk provides a brief introduction to Map-Reduce & Hadoop, then discusses challenges of implementing complex data processing using low-level Map-Reduce support, and a number of solutions.
1. 1
Thinking at Scale with Hadoop
Ken Krugler
photo by: i_pinz, flickr
Bixo Labs & Scale Unlimited
Copyright (c) 2008 Scale Unlimited, Inc. All Rights Reserved. Reproduction or distribution of this document in any form without prior written permission is forbidden.
Friday, September 24, 2010
2. 2
Introduction - Ken Krugler
Founder, Bixo Labs - http://bixolabs.com
Large scale web mining workflows, running in Amazon’s cloud
Instructor, Scale Unlimited - http://scaleunlimited.com
Teaching Hadoop Boot Camp
Founder, Krugle (2005 - 2009)
Code search startup
Parse/index 2.6 billion lines of open source code
3. 3
Goals for Talk
Intro to Map-Reduce
Overview of Hadoop
Challenges with Map-Reduce
Solutions to Complexity
4. 4
Introduction to Map-Reduce
5. 5
Definitions
Map -> The ‘map’ function in the MapReduce algorithm, user defined
Reduce -> The ‘reduce’ function in the MapReduce algorithm, user defined
Group -> The ‘magic’ that happens between Map and Reduce
Key Value Pair -> two units of data, exchanged between Map & Reduce
6. 6
How MapReduce Works
Map translates input keys and values to new keys and values
[K1,V1] -> Map -> [K2,V2]
System Groups each unique key with all its values
[K2,V2] -> Group -> [K2,{V2,V2,....}]
Reduce translates the values of each unique key to new keys and values
[K2,{V2,V2,....}] -> Reduce -> [K3,V3]
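The three type signatures above can be written down as a small single-process sketch. This is not Hadoop's API: the MapFn and ReduceFn interfaces, the run method, and the "name,amount" sample data are all hypothetical, and Reduce is simplified to emit exactly one pair per key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of Map -> Group -> Reduce; not Hadoop's API.
class MapReduceSketch {

    /** Map: one [K1,V1] pair in, any number of [K2,V2] pairs out. */
    interface MapFn<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    /** Reduce: one [K2,{V2,...}] group in, one [K3,V3] pair out (a simplification). */
    interface ReduceFn<K2, V2, K3, V3> {
        Map.Entry<K3, V3> reduce(K2 key, List<V2> values);
    }

    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            MapFn<K1, V1, K2, V2> mapFn,
            ReduceFn<K2, V2, K3, V3> reduceFn) {
        // Map, then Group: file every emitted V2 under its K2.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            for (Map.Entry<K2, V2> pair : mapFn.map(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce: called once per unique key.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.add(reduceFn.reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Made-up example: total amount per name from "name,amount" lines. */
    static List<Map.Entry<String, Integer>> demo() {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        input.add(new SimpleEntry<>(1, "alice,5"));
        input.add(new SimpleEntry<>(2, "bob,7"));
        input.add(new SimpleEntry<>(3, "alice,2"));

        MapFn<Integer, String, String, Integer> parse = (lineNo, line) -> {
            String[] parts = line.split(",");
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            out.add(new SimpleEntry<>(parts[0], Integer.parseInt(parts[1])));
            return out;
        };
        ReduceFn<String, Integer, String, Integer> sum = (name, amounts) -> {
            int total = 0;
            for (int amount : amounts) total += amount;
            return new SimpleEntry<>(name, total);
        };
        return run(input, parse, sum);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [alice=7, bob=7]
    }
}
```

Only the two user-defined functions know anything about the data; the grouping in between is generic.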
7. 7
All Together
[Diagram: several Mappers feed the Shuffle, which feeds several Reducers]
8. 8
Canonical Example - Word Count
Word Count
Read a document, parse out the words, count the frequency of each word
Specifically, in MapReduce
With a document consisting of lines of text
Translate each line of text into key = word and value = 1
Sum the values (ones) for each unique word
9. 9
[K1,V1]: [n, {k,k,k,k,...}]
[1, When in the Course of human events, it becomes]
[2, necessary for one people to dissolve the political bands]
[3, which have connected them with another, and to assume]
Map (User Defined)
[K2,V2]: [When,1] [in,1] [the,1] [Course,1] [k,1]
Group
[K2,{V2,V2,....}]: [When,{1,1,1,1}] [people,{1,1,1,1,1,1}] [dissolve,{1,1,1}] [connected,{1,1,1,1,1}] [k,{v,v,...}]
Reduce (User Defined)
[K3,V3]: [When,4] [people,6] [dissolve,3] [connected,5] [k,sum(v,v,v,..)]
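The same walk-through can be traced in plain Java, one method per phase. This is a sketch, not Hadoop code: the class and method names are hypothetical, and splitting on non-word characters is an assumption since the slides do not say how words are parsed. It runs on the three sample lines only, so its counts are smaller than the whole-document counts shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java trace of Map, Group, and Reduce for word count; not Hadoop code.
class WordCountSketch {

    // Map: one line in, one word out per occurrence (each word stands for [word,1]).
    // Splitting on non-word characters is an assumption.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) words.add(word);
        }
        return words;
    }

    // Group: collect the ones for each unique word.
    static Map<String, List<Integer>> group(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: sum the ones for a single word.
    static int reduce(List<Integer> ones) {
        int sum = 0;
        for (int one : ones) sum += one;
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<String> mapped = new ArrayList<>();
        for (String line : lines) mapped.addAll(map(line));
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : group(mapped).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Arrays.asList(
                "When in the Course of human events, it becomes",
                "necessary for one people to dissolve the political bands",
                "which have connected them with another, and to assume"));
        System.out.println(counts.get("the"));  // 2
        System.out.println(counts.get("When")); // 1
    }
}
```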
10. 10
Divide & Conquer
Because
The Map function only cares about the current key and value, and
The Reduce function only cares about the current key and its values
Mappers & Reducers can run Map/Reduce functions independently
Then
A Mapper can invoke Map on an arbitrary number of input keys and values
A Reducer can invoke Reduce on an arbitrary number of the unique keys
Any number of Mappers & Reducers can be run on any number of nodes
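A rough sketch of that independence, using a thread pool in place of cluster nodes. Everything here is illustrative: the class, the pool size of 4, and the list-of-buckets representation are assumptions. The partition method uses the hash-modulo rule (hash code, masked to be non-negative, modulo the number of reducers) to decide which Reducer sees which key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Word count with Mappers and Reducers as independent tasks; threads stand in for nodes.
class DivideAndConquerSketch {

    // Every occurrence of a key goes to the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // One Mapper: sees only its own split, writes one bucket of words per reducer.
    static List<List<String>> mapSplit(List<String> lines, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) buckets.add(new ArrayList<>());
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) buckets.get(partition(word, numReducers)).add(word);
            }
        }
        return buckets;
    }

    // One Reducer: sees only the keys of its own partition, from every Mapper.
    static Map<String, Integer> reducePartition(List<List<List<String>>> mapOutputs, int p) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<List<String>> buckets : mapOutputs) {
            for (String word : buckets.get(p)) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<List<String>> splits, int numReducers) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<List<String>>>> mapTasks = new ArrayList<>();
            for (List<String> split : splits) {
                Callable<List<List<String>>> task = () -> mapSplit(split, numReducers);
                mapTasks.add(pool.submit(task));
            }
            List<List<List<String>>> mapOutputs = new ArrayList<>();
            for (Future<List<List<String>>> done : mapTasks) mapOutputs.add(done.get());

            List<Future<Map<String, Integer>>> reduceTasks = new ArrayList<>();
            for (int r = 0; r < numReducers; r++) {
                final int p = r;
                Callable<Map<String, Integer>> task = () -> reducePartition(mapOutputs, p);
                reduceTasks.add(pool.submit(task));
            }
            Map<String, Integer> result = new TreeMap<>();
            for (Future<Map<String, Integer>> done : reduceTasks) result.putAll(done.get());
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Changing the number of splits or reducers changes how the work is divided, not the result.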
11. 11
Overview of Hadoop
12. 12
Logical Architecture
[Diagram: a Cluster containing an Execution layer and a Storage layer]
Logically, Hadoop is simply a computing cluster that provides:
a Storage layer, and
an Execution layer
13. 13
Logical Architecture
[Diagram: Users run Clients that submit jobs to the Execution layer of the Cluster, which also contains the Storage layer]
Where users can submit jobs that:
Run on the Execution layer, and
Read/write data from the Storage layer
14. 14
Storage - HDFS
[Diagram: Geography > Rack > Node > block hierarchy]
Performance
Optimized for many concurrent serialized reads of large files
But only a single writer (write-once, no append)
Fidelity
Replication via block replication
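To make block replication concrete, here is a toy sketch that cuts a file into fixed-size blocks and assigns each block's replicas to distinct nodes. The 64 MB block size, the replication factor of 3, and the node names are assumptions for the example, and the round-robin placement is a deliberate simplification; it is not HDFS's actual rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of splitting a file into blocks and replicating each block.
class BlockPlacementSketch {

    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // round up
    }

    /** Entry b of the result lists the nodes holding a replica of block b. */
    static List<List<String>> place(long fileBytes, long blockBytes,
                                    int replication, List<String> nodes) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("need at least one node per replica");
        }
        List<List<String>> placement = new ArrayList<>();
        long blocks = blockCount(fileBytes, blockBytes);
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Round-robin: a simplification, not the real placement policy.
                replicas.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> nodes = Arrays.asList("n1", "n2", "n3", "n4");
        // A 200 MB file in 64 MB blocks needs 4 blocks, the last one partly filled.
        System.out.println(place(200 * mb, 64 * mb, 3, nodes));
        // [[n1, n2, n3], [n2, n3, n4], [n3, n4, n1], [n4, n1, n2]]
    }
}
```

With three replicas on distinct nodes, losing any single node still leaves two copies of every block.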
15. 15
Execution
[Diagram: Geography > Rack > Node hierarchy, with map and reduce tasks spread across the nodes]
Performance
Divide and Conquer - MapReduce
Data Affinity - Move code to data
Fidelity
Applications must choose to fail on bad data, or ignore bad data
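The fail-or-ignore choice usually ends up as a few lines inside the map function. A minimal sketch, assuming records that should parse as integers; the Policy enum and Result class are hypothetical, and in a real Hadoop job the skipped count would more likely be reported through a job counter.

```java
import java.util.ArrayList;
import java.util.List;

// Two ways for application code to treat a record it cannot parse.
class BadDataPolicySketch {

    enum Policy { FAIL, IGNORE }

    static class Result {
        final List<Integer> values = new ArrayList<>();
        int skipped; // number of bad records that were ignored
    }

    static Result parseAll(List<String> records, Policy policy) {
        Result result = new Result();
        for (String record : records) {
            try {
                result.values.add(Integer.parseInt(record.trim()));
            } catch (NumberFormatException e) {
                if (policy == Policy.FAIL) {
                    // Fail: stop the whole task on the first bad record.
                    throw new IllegalArgumentException("bad record: " + record, e);
                }
                // Ignore: keep going, but keep track of how much was dropped.
                result.skipped++;
            }
        }
        return result;
    }
}
```

Counting what was ignored matters: silently dropped records are hard to notice at scale.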
16. 16
Architectural Components
[Diagram: a Client submits jobs to the JobTracker, which hands tasks to TaskTrackers; each TaskTracker runs mappers and reducers in child JVMs. The NameNode and Secondary NameNode handle namespace (ns) operations, while DataNodes hold data blocks and serve read/write operations.]
Solid boxes are unique applications
Dashed boxes are child JVM instances on same node as parent
Dotted boxes are blocks of managed files on same node as parent
17. 17
Challenges of Map-Reduce
Copyright (c) 2008-2010 Scale Unlimited, Inc. All Rights Reserved. Reproduction or distribution of this document in any form without prior written permission is forbidden.
18. 18
Complexity of keys/values
Simple != Simplicity
People think about processing records/objects
The correct key definition changes during a workflow
A “value” is typically a set of fields
19. 19
Complexity of chaining jobs
Most real-world workflows require multiple Map-Reduce jobs
Connecting data between jobs is tedious
Keeping key/value definitions in sync is error-prone
As an example...
20. 20
A trivial filter/count/sort workflow
private void start(String inputPath, String outputPath, String filterUser)
        throws Exception {
    // counts the pages the users clicked and sort by number of clicks

    // filter and count
    Path outputDir = new Path(outputPath);
    Path parent = outputDir.getParent();
    Path tmpOutput = new Path(parent, "" + System.currentTimeMillis());

    // Create the job configuration
    Configuration conf = new Configuration();
    conf.set(FilterMapper.USER_KEY, filterUser);

    Job filterCountJob = new Job(conf, "filter+count job");
    FileInputFormat.setInputPaths(filterCountJob, new Path(inputPath));
    FileOutputFormat.setOutputPath(filterCountJob, tmpOutput);
    filterCountJob.setMapperClass(FilterMapper.class);
    filterCountJob.setMapOutputKeyClass(Text.class);
    filterCountJob.setMapOutputValueClass(LongWritable.class);

    filterCountJob.setReducerClass(CounterReducer.class);
    filterCountJob.setOutputKeyClass(Text.class);
    filterCountJob.setOutputValueClass(LongWritable.class);
    filterCountJob.setInputFormatClass(TextInputFormat.class);
    filterCountJob.setOutputFormatClass(SequenceFileOutputFormat.class);

    // run job and block until job is done
    filterCountJob.waitForCompletion(false);

    // sort
    Job sortJob = new Job(conf, "sort job");
    FileInputFormat.setInputPaths(sortJob, tmpOutput);
    FileOutputFormat.setOutputPath(sortJob, outputDir);

    sortJob.setMapperClass(SortMapper.class);
    sortJob.setMapOutputKeyClass(UserAndCount.class);
    sortJob.setMapOutputValueClass(NullWritable.class);

    // sorts are only stable with one reducer, but you lose distribution
    sortJob.setNumReduceTasks(1);
    sortJob.setInputFormatClass(SequenceFileInputFormat.class);
    sortJob.setOutputFormatClass(TextOutputFormat.class);

    // run job and block until job is done
    sortJob.waitForCompletion(false);

    // Get rid of temp directory used to bridge between the two jobs
    FileSystem.get(conf).delete(tmpOutput, true);
}
21. 21
Not seeing the forest...
Too much low-level detail obscures intent
Not seeing intent leads to bugs
And makes optimization harder
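For contrast, the intent of a filter/count/sort workflow fits in a few lines against in-memory collections. This sketch rests on guesses about what the earlier job does: it assumes click records are "user<TAB>page" lines, that the filter drops one user's clicks, and that the count is clicks per user. None of that is confirmed by the slides, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory version of filter -> count -> sort, to show the intent only.
class FilterCountSortSketch {

    static List<Map.Entry<String, Long>> run(List<String> clicks, String filterUser) {
        Map<String, Long> counts = clicks.stream()
                .map(line -> line.split("\t")[0])             // user field (assumed layout)
                .filter(user -> !user.equals(filterUser))     // filter
                .collect(Collectors.groupingBy(
                        user -> user, TreeMap::new, Collectors.counting())); // count

        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()); // sort, most clicks first
        return sorted;
    }
}
```

Three steps, three lines of logic; the Map-Reduce version spends most of its length on wiring.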
22. 22
Hard to use MR optimizations
MR => MR{0,1} => M{1,}R{0,1} => M{1,}R{0,1}M{0,}
Combiners can reduce map output size
Map-side joins where one of the sides is “small”
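Both optimizations can be sketched in plain Java. This is an illustration of the ideas, not Hadoop's Combiner or join API: the class, the "userId,page" record layout, and the in-memory lookup table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Two map-side optimizations, modeled on in-memory data.
class MapSideOptimizationsSketch {

    // Combiner: sums [word,1] pairs on the map side, so one [word,n] pair per
    // unique word leaves the node instead of one pair per occurrence.
    static Map<String, Integer> combine(List<String> mappedWords) {
        Map<String, Integer> partialSums = new TreeMap<>();
        for (String word : mappedWords) partialSums.merge(word, 1, Integer::sum);
        return partialSums;
    }

    // Map-side join: the small side is held in memory by every mapper, so the
    // big side can be joined record by record with no reduce phase.
    static List<String> mapSideJoin(List<String> bigSide, Map<String, String> smallSide) {
        List<String> joined = new ArrayList<>();
        for (String record : bigSide) {
            String[] fields = record.split(",");   // hypothetical "userId,page" layout
            String name = smallSide.get(fields[0]);
            if (name != null) joined.add(name + "," + fields[1]); // inner join
        }
        return joined;
    }
}
```

The combiner only works because summing is associative and commutative; the join only works while the small side fits in each mapper's memory.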
23. 23
Solutions to Complexity
Cascading - Tuples, Pipes, and Operations
FlumeJava - Java Classes and Deferred Execution
Pig/Hive - “It’s just a big database”
Datameer - “It’s just a big spreadsheet”
24. 24
Cascading
25. 25
Cascading
An API for defining data processing workflows
First public release Jan 2008
Currently 1.1.1 and supports Hadoop 0.19.x through 0.20.2
http://www.cascading.org
26. 26
Cascading
Basic unit of data is “Tuple” - roughly equal to a record
Implements all standard data processing operations
function, filter, group by, co-group, aggregator, etc
Complex applications are built by chaining operations with pipes
And are planned into the minimum number of MapReduce jobs
27. 27
Operations on Tuples
Map and Reduce communicate through Keys and Values
Our problems are really about elements of Keys and Values
e.g. “username”, not “line”
The effect is more reusability of our ‘map’ and ‘reduce’ operations
28. 28
Mapping onto Hadoop
Functions & Filters are map operations
Aggregates & Buffers are reduce operations
Built-in support for joining, multi-field sorting
Concept of “Taps” with “Schemes” as input and output formats
29. 29
Loose Coupling
[Diagram: a pipe assembly linking Source, two funcs, Group, aggr, and Sink]
Above, all the possible relationships between operations and data
If Fields are the interface between operations, we can build flexible apps
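A small sketch of fields as the interface: an operation names the field it reads and the field it writes, and knows nothing else about the tuple. This is not Cascading's Tuple or Each class; the map-based tuple and the each helper are hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A tuple is modeled as named fields; operations are coupled only to field names.
class FieldsSketch {

    /** Applies fn to one named field and appends the result under a new field name. */
    static Map<String, Object> each(Map<String, Object> tuple, String argumentField,
                                    String resultField, Function<Object, Object> fn) {
        Map<String, Object> out = new LinkedHashMap<>(tuple);
        out.put(resultField, fn.apply(tuple.get(argumentField)));
        return out;
    }

    public static void main(String[] args) {
        Function<Object, Object> length = v -> ((String) v).length();

        Map<String, Object> user = new LinkedHashMap<>();
        user.put("username", "ken");
        System.out.println(each(user, "username", "length", length)); // {username=ken, length=3}

        // The same function is reused on a different field of a different tuple.
        Map<String, Object> text = new LinkedHashMap<>();
        text.put("line", "hello");
        System.out.println(each(text, "line", "length", length)); // {line=hello, length=5}
    }
}
```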
30. 30
Layered Over MapReduce
[Diagram: the assembly Source -> func -> CoGroup -> aggr -> temp -> GroupBy -> aggr -> func -> Sink, plus a second Source -> func -> func branch, overlaid with two Map/Reduce jobs]
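One way to see why this assembly needs two jobs: every grouping operation needs a shuffle, and a job has only one. The toy labeler below walks a linear list of operation names and starts a new job whenever a grouping follows an earlier grouping. It is a simplification for illustration and not Cascading's planner; the class and the "jobN/phase" labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy labeler: assigns each operation of a linear pipe to a job and phase.
class JobPlanSketch {

    static boolean isGrouping(String op) {
        return op.equals("GroupBy") || op.equals("CoGroup");
    }

    static List<String> plan(List<String> ops) {
        List<String> labeled = new ArrayList<>();
        int job = 1;
        boolean reducing = false;
        for (String op : ops) {
            if (isGrouping(op)) {
                // A second grouping cannot share a shuffle: start a new job,
                // fed by a temp file written by the previous one.
                if (reducing) job++;
                reducing = true;
            }
            labeled.add("job" + job + "/" + (reducing ? "reduce" : "map") + " " + op);
        }
        return labeled;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("func", "CoGroup", "aggr", "GroupBy", "aggr", "func");
        for (String line : plan(ops)) System.out.println(line);
        // job1/map func
        // job1/reduce CoGroup
        // job1/reduce aggr
        // job2/reduce GroupBy
        // job2/reduce aggr
        // job2/reduce func
    }
}
```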
31. 31
Cascading Code
Tap input = new Hfs( new TextLine( new Fields( "line" ) ), inputPath );
Tap output = new Hfs( new TextLine( 1 ), outputPath, SinkMode.REPLACE );
Pipe pipe = new Pipe( "example" );
RegexSplitGenerator function = new RegexSplitGenerator( new Fields( "word" ), "\\s+" );
pipe = new Each( pipe, new Fields( "line" ), function );
pipe = new GroupBy( pipe, new Fields( "word" ) );
pipe = new Every( pipe, new Fields( "word" ), new Count( new Fields( "count" ) ) );
pipe = new Every( pipe, new Fields( "word" ), new Average( new Fields( "avg" ) ) );
pipe = new GroupBy( pipe, new Fields( "count", -1 ) );
Properties properties = new Properties();
FlowConnector.setApplicationJarClass( properties, Main.class );
Flow flow = new FlowConnector( properties ).connect( "example flow", input, output, pipe );
flow.complete();
32. 32
[Diagram: Cascading flow plan. Source: Hfs SequenceFile with fields 'ip', 'time', 'method', 'event', 'status', 'size' at 'output/logs/'. Each('arrival rate') applies DateParser, declaring 'ts'. The flow then splits into two branches. Branch 1: GroupBy('tsCount') by 'ts' -> Every('tsCount') Count declaring 'count' -> Hfs TextLine sink at 'output/arrivalrate/sec'. Branch 2: Each('arrival rate') applies ExpressionFunction, declaring 'tm' -> GroupBy('tmCount') by 'tm' -> Every('tmCount') Count declaring 'count' -> Hfs TextLine sink at 'output/arrivalrate/min'.]
34. 34
FlumeJava
35. 35
FlumeJava
Recent paper (June 2010) by Googlites - Chambers et al.
Internal project to make it easier to write Map-Reduce jobs
Equivalent open source project just started
Plume - http://github.com/tdunning/Plume
36. 36
Java Classes, Deferred Exec
Java-centric view of map-reduce processing
Handful of classes to represent immutable, parallel collections
PCollection, PTable
Small number of primitive operations on classes
parallelDo, groupByKey, combineValues, flatten
Deferred execution allows construction/optimization of workflow
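Deferred execution can be sketched in a few lines: parallelDo only records the operation, and run later fuses everything recorded into a single pass over the data. The DeferredSketch class is hypothetical and far simpler than FlumeJava; it handles only String-to-String operations and performs only this one fusion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical deferred collection: operations are recorded first, executed later.
class DeferredSketch {
    private final List<String> source;
    private final List<Function<String, String>> pending;

    private DeferredSketch(List<String> source, List<Function<String, String>> pending) {
        this.source = source;
        this.pending = pending;
    }

    static DeferredSketch of(List<String> source) {
        return new DeferredSketch(source, new ArrayList<>());
    }

    /** Records the operation and returns a new deferred collection; nothing runs yet. */
    DeferredSketch parallelDo(Function<String, String> fn) {
        List<Function<String, String>> next = new ArrayList<>(pending);
        next.add(fn);
        return new DeferredSketch(source, next);
    }

    /** Executes all recorded operations, fused into one pass over the source. */
    List<String> run() {
        Function<String, String> fused = Function.identity();
        for (Function<String, String> fn : pending) fused = fused.andThen(fn);
        List<String> out = new ArrayList<>();
        for (String element : source) out.add(fused.apply(element));
        return out;
    }

    public static void main(String[] args) {
        DeferredSketch words = DeferredSketch.of(Arrays.asList(" To ", "BE"))
                .parallelDo(String::trim)
                .parallelDo(String::toLowerCase);
        System.out.println(words.run()); // [to, be]
    }
}
```

Because the whole chain is known before anything runs, the library is free to rearrange or combine steps; here two operations cost one pass instead of two.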
37. 37
FlumeJava Code
PCollection<String> lines = readTextFileCollection("/gfs/data/shakes/hamlet.txt");
PCollection<String> words = lines.parallelDo(new DoFn<String,String>() {
    void process(String line, EmitFn<String> emitFn) {
        for (String word : splitIntoWords(line)) {
            emitFn.emit(word);
        }
    }
}, collectionOf(strings()));
38. 38
Results of Use at Google
[Charts: lines of code and number of Map-Reduce jobs]
39. 39
Pig & Hive
40. 40
Pig
Pig is a data flow ‘programming environment’ for processing large files
PigLatin is the text language it provides for defining data flows
Developed by researchers at Yahoo! and widely used internally
A sub-project of Hadoop
Apache licensed
http://hadoop.apache.org/pig/
First incubator release v0.1.0 in Sept 2008
41. 41
Pig
Provides two kinds of optimizations:
Physical - caching and stream optimizations at the MapReduce level
Algebraic - mathematical reductions at the syntax level
Allows for User Defined Functions (UDF)
must register jar of functions via register path/some.jar
can be called directly from PigLatin
B = foreach A generate myfunc.MyEvalFunc($0);
42. 42
PigLatin - Example
W = LOAD 'filename' AS (url, outlink);
G = GROUP W by url;
R = FOREACH G {
FW = FILTER W BY outlink eq 'www.apache.org';
PW = FW.outlink;
DW = DISTINCT PW;
GENERATE group, COUNT(DW);
}
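Read step by step, the script groups rows by url and, inside each group, counts the distinct outlinks that equal 'www.apache.org'. A plain-Java rendering of that reading is sketched below. The class is hypothetical, rows are modeled as two-element arrays, and reporting 0 for a group with no matching outlinks is an assumption about how an empty bag is counted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java reading of the PigLatin example; each row is {url, outlink}.
class PigExampleSketch {

    static Map<String, Integer> run(List<String[]> rows) {
        // G = GROUP W by url;
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // R = FOREACH G { FILTER ...; project outlink; DISTINCT; GENERATE group, COUNT }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            Set<String> distinct = new HashSet<>();
            for (String outlink : group.getValue()) {
                if (outlink.equals("www.apache.org")) distinct.add(outlink);
            }
            result.put(group.getKey(), distinct.size());
        }
        return result;
    }
}
```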
43. 43
Hive
Hive is a data warehouse built on top of flat files
Developed by Facebook
Available as a Hadoop contribution in v0.19.0
http://wiki.apache.org/hadoop/Hive
44. 44
Hive
Data Organization into Tables with logical and hash partitioning
A Metastore to store metadata about Tables/Partitions/Buckets
Partitions are ‘how’ the data is stored (collections of related rows)
Buckets further divide Partitions based on a hash of a column
Metadata is actually stored in an RDBMS
A SQL like query language over object data stored in Tables
Custom types, structs of primitive types
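A rough sketch of how a row could be routed to a Partition and then to a Bucket. Java's hashCode stands in for Hive's own hash function, and the "bucket-N" file name is hypothetical; only the general shape (a directory per partition value, then a hash modulo the bucket count) is taken from the description above.

```java
// Toy routing of a row to partition directory and bucket; not Hive's implementation.
class BucketSketch {

    // Bucket number: hash of the bucketing column, modulo the number of buckets.
    static int bucketFor(Object columnValue, int numBuckets) {
        return (columnValue.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Storage location: one directory per partition value, one file per bucket.
    static String path(String table, String partitionColumn, String partitionValue,
                       Object bucketColumnValue, int numBuckets) {
        return table + "/" + partitionColumn + "=" + partitionValue
                + "/bucket-" + bucketFor(bucketColumnValue, numBuckets);
    }

    public static void main(String[] args) {
        // userid 42 hashes to 42, and 42 % 32 = 10.
        System.out.println(path("pv_users", "ds", "2010-09-22", 42, 32));
        // pv_users/ds=2010-09-22/bucket-10
    }
}
```

A query restricted to one partition value only has to read that directory, and sampling by bucket only has to read the matching files.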
45. 45
Hive - Queries
Queries always write results into a table
FROM user
INSERT OVERWRITE TABLE user_active
SELECT user.*
WHERE user.active = true;
FROM pv_users
INSERT OVERWRITE TABLE pv_gender_sum
SELECT pv_users.gender, count_distinct(pv_users.userid)
GROUP BY pv_users.gender
INSERT OVERWRITE DIRECTORY '/user/facebook/tmp/pv_age_sum.txt'
SELECT pv_users.age, count_distinct(pv_users.userid)
GROUP BY pv_users.age;
46. 46
Datameer Analytics Solution (DAS)
47. 47
DAS - Big Spreadsheets
Commercial solution from Datameer
Similar to IBM BigSheets
GUI + workflow editor on top of Hadoop
Define data sources, operations (via spreadsheet), and data sinks
http://www.datameer.com
50. 50
Summary
51. 51
Thinking at Scale with Hadoop
Understand the underlying Map-Reduce architecture
Model the problem as a workflow with operations on records
Consider using Cascading or (eventually) Plume
Use Pig/Hive to solve query-centric problems
Look at DAS or BigSheets for easier analytics solutions
52. 52
Resources
Hadoop mailing lists - many, e.g. common-user@hadoop.apache.org
Hadoop: The Definitive Guide by Tom White (O’Reilly)
Hadoop Training
Scale Unlimited: http://scaleunlimited.com/courses
Cloudera: http://www.cloudera.com/hadoop-training/
Cascading: http://www.cascading.org
DAS: http://www.datameer.com
53. 53
Questions?