4. Hadoop is great at plowing through data
@JoinTheFlock | Hadoop Summit, June 14 2012 4
Image source: http://en.wikipedia.org/wiki/File:Snowplow_in_the_morning.jpg
5. And we do plow
Tens of thousands of jobs per day
100 TB (uncompressed) ingested daily
Many users and diverse use cases
7. Looking for needles in haystacks.
With snowplows.
Image Source: http://en.wikipedia.org/wiki/File:July_1903_-_on_the_Gaisberg,_nr_Salzburg.JPG
8. A Pig Script
event_logs = load '/logs/lots_of_data'
    using ThriftPigLoader('thrift.gen.LogEvent');
filtered_logs = filter event_logs by event == 'something_rare';
-- Then do stuff.
90% of the mappers in this job output no data.
We can do better...
18. Keep the data sorted!
• Painful to maintain
• Only one sort order at a time
• Rewrite or duplicate for different query patterns
23. Trojan Layouts*
• Identify interesting column groupings
• Use different column groupings per HDFS block replica
• Requires changes to the NameNode (NN)
• ... and increases load on NN
* http://infosys.uni-saarland.de/publications/JQD11.pdf
30. HBase!
• Good solution in many cases!
• Maintenance overhead
• All data must live in HBase
• Full table scans slower than MR
• Again with the up-front design
• Secondary Indexes can help
32. Hive!
• That kind of works, actually.
33. Hive
Generic Interface for defining indexing behavior.
Reference implementation: “compact” index
value -> list of HDFS blocks; drop unneeded blocks.
Other indexes available (bitmap index in Hive 0.8)
It’ll even update indexes as you add partitions.
35. Hive
Good news if your data is in Hive!
Bad news if your world is a little bigger.
Indexing is tightly coupled to Hive.
No interoperability with the rest of the Hadoop stack.
41. Democracy of Tools
• Pig
• Raw Map-Reduce
• Cascading DSLs (Scalding, Cascalog, Py-Cascading)
• Mahout
• Maybe even Hive
Image Source: http://en.wikipedia.org/wiki/File:20070124_sejm_sala_plenarna.jpg
50. Design Goals
• Minimal Job/Script modification required
• As low in the stack as possible
• In fact, pretty sure we could get Hive to use this...
• No unnecessary copies of data
• Allow post-factum indexing
• Graceful degradation
• Flexible on-disk representation
51. Elephant-Twin
Twitter’s library for creating indexes in Hadoop
https://github.com/twitter/elephant-twin
https://github.com/twitter/elephant-twin-lzo
52. Block-Level Indexes
For each value, record the block it occurs in
“Block” can be HDFS block (100s of MBs)
Or LZO block (100s of KBs)
Or SequenceFile block
Or RCFile block ...
Ignore irrelevant blocks
Scan relevant blocks using original InputFormat
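A minimal in-memory sketch of the block-level idea above, assuming each block is identified by its starting byte offset. The names `BlockIndexSketch`, `add`, and `blocksFor` are hypothetical and are not Elephant-Twin's API; the library itself builds its indexes with MapReduce jobs and keeps them on HDFS.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Elephant-Twin.
final class BlockIndexSketch {
    // value -> sorted start offsets of the blocks that contain it
    private final Map<String, SortedSet<Long>> blocksByValue = new HashMap<>();

    // Record that `value` occurs somewhere in the block starting at `blockOffset`.
    void add(String value, long blockOffset) {
        blocksByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(blockOffset);
    }

    // Blocks worth scanning for `value`; every other block can be ignored.
    SortedSet<Long> blocksFor(String value) {
        SortedSet<Long> blocks = blocksByValue.get(value);
        return blocks == null
            ? Collections.<Long>emptySortedSet()
            : Collections.unmodifiableSortedSet(blocks);
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch();
        index.add("something_common", 0L);
        index.add("something_common", 256L);
        index.add("something_rare", 256L);
        index.add("something_common", 512L);
        // Only the blocks listed here need to be handed to the original InputFormat.
        System.out.println(index.blocksFor("something_rare"));
    }
}
```

The sketch stores one entry per (value, block) pair, so a value repeated many times inside a block costs the same as a value that occurs there once.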
53. Record-Level Indexes
For each value, record some representation of the record
Can be value + offset, as in bitmap indexes
Can be transformed projection of records, as in Lucene indexes
Some queries can be answered directly from index.
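A rough sketch of the "value + offset" variant, assuming records are addressed by byte offset. `RecordIndexSketch` and its methods are hypothetical names for illustration, not Elephant-Twin or Lucene APIs. It shows why a query such as a count never has to touch the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Elephant-Twin.
final class RecordIndexSketch {
    // value -> byte offsets of the individual records that carry it
    private final Map<String, List<Long>> offsetsByValue = new HashMap<>();

    // Record that the record at `recordOffset` carries `value`.
    void add(String value, long recordOffset) {
        offsetsByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(recordOffset);
    }

    // Offsets to seek to when the matching records themselves are needed.
    List<Long> offsetsFor(String value) {
        return Collections.unmodifiableList(
            offsetsByValue.getOrDefault(value, Collections.<Long>emptyList()));
    }

    // A count needs no data read at all: the index already holds the answer.
    int count(String value) {
        return offsetsFor(value).size();
    }
}
```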
54. Indexing:
[Diagram: Data -> InputFormat -> MR job -> Index]
55. Creating an Index
public abstract class AbstractBlockIndexingJob {
    protected abstract List<String> getInput();
    protected abstract String getIndex();
    protected abstract String getInputFormat();
    protected abstract String getValueClass();
    protected abstract String getColumnName();
    protected abstract Job setMapper(Job job);
}
public abstract class AbstractLuceneIndexingJob {
    // Similar.
}
56. Creating an Index
Mapper transforms the records: emit <DocId, Value>
Key            Value
Block Offset   Column Value
Tweet Id       Text
Block helper:
public abstract class BlockIndexingMapper<KIN, VIN>
    extends Mapper<KIN, VIN, TextLongPairWritable, LongPairWritable> {}
Lucene helper:
public abstract class AbstractIndexingMapper<KIN, VIN, KOUT, VOUT>
    extends Mapper<KIN, VIN, KOUT, VOUT> {
    protected abstract boolean filter(KIN k, VIN v);
    protected abstract KOUT buildOutputKey(KIN k, VIN v);
}
57. Creating an Index
Reducer writes appropriately processed indexes and metadata.
MapFile block index:
public class MapFileIndexingReducer
    extends Reducer<TextLongPairWritable, LongPairWritable,
                    Text, ListLongPair> { /* ... */ }
Lucene index:
public abstract class AbstractLuceneIndexingReducer<KIN, VIN>
    extends Reducer<KIN, VIN, NullWritable, NullWritable> {
    protected abstract Document buildDocument(KIN k, VIN v);
}
59. Retrieval:
[Diagram: MR job (searchKey) -> IndexedInputFormat -> Index and Data]
60. InputFormat
public class BlockIndexedFileInputFormat<K, V>
    extends FileInputFormat<K, V> {
    // Indexing jobs call this to set up indexing-related parameters.
    public static void setIndexOptions(Job job,
        String inputformatClass, String valueClass,
        String indexDir, String columnName) { /* ... */ }
    // Searching jobs call this to set up search-related parameters.
    public static void setSearchOptions(Job job,
        String inputformatClass, String valueClass,
        String indexDir, BinaryExpression filter) { /* ... */ }
}
62. Pig Integration
event_logs = load '/logs/lots_of_data'
    using ThriftPigLoader(
        'thrift.gen.LogEvent');
filtered_logs = filter event_logs by event == 'something_rare';
-- Then do stuff.
63. Pig Integration
register elephant-twin-1.0.jar
event_logs = load '/logs/lots_of_data'
    using IndexedLZOPigLoader(
        'ThriftPigLoader',
        'thrift.gen.LogEvent',
        '/user/dmitriy/etwin');
-- Pig will automatically push this down into the Loader and InputFormat
filtered_logs = filter event_logs by event == 'something_rare';
65. Optimization: merge neighbors
HDFS Block 1 HDFS Block 2
Merge neighbors, share the scan.
(Limit expansion to size of HDFS block)
66. Optimization: merge neighbors
HDFS Block 1 HDFS Block 2
Scans are faster than random reads. Allow gaps?
Turns out, not that much faster. Better to jump.
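A minimal sketch of neighbor merging as the two slides above describe it, assuming matching blocks arrive as sorted, non-overlapping byte ranges. `MergeNeighborsSketch`, `Span`, and `maxSpanBytes` are hypothetical names, not Elephant-Twin's API. Touching ranges are merged so they share one scan, merging stops once the span would exceed the given limit (the HDFS block size, say), and gaps are never bridged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Elephant-Twin.
final class MergeNeighborsSketch {
    // A half-open byte range [start, end) within one file.
    static final class Span {
        final long start;
        final long end;
        Span(long start, long end) { this.start = start; this.end = end; }
    }

    // `matches` must be sorted by start and non-overlapping.
    static List<Span> merge(List<Span> matches, long maxSpanBytes) {
        List<Span> merged = new ArrayList<>();
        Span current = null;
        for (Span next : matches) {
            boolean adjacent = current != null && current.end == next.start;
            boolean fits = current != null && next.end - current.start <= maxSpanBytes;
            if (adjacent && fits) {
                // Neighbors: grow the current span so both share one scan.
                current = new Span(current.start, next.end);
            } else {
                // Gap or size limit reached: close the span and jump.
                if (current != null) merged.add(current);
                current = next;
            }
        }
        if (current != null) merged.add(current);
        return merged;
    }
}
```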
67. Optimization: combine small splits
HDFS Block 1 HDFS Block 2
match match match
Generated Split
Combine small relevant spans into single splits.
Try to take locality into account.
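One way the combining step could look, as a sketch under stated assumptions: each relevant span carries a single preferred host, and a split is simply a list of spans. `CombineSplitsSketch`, `Span`, and `targetBytes` are hypothetical names, not Elephant-Twin's API. Spans are grouped by host first, a crude stand-in for locality, then packed greedily up to a target size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Elephant-Twin.
final class CombineSplitsSketch {
    // A relevant byte range plus the host assumed to hold a local replica.
    static final class Span {
        final long start;
        final long length;
        final String host;
        Span(long start, long length, String host) {
            this.start = start;
            this.length = length;
            this.host = host;
        }
    }

    // Groups spans by host, then packs each group into splits of at most
    // targetBytes. A span larger than targetBytes gets a split of its own.
    static List<List<Span>> combine(List<Span> spans, long targetBytes) {
        Map<String, List<Span>> byHost = new LinkedHashMap<>();
        for (Span s : spans) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<List<Span>> splits = new ArrayList<>();
        for (List<Span> local : byHost.values()) {
            List<Span> split = new ArrayList<>();
            long bytes = 0;
            for (Span s : local) {
                if (!split.isEmpty() && bytes + s.length > targetBytes) {
                    splits.add(split); // current split is full, start another
                    split = new ArrayList<>();
                    bytes = 0;
                }
                split.add(s);
                bytes += s.length;
            }
            if (!split.isEmpty()) splits.add(split);
        }
        return splits;
    }
}
```

Fewer, larger splits mean fewer map tasks, which matters when most matching spans are far smaller than an HDFS block.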
68. Applicability
Most keys occur in very few blocks!
Most frequent key only occurs in half the blocks.
69. Results
Applicable Jobs take 5-10x fewer resources
Ad-hoc jobs particularly likely to benefit
“Real” indexes are still faster,
but they can be represented using the same abstraction.
75. Future Work
• Regex matching on keys
• Better Pig pushdown support
• MultiIndexInputFormat
• Traditional indexes under ETwin
• Index maintenance (via HCatalog?)
Image Source: http://en.wikipedia.org/wiki/File:Shasta_dam_under_construction_new_edit.jpg