Hadoop introduction berlin buzzwords 2011

An Introduction
Kai Voigt, Cloudera Inc
Berlin Buzzwords, June 6th 2011

Montag, 6. Juni 2011

Big Data

• Capture

• Storage

• Search

• Analytics


hadoop.apache.org

Google Filesystem Hadoop Distributed
(GFS) Filesystem (HDFS)

MapReduce MapReduce


Hadoop Distributed
Filesystem
• Easy Access
• Distributed
• Redundant
• Scalable


File Splits
Large File

6440M

Block Block Block Block Block Block Block Block
1 2 3 4 5 6 ... 100 101

64M 64M 64M 64M 64M 64M 64M 40M


Block Placement
Block Block Block Block Block
1 2 3 2 2

Block Block Block
3 1 3

Block
1

Node 1 Node 2 Node 3 Node 4 Node 5


MapReduce
Block Block Block Block
Input 1 2 3 4

map()
Inter-
mediate
Data
reduce()

Output


WordCount Example
map (offset, line) {
foreach word in line {
emit (word, 1)
}
}


WordCount Example
reduce (word, count[]) {
total = 0;
foreach number in count[] {
total += number
}
emit (word, total)
}


map()

( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)


Sort & Shufﬂe
( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)
( , 1) ( , 1) ( , 1)

( , (1,1))
( , (1,1,1,1))
( , (1,1))
( , (1,1))
( , (1))


reduce()
( , (1,1))
( , (1,1,1,1))
( , (1,1))
( , (1,1))
( , (1))

( , 2) ( , 4)
( , 2) ( , 2)
( , 1)


Use Case:
Recommendations
• "People looking as this article also looked
at these articles"
• "You might also know these people"
• iTunes Genius Playlist
• Banner Placement


Use Case: Text
Processing

• Document Indexing
• Semantic Analytics


Use Case: Machine
Learning
Data
Data
Data

• Spam vs No Spam
• Credit Card Fraud
• "People of Interest"
Information


Use Case: Graphs

• Shortest Paths
• Bottleneck Nodes
• Flow Optimization
• Spanning Trees
• Spanning Routes


Hadoop Ecosystem
Hive & Pig High Level Language Access
HBase Real Time Access
Sqoop SQL to/from Hadoop
Flume Distributed Data Collection
Oozie Job Workﬂow
Mahout Machine Learning Library

many more


Homework

• Cloudera's Distribution including Hadoop
(CDH3)
• Online Tutorials
• WordCount Example
• Conference Demo Cluster


Quick Demo!


Thank you!

• Kai Voigt
• kai@cloudera.com
• http://www.cloudera.com/
• http://apache.hadoop.org/


Hadoop introduction berlin buzzwords 2011

Recomendados

Recomendados

Mais conteúdo relacionado

Mais de iammutex

Mais de iammutex (20)

Hadoop introduction berlin buzzwords 2011