The document discusses Hadoop and cloud computing. It provides an overview of Hadoop, including what it is ("flexible infrastructure for large scale computational and data processing on a network of commodity hardware"), how it works (using MapReduce for distributed processing), and some example applications. It also covers the Hadoop file system (HDFS) and the surrounding ecosystem, and mentions companies in the Hadoop ecosystem, such as Cloudera, as well as organizations working with large datasets.
10. Both are distributed systems
“A distributed system is one in which the failure
of a computer you didn't even know existed can
render your own computer unusable”
Leslie Lamport
11. Both are distributed systems
“A distributed system consists of multiple
autonomous computers that communicate
through a computer network.”
Wikipedia
15. Hadoop
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
19. Hadoop
“Flexible infrastructure for large scale
computational and data processing on
a network of commodity hardware”
Parand Tony Darugar
22. Map & Reduce
Map:
V = [ 1, 2, 3, 4, 5 ]
def quadrat( x ) = x * x;
map( V, quadrat ) =
    for (var v : V) {
        output quadrat(v);
    }
[1, 4, 9, 16, 25]
23. Map & Reduce
Map:
V = [ 1, 2, 3, 4, 5 ]
def quadrat( x ) = x * x;
map( V, quadrat ) =
    for (var v : V) {
        output quadrat(v);
    }
[1, 4, 9, 16, 25]

Reduce:
V = [ 1, 4, 9, 16, 25 ]
reduce( V ) =
    var acum = 0;
    for (var v : V) {
        acum = acum + v;
    }
55
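The map and reduce steps on the two slides above can be sketched in plain Java (the class and method names here are illustrative, not part of any Hadoop API): `map` squares every element, and `reduce` folds the result into a single sum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceSketch {
    // "Map": apply quadrat (square) to every element of V
    static List<Integer> map(List<Integer> v) {
        return v.stream().map(x -> x * x).collect(Collectors.toList());
    }

    // "Reduce": fold the list into a single accumulated sum
    static int reduce(List<Integer> v) {
        return v.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> squared = map(v);
        System.out.println(squared);          // [1, 4, 9, 16, 25]
        System.out.println(reduce(squared));  // 55
    }
}
```

Because `map` touches each element independently, it can run in parallel across machines; only `reduce` needs to see the combined results — which is exactly the structure Hadoop exploits.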
24. Hadoop DFS
The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.
● Designed for Big Data
● Write Once, Read Many
● One DataNode per machine
● One NameNode per cluster (a single point of failure)
● Tolerant to hardware failures
● Rack-aware replica placement
● 'append' supported only recently
● Cannot be mounted in the OS
● Sequential reads
● Stable and robust
27. Example
DFS
“word1” : [ 2, x, y ]
    2 from mapper 1
    x from mapper 2
    y from mapper 3
“word2” : [ x, z, w ]
    x from mapper 1
    z from mapper 2
    w from mapper 3
“word3” : [ ... ]
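The grouping shown above — collecting every count emitted for the same word into one list per key — is the shuffle phase between map and reduce. A minimal local sketch (class name, word strings, and counts are all illustrative, not from the talk):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Groups (word, count) pairs emitted by the mappers by key,
    // producing the per-word value list each reducer call receives.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> emitted) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical counts emitted by three mappers
        List<Map.Entry<String, Integer>> emitted = List.of(
            Map.entry("word1", 2),  // from mapper 1
            Map.entry("word2", 1),  // from mapper 1
            Map.entry("word1", 3),  // from mapper 2
            Map.entry("word2", 4),  // from mapper 2
            Map.entry("word1", 5)   // from mapper 3
        );
        System.out.println(shuffle(emitted)); // {word1=[2, 3, 5], word2=[1, 4]}
    }
}
```

In a real cluster this grouping happens across the network: Hadoop partitions keys among reducers and guarantees that all values for one key arrive at the same reducer.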
29. Code example
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
30. Code example
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
31. Code example
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
}
38. Interested?
To try Hadoop:
http://www.cloudera.com ► Downloads
http://hadoop.apache.org
Spanish national user group for Hadoop and scalability:
https://groups.google.com/group/spain-scalability-users
LinkedIn groups:
Hadoop España
Hive España
39. Questions?
Marc de Palol
marc.de.palol@gmail.com
@lant