Cloud jpl


The document discusses Hadoop and cloud computing. It provides an overview of Hadoop, including what it is ("flexible infrastructure for large scale computational and data processing on a network of commodity hardware"), how it works (using MapReduce for distributed processing), and some example applications. It also discusses the Hadoop file system and ecosystem. Examples of companies using Hadoop include cloud computing providers like Cloudera as well as organizations working with large datasets.


Cloud Computing and Hadoop
X JPL
Barcelona, 01/07/2011

Marc de Palol
@lant

Both are distributed systems

“A distributed system is one in which the failure
of a computer you didn't even know existed can
render your own computer unusable”
Leslie Lamport

Hadoop

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.

Hadoop

● Nutch
● Lucene
● Hadoop
● Avro

Hadoop

“Flexible infrastructure for large scale
computational and data processing on
a network of commodity hardware”

Parand Tony Darugar

Map & Reduce

Map:

V = [1, 2, 3, 4, 5]
def quadrat(x) = x * x

Map(V, quadrat) =
    for (var v : V) {
        output quadrat(v);
    }

[1, 4, 9, 16, 25]

Map & Reduce

Map:

V = [1, 2, 3, 4, 5]
def quadrat(x) = x * x

Map(V, quadrat) =
    for (var v : V) {
        output quadrat(v);
    }

[1, 4, 9, 16, 25]

Reduce:

V = [1, 4, 9, 16, 25]

Reduce(V) =
    var acum = 0;
    for (var v : V) {
        acum = acum + v;
    }
    output acum;

55
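The two steps above can be sketched in plain Java, without Hadoop. This is only an illustration of the idea: the class name MapReduceSketch and the method names map and reduce are assumptions chosen for this sketch, and quadrat mirrors the function on the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceSketch {

    // Same function as quadrat(x) on the slide.
    public static int quadrat(int x) {
        return x * x;
    }

    // Map step: apply quadrat to every element independently.
    public static List<Integer> map(List<Integer> v) {
        return v.stream().map(MapReduceSketch::quadrat).collect(Collectors.toList());
    }

    // Reduce step: fold all values into a single accumulator.
    public static int reduce(List<Integer> v) {
        int acum = 0;
        for (int x : v) {
            acum = acum + x;
        }
        return acum;
    }

    public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> squares = map(v);
        System.out.println(squares);         // [1, 4, 9, 16, 25]
        System.out.println(reduce(squares)); // 55
    }
}
```

Because each call to quadrat depends only on its own input, the map step can be split across machines; only the reduce step needs to see all the values.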

Hadoop DFS

The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.

● Designed for Big Data
● Write Once, Read Many
● One DataNode per machine
● One NameNode per cluster (SPOAD)
● Hardware fault tolerance
● Rack-aware replication
● Recently gained 'append' support
● Cannot be mounted by the OS
● Sequential reads
● Stable and robust

Example
DFS

Mapper
Input: [ “paraula1”, “paraula2”,
“paraula3”, “paraula1” ]

Output: [
“paraula1” : 2,
“paraula2” : 1,
“paraula3” : 1
]
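The mapper output above is already summed per word. A minimal plain-Java sketch of that local counting step follows; it assumes nothing about Hadoop, and the names LocalCountSketch and countLocally are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalCountSketch {

    // Counts each word inside a single mapper's input, keeping first-seen order.
    public static Map<String, Integer> countLocally(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("paraula1", "paraula2", "paraula3", "paraula1");
        System.out.println(countLocally(input)); // {paraula1=2, paraula2=1, paraula3=1}
    }
}
```

In Hadoop this kind of per-mapper aggregation is usually done by a combiner; a mapper that simply emits (word, 1) for every token produces the same final totals, only with more intermediate records.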

Example
DFS

“paraula1” : [ 2, x, y ]
2 from mapper 1
x from mapper 2
y from mapper 3

“paraula2” : [ x, z, w ]
x from mapper 1
z from mapper 2
w from mapper 3

“paraula3” : [ ... ]

Example
DFS

Each reducer sums (∑) the values of one word:

“paraula1” ∑ → “paraula1” : x
“paraula2” ∑ → “paraula2” : y
“paraula3” ∑ → “paraula3” : z
...
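The grouping and summing steps of this example can be sketched in plain Java as follows. The slides leave the counts from mappers 2 and 3 as unknowns, so the numbers used in main are invented for illustration, and the names ShuffleReduceSketch, shuffle and reduce are assumptions of this sketch, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleReduceSketch {

    // Shuffle: gather the counts emitted by every mapper under their word.
    public static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            for (Map.Entry<String, Integer> e : output.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Reduce: sum the grouped counts of each word.
    public static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int count : e.getValue()) {
                sum += count;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of three mappers (only mapper 1 matches the slide).
        Map<String, Integer> m1 = new HashMap<>();
        m1.put("paraula1", 2); m1.put("paraula2", 1); m1.put("paraula3", 1);
        Map<String, Integer> m2 = new HashMap<>();
        m2.put("paraula1", 3); m2.put("paraula2", 4);
        Map<String, Integer> m3 = new HashMap<>();
        m3.put("paraula1", 1); m3.put("paraula2", 2); m3.put("paraula3", 5);

        Map<String, List<Integer>> grouped = shuffle(Arrays.asList(m1, m2, m3));
        System.out.println(grouped);         // {paraula1=[2, 3, 1], paraula2=[1, 4, 2], paraula3=[1, 5]}
        System.out.println(reduce(grouped)); // {paraula1=6, paraula2=7, paraula3=6}
    }
}
```

Since every value for a given word ends up in one list, each word can be reduced independently of the others.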

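Code example

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested in the job's outer class. Emits (word, 1) for every
// whitespace-separated token of each input line.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}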
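To see which keys this mapper would emit for a given line, the tokenizing step can be run on its own. The following is a small stand-alone sketch using only the JDK; the class name TokenizerSketch and the method tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {

    // Returns the tokens that would become map output keys for one line.
    public static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        // The default delimiters are whitespace characters only.
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hadoop, hadoop  and HADOOP")); // [Hadoop,, hadoop, and, HADOOP]
    }
}
```

Punctuation stays attached and case is preserved, so "Hadoop," "hadoop" and "HADOOP" are counted as three different words unless the line is normalized first.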

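Code example

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested in the job's outer class. Sums all counts received for one word.
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}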

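Code example

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();

    // Deprecated since Hadoop 2, where Job.getInstance(conf, "wordcount") replaces it.
    Job job = new Job(conf, "wordcount");

    // Types of the final (reducer) output key and value.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    // args[0] is the input path, args[1] the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Block until the job finishes and report success or failure to the shell.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}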

Interested?

To try Hadoop:

http://www.cloudera.com ► Downloads
http://hadoop.apache.org

National-level Hadoop and scalability user group:

https://groups.google.com/group/spain-scalability-users

LinkedIn groups:

Hadoop España
Hive España

Questions?

Marc de Palol
marc.de.palol@gmail.com
@lant

Cloud jpl

1. Cloud Computing and Hadoop, X JPL, Barcelona, 01/07/2011, Marc de Palol, @lant

2-7. Who am I?

8-9. Grid Computing vs Cloud

10-11. Both are distributed systems. “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable” Leslie Lamport. “A distributed system consists of multiple autonomous computers that communicate through a computer network.” Wikipedia

12-13. Cloud

14. Hadoop

15. Hadoop MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.

16-17. Hadoop

18. Hadoop ● Nutch ● Lucene ● Hadoop ● Avro

19-21. Hadoop “Flexible infrastructure for large scale computational and data processing on a network of commodity hardware” Parand Tony Darugar

22-23. Map & Reduce

24. Hadoop DFS

25-28. Example DFS

29-31. Code example

32. Workflow (diagram: DB, LOGS, HDFS, DB, NoSQL)

33-34. Who uses it?

35-36. Hadoop ecosystem

37. Hadoop community. Support:

38. Interested? To try Hadoop: http://www.cloudera.com ► Downloads, http://hadoop.apache.org. National-level Hadoop and scalability user group: https://groups.google.com/group/spain-scalability-users. LinkedIn groups: Hadoop España, Hive España

39. Questions? Marc de Palol, marc.de.palol@gmail.com, @lant
