Apache Hadoop &
    Friends
         Philip Zeyliger
     philip@cloudera.com
     @philz42 @cloudera
      February 18, 2010




                           1
Hi there!

     Software Engineer
     Worked at




And, oh, I ski (though mostly in Tahoe)


                                          2
@cloudera, I work on...




                    Apache Avro
 Cloudera Desktop


                                  3
Outline
Why should you care? (Intro)
Challenging yesteryear’s assumptions
The MapReduce Model
HDFS, Hadoop Map/Reduce
The Hadoop Ecosystem
Questions

                                       4
Data is everywhere.

Data is important.



                      5
6
7
8
“I keep saying that the sexy job
   in the next 10 years will be
statisticians, and I’m not kidding.”

                               Hal Varian
              (Google’s chief economist)




                                            9
Are you throwing
   away data?
Data comes in many shapes and sizes:
relational tuples, log files, semistructured
textual data (e.g., e-mail), and more.
Are you throwing it away because it
doesn’t ‘fit’?
                                              10
So, what’s Hadoop?


       The Little Prince, Antoine de Saint-Exupéry, Irene Testot-Ferry




                                                                         11
Apache Hadoop is an open-source
system (written in Java!) to reliably
        store and process
             gobs of data
across many commodity computers.




        The Little Prince, Antoine de Saint-Exupéry, Irene Testot-Ferry


                                                                          12
Two Big Components

HDFS: Self-healing, high-bandwidth clustered storage.
Map/Reduce: Fault-tolerant distributed computing.



                                               13
Challenging some of
   yesteryear’s
  assumptions...


                      14
Assumption 1: Machines can be reliable...




Image: MadMan the Mighty CC BY-NC-SA
                                            15
Hadoop Goal:

   Separate distributed
  system fault-tolerance
code from application logic.
           Systems
         Programmers   Statisticians

                                       16
Assumption 2: Machines have identities...
                                            Image: Laughing Squid CC BY-NC-SA
                                                                          17
Hadoop Goal:

Users should interact with
 clusters, not machines.




                             18
Assumption 3: A data set fits on one machine...
                                                 Image: Matthew J. Stinson CC-BY-NC
                                                                             19
Hadoop Goal:

   System should scale
linearly (or better) with
        data size.


                            20
The M/R
Programming Model




                    21
You specify map()
   and reduce()
    functions.

The framework does
      the rest.
                     22
fault-tolerance
  (that’s what’s important)
  (and that’s why Hadoop)




                              23
map()
            map: K₁,V₁→list(K₂,V₂)

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  /**
    * Called once for each key/value pair in the input split. Most applications
    * should override this, but the default is the identity function.
    */
  protected void map(KEYIN key, VALUEIN value,
                      Context context) throws IOException,
                      InterruptedException {
     // context.write() can be called many times
     // this is default “identity mapper” implementation
     context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
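
As a concrete illustration of K₁,V₁→list(K₂,V₂), here is a minimal plain-Java sketch of a word-splitting map step. It does not use the Hadoop API: `MapSketch` and its `map` method are hypothetical names, and the returned list stands in for repeated `context.write()` calls.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapSketch {
    // Stand-in for Mapper.map(): one input record (byte offset, line) in,
    // zero or more (word, 1) pairs out.
    static List<Map.Entry<String, Long>> map(long offset, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Each add() plays the role of one context.write(word, 1).
                out.add(new SimpleEntry<String, Long>(word, 1L));
            }
        }
        return out;
    }
}
```

One input line can produce many output pairs, or none at all for a blank line.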




                                                                                  24
(the shuffle)

map output is assigned to a “reducer”

map output is sorted by key
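
The two shuffle steps can be sketched in plain Java, outside Hadoop. `ShuffleSketch` is a hypothetical name, and the partitioning rule (non-negative hash of the key, modulo the number of reducers) is an assumption modeled on a simple hash partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleSketch {
    // Assign a key to a reducer: non-negative hash modulo reducer count.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Group map output per reducer; each TreeMap keeps its keys sorted.
    static List<TreeMap<String, List<Long>>> shuffle(
            List<Map.Entry<String, Long>> mapOutput, int numReducers) {
        List<TreeMap<String, List<Long>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Long> kv : mapOutput) {
            partitions.get(partition(kv.getKey(), numReducers))
                      .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        }
        return partitions;
    }
}
```

Every value for a given key lands in the same partition, so one reducer sees all of them.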




                                        25
reduce()
         K₂, iter(V₂)→list(K₃,V₃)

public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  /**
    * This method is called once for each key. Most applications will define
    * their reduce class by overriding this method. The default implementation
    * is an identity function.
    */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                         ) throws IOException, InterruptedException {
     for(VALUEIN value: values) {
       context.write((KEYOUT) key, (VALUEOUT) value);
     }
  }
}
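
For contrast with the identity reducer, here is a minimal plain-Java sketch of a summing reduce step. `ReduceSketch` is a hypothetical stand-in, not Hadoop code; the returned pair plays the role of a single `context.write()`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

final class ReduceSketch {
    // Stand-in for Reducer.reduce(): called once per key with all of that
    // key's values; emits one (key, sum) pair.
    static Map.Entry<String, Long> reduce(String key, Iterable<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return new SimpleEntry<String, Long>(key, sum);
    }
}
```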




                                                                                 26
Putting it together...
Logical


                   Physical Flow



Physical




                                    27
Some samples...
Build an inverted index.
Summarize data grouped by a key.
Build map tiles from geographic data.
OCRing many images.
Learning ML models. (e.g., Naive Bayes
for text classification)
Augment traditional BI/DW
technologies (by archiving raw data).
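
To make the first sample concrete, here is a single-machine sketch of an inverted index in plain Java. The class name, the document ids, and the tokenization rule (lowercase, split on non-word characters) are assumptions for illustration. In an M/R job the inner loop would be the map step, emitting (word, docId), and the per-word set would be built in the reduce step.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // docs maps a document id to its text; the result maps each word to
    // the sorted set of document ids that contain it.
    static TreeMap<String, TreeSet<String>> build(Map<String, String> docs) {
        TreeMap<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, w -> new TreeSet<>())
                         .add(doc.getKey());
                }
            }
        }
        return index;
    }
}
```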
                                         28
There’s more than the Java API

Streaming: perl, python, ruby, whatever. stdin/stdout/stderr.
Pig: Higher-level dataflow language for easy ad-hoc analysis. Developed at Yahoo!
Hive: SQL interface. Great for analysts. Developed at Facebook.

Many tasks actually require a series of M/R jobs; that’s ok!
                                                           29
A Typical Look...
Commodity servers (8-core, 8-16 GB
RAM, 4-12 TB disk, 2x 1 GbE NIC)
2-level network architecture
 20-40 nodes per rack




                                    30
So, how does it all
      work?




                      31
dramatis personae
Starring...

         NameNode (metadata server and database)

         SecondaryNameNode (assistant to NameNode)
         JobTracker (scheduler)

The Chorus…
         DataNodes (block storage)
         TaskTrackers (task execution)


                                  Thanks to Zak Stone for earmuff image!
                                                                       32
HDFS
                          3x64MB file, 3 rep
   (fs metadata)
                          4x64MB file, 3 rep
    Namenode
                          Small file, 7 rep
Datanodes




        One Rack      A Different Rack        33
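
Reading the label "3x64MB file, 3 rep" as a file of three 64 MB blocks with each block stored three times, the storage arithmetic can be sketched as follows. `HdfsBlockSketch` and its methods are hypothetical helpers, not part of HDFS.

```java
final class HdfsBlockSketch {
    static final long MB = 1024L * 1024L;

    // Number of blocks a file of the given size is split into.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // Raw bytes stored across the cluster once every block is replicated.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }
}
```

Under that reading, a 192 MB file is 3 blocks, and at replication 3 it occupies 9 block replicas, or 576 MB of raw disk.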
HDFS Write Path




                  34
HDFS Failures?
Datanode crash?
 Clients read another copy
 Background re-replication
Namenode crash?
 uh-oh



                             35
M/R
Tasktrackers on the same machines as datanodes
Legend: job on stars, different job, idle




         One Rack            A Different Rack       36
M/R




      37
M/R Failures
Task fails
  Try again?
  Try again somewhere else?
  Report failure
Retries possible because of idempotence
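
The "try again somewhere else" idea can be sketched in plain Java. This is not the JobTracker's actual scheduling logic: `RetrySketch`, the worker names, and the one-attempt-per-worker policy are all assumptions. It is only safe because the task is assumed idempotent, so running it again produces the same result.

```java
import java.util.List;
import java.util.function.Function;

final class RetrySketch {
    // Runs the task against each candidate worker in turn until one succeeds.
    static <T> T runWithRetries(List<String> workers, Function<String, T> task) {
        RuntimeException last = null;
        for (String worker : workers) {
            try {
                return task.apply(worker);
            } catch (RuntimeException e) {
                last = e; // task failed here; try somewhere else
            }
        }
        // Every attempt failed: report failure to the caller.
        throw new IllegalStateException("task failed on all workers", last);
    }
}
```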


                                          38
Hadoop in the Wild
     (yes, it’s used in production)

Yahoo! Hadoop Clusters: > 82PB, >25k machines
(Eric14, HadoopWorld NYC ’09)

(M-R, but not Hadoop)
Google: 40 GB/s GFS read/write load (Jeff Dean,
LADIS ’09) [~3,500 TB/day]

Facebook: 4TB new data per day; DW: 4800 cores, 5.5
PB (Dhruba Borthakur, HadoopWorld)
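
The bracketed ~3,500 TB/day figure for Google follows from the 40 GB/s load. A small sketch of the conversion, assuming decimal units (1 TB = 1000 GB); `ThroughputSketch` is a hypothetical helper.

```java
final class ThroughputSketch {
    // Converts a sustained rate in GB/s to TB/day, using 1 TB = 1000 GB.
    static long terabytesPerDay(long gigabytesPerSecond) {
        long secondsPerDay = 24L * 60L * 60L; // 86,400
        return gigabytesPerSecond * secondsPerDay / 1000L;
    }
}
```

At 40 GB/s this gives 3,456 TB/day, which rounds to the ~3,500 quoted on the slide.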



                                                      39
The Hadoop Ecosystem
                            ETL Tools       BI Reporting       RDBMS

                          Pig (Data Flow)    Hive (SQL)         Sqoop
ZooKeeper (Coordination)




                                                                        Avro (Serialization)
                          MapReduce (Job Scheduling/Execution System)

                          HBase (Key-Value store)


                                               HDFS
                                  (Hadoop Distributed File System)



                                                                                               40
Ok, fine, what next?
Get Hadoop!
 Cloudera’s Distribution for Hadoop
 http://hadoop.apache.org/
Try it out! (Locally, or on EC2)
Door Prize




                                              41
Just one slide...

Software: Cloudera Distribution for
Hadoop, Cloudera Desktop, more…
Training and certification…
Free on-line training materials
(including video)
Support & Professional Services
@cloudera, blog, etc.
                                      42
Okay, two slides


Talk to us if you’re going to do Hadoop
in your organization.




                                          43
Questions?

philip@cloudera.com

(feedback? yes!)

(hiring? yes!)




                      44
Backup Slides


  (i’ve got your back)




                         45
Important APIs
(→ is 1:many)

M/R Flow
  Input Format   data→K₁,V₁
  Mapper         K₁,V₁→K₂,V₂
  Combiner       K₂,iter(V₂)→K₂,V₂
  Partitioner    K₂,V₂→int
  Reducer        K₂,iter(V₂)→K₃,V₃
  Out. Format    K₃,V₃→data

Other
  Writable
  JobClient
  *Context
  Filesystem
                                                              46
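
Of the M/R flow stages, the Combiner (K₂,iter(V₂)→K₂,V₂) is the one not covered earlier in the talk. A minimal plain-Java sketch of the idea, assuming a sum as the combining function; `CombinerSketch` is a hypothetical name and this is not the Hadoop API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerSketch {
    // Collapses one map task's output to a single partial sum per key,
    // so less data has to cross the network during the shuffle.
    static TreeMap<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        TreeMap<String, Long> partial = new TreeMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            partial.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return partial;
    }
}
```

Because addition is associative and commutative, summing partial sums in the reducer gives the same totals as summing the raw values.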
the “grep” example

public int run(String[] args) throws Exception {
  if (args.length < 3) {
    System.out.println("Grep <inDir> <outDir> <regex> [<group>]");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }
  Path tempDir = new Path("grep-temp-" +
      Integer.toString(new Random().nextInt(Integer.MAX_VALUE)));
  JobConf grepJob = new JobConf(getConf(), Grep.class);
  try {
    grepJob.setJobName("grep-search");

    FileInputFormat.setInputPaths(grepJob, args[0]);

    grepJob.setMapperClass(RegexMapper.class);
    grepJob.set("mapred.mapper.regex", args[2]);
    if (args.length == 4)
      grepJob.set("mapred.mapper.regex.group", args[3]);

    grepJob.setCombinerClass(LongSumReducer.class);
    grepJob.setReducerClass(LongSumReducer.class);

    FileOutputFormat.setOutputPath(grepJob, tempDir);
    grepJob.setOutputFormat(SequenceFileOutputFormat.class);
    grepJob.setOutputKeyClass(Text.class);
    grepJob.setOutputValueClass(LongWritable.class);

    JobClient.runJob(grepJob);

    JobConf sortJob = new JobConf(Grep.class);
    sortJob.setJobName("grep-sort");

    FileInputFormat.setInputPaths(sortJob, tempDir);
    sortJob.setInputFormat(SequenceFileInputFormat.class);

    sortJob.setMapperClass(InverseMapper.class);

    // write a single file
    sortJob.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(sortJob, new Path(args[1]));
    // sort by decreasing freq
    sortJob.setOutputKeyComparatorClass(
        LongWritable.DecreasingComparator.class);

    JobClient.runJob(sortJob);
  } finally {
    FileSystem.get(grepJob).delete(tempDir, true);
  }
  return 0;
}
                                                                                                                  47
$ cat input.txt
adams dunster kirkland dunster
kirland dudley dunster
adams dunster winthrop

$ bin/hadoop jar hadoop-0.18.3-examples.jar grep input.txt output1 'dunster|adams'

$ cat output1/part-00000
4 dunster
2 adams
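
The same result can be reproduced on one machine with a plain-Java sketch that mirrors the example's two steps: count regex matches, then sort by decreasing count. `GrepSketch` is a hypothetical stand-in for the two M/R jobs and does not use Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GrepSketch {
    static List<String> grep(List<String> lines, String regex) {
        // Step 1: count every regex match (the "grep-search" job).
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        // Step 2: order by decreasing count (the "grep-sort" job).
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sorted) {
            out.add(e.getValue() + " " + e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "adams dunster kirkland dunster",
            "kirland dudley dunster",
            "adams dunster winthrop");
        // Prints "4 dunster" then "2 adams".
        grep(input, "dunster|adams").forEach(System.out::println);
    }
}
```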



                                      48
Job 1 of 2

JobConf grepJob = new JobConf(getConf(), Grep.class);
try {
  grepJob.setJobName("grep-search");

  FileInputFormat.setInputPaths(grepJob, args[0]);

  grepJob.setMapperClass(RegexMapper.class);
  grepJob.set("mapred.mapper.regex", args[2]);
  if (args.length == 4)
    grepJob.set("mapred.mapper.regex.group", args[3]);

  grepJob.setCombinerClass(LongSumReducer.class);
  grepJob.setReducerClass(LongSumReducer.class);

  FileOutputFormat.setOutputPath(grepJob, tempDir);
  grepJob.setOutputFormat(SequenceFileOutputFormat.class);
  grepJob.setOutputKeyClass(Text.class);
  grepJob.setOutputValueClass(LongWritable.class);

  JobClient.runJob(grepJob);
} ...


                                                                 49
Job 2 of 2

    JobConf sortJob = new JobConf(Grep.class);
    sortJob.setJobName("grep-sort");

    FileInputFormat.setInputPaths(sortJob, tempDir);
    sortJob.setInputFormat(SequenceFileInputFormat.class);

    sortJob.setMapperClass(InverseMapper.class);
    // (implicit identity reducer)
    // write a single file
    sortJob.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(sortJob, new Path(args[1]));
    // sort by decreasing freq
    sortJob.setOutputKeyComparatorClass(
        LongWritable.DecreasingComparator.class);

    JobClient.runJob(sortJob);
  } finally {
    FileSystem.get(grepJob).delete(tempDir, true);
  }
  return 0;
}


                                                                   50
The types there...
    ?, Text

  Text, Long

Text, list(Long)

  Text, Long

  Long, Text

                           51
Traditional DW (diagram labels)
BI / Reporting
Ad-hoc Queries
RDBMS (Aggregates)
Data Mining
ETL (Extraction, Transform, Load)
Storage (Raw Data)
Collection
Instrumentation
                                                       52
Facebook’s DW (phase M)
                  M 2008N
                      >
           Facebook Data Infrastructure
                                   Scribe Tier     MySQL Tier




                           Hadoop Tier




                              Oracle RAC Servers




Wednesday, April 1, 2009


                                                                53

More Related Content

What's hot

Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Cloudera, Inc.
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureKyong-Ha Lee
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandRichard McDougall
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
 
Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopMohamed Elsaka
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceHortonworks
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceDr Ganesh Iyer
 
Graph Space Viewer
Graph Space ViewerGraph Space Viewer
Graph Space Viewerrydark
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-trainingGeohedrick
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & StorageIlayaraja P
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction EMC
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsTrendProgContest13
 

What's hot (20)

Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and Hadoop
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Graph Space Viewer
Graph Space ViewerGraph Space Viewer
Graph Space Viewer
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-training
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
 
An Introduction to the World of Hadoop
An Introduction to the World of HadoopAn Introduction to the World of Hadoop
An Introduction to the World of Hadoop
 
Hive sq lfor-hadoop
Hive sq lfor-hadoopHive sq lfor-hadoop
Hive sq lfor-hadoop
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Hadoop2.2
Hadoop2.2Hadoop2.2
Hadoop2.2
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 

Viewers also liked

Competencia de un mercado al 100% de teledensidad v4
Competencia de un mercado al 100% de teledensidad v4Competencia de un mercado al 100% de teledensidad v4
Competencia de un mercado al 100% de teledensidad v4Johnny Gómez
 
My lecture eight on arab israel conflict-part I
My lecture eight on arab israel conflict-part IMy lecture eight on arab israel conflict-part I
My lecture eight on arab israel conflict-part IDr. Afroz Alam
 
Internship in-chennai-for-mca-in-cloud-computing
Internship in-chennai-for-mca-in-cloud-computingInternship in-chennai-for-mca-in-cloud-computing
Internship in-chennai-for-mca-in-cloud-computingroshneyarul
 
MICE destination island & icelandair
MICE destination island & icelandairMICE destination island & icelandair
MICE destination island & icelandairMICEboard
 
Pr Es En Ta Ci On De S An Va Le Nt In
Pr Es En Ta Ci On De S An Va Le Nt InPr Es En Ta Ci On De S An Va Le Nt In
Pr Es En Ta Ci On De S An Va Le Nt Inmaryelaedward
 
Thinakaran_resume
Thinakaran_resumeThinakaran_resume
Thinakaran_resumethina karan
 
Análisis comparativo entre la ISO9001 y la NICC1
Análisis comparativo entre la ISO9001 y la NICC1Análisis comparativo entre la ISO9001 y la NICC1
Análisis comparativo entre la ISO9001 y la NICC1Audinfor
 
GOOGLE APPS FOR EDUCATION TRAINING
GOOGLE APPS FOR EDUCATION TRAININGGOOGLE APPS FOR EDUCATION TRAINING
GOOGLE APPS FOR EDUCATION TRAININGOneNita Zuriah
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Data prevel oct2012
Data prevel  oct2012Data prevel  oct2012
Data prevel oct2012PabloBasica
 
Crearton 2010
Crearton 2010Crearton 2010
Crearton 2010CREARTON
 
Microinversores Enecsys_Incluye vídeo Guía de Instalación
Microinversores Enecsys_Incluye vídeo Guía de InstalaciónMicroinversores Enecsys_Incluye vídeo Guía de Instalación
Microinversores Enecsys_Incluye vídeo Guía de InstalaciónAlbasolar
 
C R Consulting Company Profile
C R Consulting   Company ProfileC R Consulting   Company Profile
C R Consulting Company Profileshaysh
 
E-coaching bij e-commerce (Indie Group op DMF11)
E-coaching bij e-commerce (Indie Group op DMF11)E-coaching bij e-commerce (Indie Group op DMF11)
E-coaching bij e-commerce (Indie Group op DMF11)CIS-COMMERCE.BE
 
Corporate Diwali Gifts by Crystal Hues ltd
Corporate Diwali Gifts by Crystal Hues ltdCorporate Diwali Gifts by Crystal Hues ltd
Corporate Diwali Gifts by Crystal Hues ltdCrystal Hues Ltd
 

Viewers also liked (20)

Competencia de un mercado al 100% de teledensidad v4
Competencia de un mercado al 100% de teledensidad v4Competencia de un mercado al 100% de teledensidad v4
Competencia de un mercado al 100% de teledensidad v4
 
My lecture eight on arab israel conflict-part I
My lecture eight on arab israel conflict-part IMy lecture eight on arab israel conflict-part I
My lecture eight on arab israel conflict-part I
 
Internship in-chennai-for-mca-in-cloud-computing
Internship in-chennai-for-mca-in-cloud-computingInternship in-chennai-for-mca-in-cloud-computing
Internship in-chennai-for-mca-in-cloud-computing
 
MICE destination island & icelandair
MICE destination island & icelandairMICE destination island & icelandair
MICE destination island & icelandair
 
Pr Es En Ta Ci On De S An Va Le Nt In
Pr Es En Ta Ci On De S An Va Le Nt InPr Es En Ta Ci On De S An Va Le Nt In
Pr Es En Ta Ci On De S An Va Le Nt In
 
ZARAGOZA
ZARAGOZAZARAGOZA
ZARAGOZA
 
Thinakaran_resume
Thinakaran_resumeThinakaran_resume
Thinakaran_resume
 
Real ch.2 a
Real ch.2 aReal ch.2 a
Real ch.2 a
 
EMPRENDER BIFFI
EMPRENDER BIFFIEMPRENDER BIFFI
EMPRENDER BIFFI
 
Análisis comparativo entre la ISO9001 y la NICC1
Análisis comparativo entre la ISO9001 y la NICC1Análisis comparativo entre la ISO9001 y la NICC1
Análisis comparativo entre la ISO9001 y la NICC1
 
GOOGLE APPS FOR EDUCATION TRAINING
GOOGLE APPS FOR EDUCATION TRAININGGOOGLE APPS FOR EDUCATION TRAINING
GOOGLE APPS FOR EDUCATION TRAINING
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Todas las carreras
Todas las carrerasTodas las carreras
Todas las carreras
 
Data prevel oct2012
Data prevel  oct2012Data prevel  oct2012
Data prevel oct2012
 
Crearton 2010
Crearton 2010Crearton 2010
Crearton 2010
 
Microinversores Enecsys_Incluye vídeo Guía de Instalación
Microinversores Enecsys_Incluye vídeo Guía de InstalaciónMicroinversores Enecsys_Incluye vídeo Guía de Instalación
Microinversores Enecsys_Incluye vídeo Guía de Instalación
 
C R Consulting Company Profile
C R Consulting   Company ProfileC R Consulting   Company Profile
C R Consulting Company Profile
 
Terminos internet
Terminos internetTerminos internet
Terminos internet
 
E-coaching bij e-commerce (Indie Group op DMF11)
E-coaching bij e-commerce (Indie Group op DMF11)E-coaching bij e-commerce (Indie Group op DMF11)
E-coaching bij e-commerce (Indie Group op DMF11)
 
Corporate Diwali Gifts by Crystal Hues ltd
Corporate Diwali Gifts by Crystal Hues ltdCorporate Diwali Gifts by Crystal Hues ltd
Corporate Diwali Gifts by Crystal Hues ltd
 

Similar to Apache Hadoop & Friends at Utah Java User's Group

EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseCloudera, Inc.
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introductionrajsandhu1989
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoopShashwat Shriparv
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 

Similar to Apache Hadoop & Friends at Utah Java User's Group (20)

EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBase
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introduction
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Apache Hadoop & Friends at Utah Java User's Group

  • 1. Apache Hadoop & Friends. Philip Zeyliger, philip@cloudera.com, @philz42 @cloudera. February 18, 2010
  • 2. Hi there! Software Engineer Worked at And, oh, I ski (though mostly in Tahoe)
  • 3. @cloudera, I work on... Apache Avro, Cloudera Desktop
  • 4. Outline: Why should you care? (Intro); Challenging yesteryear’s assumptions; The MapReduce Model; HDFS, Hadoop Map/Reduce; The Hadoop Ecosystem; Questions
  • 5. Data is everywhere. Data is important.
  • 6.
  • 7.
  • 8.
  • 9. “I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.” Hal Varian (Google’s chief economist)
  • 10. Are you throwing away data? Data comes in many shapes and sizes: relational tuples, log files, semistructured textual data (e.g., e-mail), … Are you throwing it away because it doesn’t ‘fit’?
  • 11. So, what’s Hadoop? The Little Prince, Antoine de Saint-Exupéry, Irene Testot-Ferry
  • 12. Apache Hadoop is an open-source system (written in Java!) to reliably store and process gobs of data across many commodity computers. The Little Prince, Antoine de Saint-Exupéry, Irene Testot-Ferry
  • 13. Two Big Components. HDFS: self-healing, high-bandwidth clustered storage. Map/Reduce: fault-tolerant distributed computing.
  • 14. Challenging some of yesteryear’s assumptions...
  • 15. Assumption 1: Machines can be reliable... Image: MadMan the Mighty, CC BY-NC-SA
  • 16. Hadoop Goal: Separate distributed system fault-tolerance code from application logic. (Systems Programmers; Statisticians)
  • 17. Assumption 2: Machines have identities... Image: Laughing Squid, CC BY-NC-SA
  • 18. Hadoop Goal: Users should interact with clusters, not machines.
  • 19. Assumption 3: A data set fits on one machine... Image: Matthew J. Stinson, CC BY-NC
  • 20. Hadoop Goal: System should scale linearly (or better) with data size.
  • 22. You specify map() and reduce() functions. The framework does the rest.
  • 23. fault-tolerance (that’s what’s important) (and that’s why Hadoop)
  • 24. map() map: K₁,V₁→list(K₂,V₂) public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { /** Called once for each key/value pair in the input split. Most applications should override this, but the default is the identity function. */ protected void map(KEYIN key, VALUEIN value, Context context) throws IOException, InterruptedException { /* context.write() can be called many times; this is the default “identity mapper” implementation */ context.write((KEYOUT) key, (VALUEOUT) value); } }
  • 25. (the shuffle) Map output is assigned to a “reducer”. Map output is sorted by key.
  • 26. reduce() reduce: K₂, iter(V₂)→list(K₃,V₃) public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { /** This method is called once for each key. Most applications will define their reduce class by overriding this method. The default implementation is an identity function. */ @SuppressWarnings("unchecked") protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) throws IOException, InterruptedException { for (VALUEIN value : values) { context.write((KEYOUT) key, (VALUEOUT) value); } } }
  • 27. Putting it together... (diagram: Logical, Physical Flow, Physical)
  • 28. Some samples... Build an inverted index. Summarize data grouped by a key. Build map tiles from geographic data. OCRing many images. Learning ML models (e.g., Naive Bayes for text classification). Augment traditional BI/DW technologies (by archiving raw data).
  • 29. There’s more than the Java API. Streaming: perl, python, ruby, whatever; stdin/stdout/stderr. Pig: higher-level dataflow language for easy ad-hoc analysis; developed at Yahoo! Hive: SQL interface; great for analysts; developed at Facebook. Many tasks actually require a series of M/R jobs; that’s ok!
  • 30. A Typical Look... Commodity servers (8-core, 8-16GB RAM, 4-12 TB, 2x1 gE NIC). 2-level network architecture. 20-40 nodes per rack.
  • 31. So, how does it all work?
  • 32. dramatis personae. Starring... NameNode (metadata server and database), SecondaryNameNode (assistant to NameNode), JobTracker (scheduler). The Chorus… DataNodes (block storage), TaskTrackers (task execution). Thanks to Zak Stone for earmuff image!
  • 33. HDFS (diagram): Namenode (fs metadata); Datanodes in one rack and in a different rack. Example files: 3x64MB file, 3 rep; 4x64MB file, 3 rep; small file, 7 rep.
  • 35. HDFS Failures? Datanode crash? Clients read another copy; background rebalance. Namenode crash? uh-oh
  • 36. M/R (diagram): Tasktrackers on the same machines as datanodes. Legend: job on stars; different job; idle. One rack; a different rack.
  • 37. M/R
  • 38. M/R Failures. Task fails: Try again? Try again somewhere else? Report failure. Retries possible because of idempotence.
  • 39. Hadoop in the Wild (yes, it’s used in production). Yahoo! Hadoop Clusters: > 82PB, >25k machines (Eric14, HadoopWorld NYC ’09). Google (M-R, but not Hadoop): 40 GB/s GFS read/write load (Jeff Dean, LADIS ’09) [~3,500 TB/day]. Facebook: 4TB new data per day; DW: 4800 cores, 5.5 PB (Dhruba Borthakur, HadoopWorld).
  • 40. The Hadoop Ecosystem: ETL Tools, BI Reporting, RDBMS; Pig (Data Flow), Hive (SQL), Sqoop; ZooKeeper (Coordination), Avro (Serialization); MapReduce (Job Scheduling/Execution System); HBase (Key-Value store); HDFS (Hadoop Distributed File System)
  • 41. Ok, fine, what next? Get Hadoop! Cloudera’s Distribution for Hadoop; http://hadoop.apache.org/. Try it out! (Locally, or on EC2.) Door Prize
  • 42. Just one slide... Software: Cloudera Distribution for Hadoop, Cloudera Desktop, more… Training and certification… Free on-line training materials (including video). Support & Professional Services. @cloudera, blog, etc.
  • 43. Okay, two slides. Talk to us if you’re going to do Hadoop in your organization.
  • 45. Backup Slides (i’ve got your back)
  • 46. Important APIs (→ is 1:many). M/R Flow: Input Format (data→K₁,V₁); Mapper (K₁,V₁→K₂,V₂); Combiner (K₂,iter(V₂)→K₂,V₂); Partitioner (K₂,V₂→int); Reducer (K₂, iter(V₂)→K₃,V₃); Out. Format (K₃,V₃→data). Other: Writable; JobClient; *Context; Filesystem.
  • 47. The “grep” example: public int run(String[] args) throws Exception { if (args.length < 3) { System.out.println("Grep <inDir> <outDir> <regex> [<group>]"); ToolRunner.printGenericCommandUsage(System.out); return -1; } Path tempDir = new Path("grep-temp-"+Integer.toString(new Random().nextInt(Integer.MAX_VALUE))); JobConf grepJob = new JobConf(getConf(), Grep.class); try { grepJob.setJobName("grep-search"); FileInputFormat.setInputPaths(grepJob, args[0]); grepJob.setMapperClass(RegexMapper.class); grepJob.set("mapred.mapper.regex", args[2]); if (args.length == 4) grepJob.set("mapred.mapper.regex.group", args[3]); grepJob.setCombinerClass(LongSumReducer.class); grepJob.setReducerClass(LongSumReducer.class); FileOutputFormat.setOutputPath(grepJob, tempDir); grepJob.setOutputFormat(SequenceFileOutputFormat.class); grepJob.setOutputKeyClass(Text.class); grepJob.setOutputValueClass(LongWritable.class); JobClient.runJob(grepJob); JobConf sortJob = new JobConf(Grep.class); sortJob.setJobName("grep-sort"); FileInputFormat.setInputPaths(sortJob, tempDir); sortJob.setInputFormat(SequenceFileInputFormat.class); sortJob.setMapperClass(InverseMapper.class); /* write a single file */ sortJob.setNumReduceTasks(1); FileOutputFormat.setOutputPath(sortJob, new Path(args[1])); /* sort by decreasing freq */ sortJob.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class); JobClient.runJob(sortJob); } finally { FileSystem.get(grepJob).delete(tempDir, true); } return 0; }
  • 48. $ cat input.txt adams dunster kirkland dunster kirland dudley dunster adams dunster winthrop $ bin/hadoop jar hadoop-0.18.3-examples.jar grep input.txt output1 'dunster|adams' $ cat output1/part-00000 4 dunster 2 adams
  • 49. Job 1 of 2: JobConf grepJob = new JobConf(getConf(), Grep.class); try { grepJob.setJobName("grep-search"); FileInputFormat.setInputPaths(grepJob, args[0]); grepJob.setMapperClass(RegexMapper.class); grepJob.set("mapred.mapper.regex", args[2]); if (args.length == 4) grepJob.set("mapred.mapper.regex.group", args[3]); grepJob.setCombinerClass(LongSumReducer.class); grepJob.setReducerClass(LongSumReducer.class); FileOutputFormat.setOutputPath(grepJob, tempDir); grepJob.setOutputFormat(SequenceFileOutputFormat.class); grepJob.setOutputKeyClass(Text.class); grepJob.setOutputValueClass(LongWritable.class); JobClient.runJob(grepJob); } ...
  • 50. Job 2 of 2 (implicit identity reducer): JobConf sortJob = new JobConf(Grep.class); sortJob.setJobName("grep-sort"); FileInputFormat.setInputPaths(sortJob, tempDir); sortJob.setInputFormat(SequenceFileInputFormat.class); sortJob.setMapperClass(InverseMapper.class); /* write a single file */ sortJob.setNumReduceTasks(1); FileOutputFormat.setOutputPath(sortJob, new Path(args[1])); /* sort by decreasing freq */ sortJob.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class); JobClient.runJob(sortJob); } finally { FileSystem.get(grepJob).delete(tempDir, true); } return 0; }
  • 51. The types there... (?, Text); (Text, Long); (Text, list(Long)); (Text, Long); (Long, Text)
  • 52. BI / Reporting, Ad-hoc Queries, RDBMS (Aggregates), Data Mining, ETL (Extraction, Transform, Load), Storage (Raw Data), Collection, Instrumentation } Traditional DW
  • 53. Facebook’s DW (phase M) M 2008N > Facebook Data Infrastructure: Scribe Tier, MySQL Tier, Hadoop Tier, Oracle RAC Servers. Wednesday, April 1, 2009
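
The map, shuffle, and reduce steps from slides 24 to 26, and the two-job "grep" flow from slides 47 to 50, can be sketched in plain Java without a cluster. This is a minimal in-memory illustration and not the Hadoop implementation: the class `GrepSketch` and its methods `map`, `shuffle`, `reduce`, and `run` are hypothetical names, no Hadoop classes are used, and the way the words from slide 48 are split across two input lines is an assumption, since the transcript does not preserve line breaks.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory sketch of the two-job "grep" flow; not Hadoop code. */
public class GrepSketch {

    /** map(): emit (match, 1) for every regex match found in one input line. */
    static List<Map.Entry<String, Long>> map(String line, Pattern regex) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Matcher m = regex.matcher(line);
        while (m.find()) {
            out.add(new SimpleImmutableEntry<>(m.group(), 1L));
        }
        return out;
    }

    /** The shuffle: group map output by key; a TreeMap keeps the keys sorted. */
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** reduce(): sum the counts collected for one key. */
    static long reduce(String key, Iterable<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs both jobs and returns (match, count) pairs, most frequent first. */
    static List<Map.Entry<String, Long>> run(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);

        // Job 1 ("grep-search"): map every line, shuffle by key, sum per key.
        List<Map.Entry<String, Long>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line, pattern));
        }
        List<Map.Entry<String, Long>> counts = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : shuffle(mapped).entrySet()) {
            counts.add(new SimpleImmutableEntry<>(e.getKey(), reduce(e.getKey(), e.getValue())));
        }

        // Job 2 ("grep-sort"): order by decreasing count. The slides do this by
        // swapping key and value and sorting on the key; a plain sort stands in here.
        counts.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        return counts;
    }

    public static void main(String[] args) {
        // Words taken from slide 48; the split into two lines is assumed.
        List<String> input = Arrays.asList(
                "adams dunster kirkland dunster kirland",
                "dudley dunster adams dunster winthrop");
        for (Map.Entry<String, Long> e : run(input, "dunster|adams")) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
        // prints "4	dunster" then "2	adams", matching the output on slide 48
    }
}
```

Keeping `map` and `reduce` as small pure functions mirrors the point of slide 22: the per-record logic is the only part an application writes, while grouping, sorting, and retries belong to the framework (here reduced to a `TreeMap` and a list sort).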