SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
© SpringPeople Software Private Limited, All Rights Reserved.© SpringPeople Software Private Limited, All Rights Reserved.
Introduction to Apache Hadoop
© SpringPeople Software Private Limited, All Rights Reserved.
What is Hadoop?
• Hadoop is an open source project by Apache Software
Foundation
• Originally based on papers published by Google in 2003 and
2004
• A well known platform for distributed applications
• Easy to scale-out
• Works well with commodity hard wares(not entirely true)
• Very good for background applications
© SpringPeople Software Private Limited, All Rights Reserved.
Hadoop Architecture
Two Primary components
• Distributed File System (HDFS): It deals with file operations
like read, write, delete & etc
• Map Reduce Engine: It deals with parallel computation
© SpringPeople Software Private Limited, All Rights Reserved.
Hadoop Components
There are many other projects based around core Hadoop
• Often referred to as a ‘Hadoop Ecosystem’
• Pig, Hive, HBase, Flume, Oozie, Sqoop, etc
A set of machines running HDFS and MapReduce is known
as a Hadoop Cluster
• Individual machines are known as nodes
• A cluster can have as few as one node, as many as several thousands
• More nodes = better performance!
© SpringPeople Software Private Limited, All Rights Reserved.
Hadoop Server Roles
© SpringPeople Software Private Limited, All Rights Reserved.
Hadoop Cluster
© SpringPeople Software Private Limited, All Rights Reserved.
HDFS (Hadoop Distributed File System)
• HDFS is responsible for storing data on the cluster
• Sits on top of a native file system
• Designed to handle very large files
• Data is split into pre-defined equaled blocks and distributed across
multiple nodes in the cluster
• Each block is replicated multiple times
• Provides redundant storage for massive amounts of data
• HDFS performs best with a ‘modest’ number of large files
• Files in HDFS are ‘write once’
• HDFS is optimized for large, streaming reads of files
© SpringPeople Software Private Limited, All Rights Reserved.
Writing Files to HDFS
© SpringPeople Software Private Limited, All Rights Reserved.
MapReduce
• MapReduce is the system used to process data in the Hadoop
cluster
• Consists of two phases: Map, and then Reduce
• Each Map task operates on a discrete portion of the overall
dataset
• After all Maps are complete, the MapReduce system
distributes the intermediate data to nodes which perform the
Reduce phase
• Between the two is a stage known as the shuffle and sort
© SpringPeople Software Private Limited, All Rights Reserved.
Features of MapReduce
• Automatic parallelization and distribution
• Fault-tolerance
• Status and monitoring tools
• A clean abstraction for programmers
– MapReduce programs are usually written in Java
– Can be written in any language using Hadoop Streaming
– All of Hadoop is written in Java
• MapReduce abstracts all the ‘housekeeping’ away from the developer
– Developer can concentrate simply on writing the Map and Reduce
functions
© SpringPeople Software Private Limited, All Rights Reserved.
MapReduce: The Big Picture
© SpringPeople Software Private Limited, All Rights Reserved.
Data Processing: Map
© SpringPeople Software Private Limited, All Rights Reserved.
Data Processing: Reduce
© SpringPeople Software Private Limited, All Rights Reserved.
Installing a Hadoop Cluster
• Cluster installation is usually performed by the system administrator
• It is very useful to understand how the component parts of the Hadoop cluster
work together
• Typically, a developer will configure their machine to run in pseudo-distributed
mode
• Easiest way to download and install Hadoop, either for a full cluster or in pseudo-
distributed mode, is by using Cloudera’s Distribution including Apache Hadoop
(CDH)
© SpringPeople Software Private Limited, All Rights Reserved.
Basic Cluster Configuration
© SpringPeople Software Private Limited, All Rights Reserved.
The 5 Hadoop Daemons
Hadoop is comprised of five separate daemons
• NameNode
– Holds the metadata for HDFS
• Secondary NameNode
– Performs housekeeping functions for the NameNode
– Is not a backup or hot standby for the NameNode!
• DataNode
– Stores actual HDFS data blocks
• JobTracker
– Manages MapReduce jobs
• TaskTracker
– Instantiates and monitors individual Map and Reduce tasks
© SpringPeople Software Private Limited, All Rights Reserved.
Other Hadoop Ecosystem Projects: Hive
• Hive is an abstraction on top of the MapReduce
• Allows users to query data in the Hadoop cluster without knowing Java or
MapReduce
• Uses the HiveQL language
– Very similar to SQL
• The Hive Interpreter runs on a client machine
– Turns HiveQL queries into MapReduce jobs
– Submits those jobs to the cluster
• Note: this does not turn the cluster into a relational database server!
– It is still simply running Map Reduce jobs
– Those jobs are created by the Hive Interpreter
© SpringPeople Software Private Limited, All Rights Reserved.
Pig
• Pig is an alternative abstraction on top of MapReduce
• Uses a dataflow scripting language
– Called PigLatin
• The Pig interpreter runs on the client machine
– Takes the PigLatin script and turns it into a series of MapReduce jobs
– Submits those jobs to the cluster
• As with Hive, nothing ‘magical’ happens on the cluster
– It is still simply running Map Reduce jobs
© SpringPeople Software Private Limited, All Rights Reserved.
Flume & Sqoop
• Flume provides a method to import data into HDFS as it is generated
– Rather than batch-processing the data later
– For example, log files from a Web server
• Sqoop provides a method to import data from tables in a relational
database into HDFS
– Does this very efficiently via a Map-only MR job
– Can also go the other way – Populate database tables from files in
HDFS
© SpringPeople Software Private Limited, All Rights Reserved.
Oozie
• Oozie allows developers to create a workflow of MapReduce jobs –
Including dependencies between jobs
• The Oozie server submits the jobs to the server in the correct sequence
© SpringPeople Software Private Limited, All Rights Reserved.
Become a Hadoop Expert In 3
Days Flat
Attend the 3-Days “Apache Hadoop Workshop”
View Complete Details
© SpringPeople Software Private Limited, All Rights Reserved.
Who will benefit?
Experienced developers who wish to write, maintain and/or optimize Apache
Hadoop jobs
View Complete Details
© SpringPeople Software Private Limited, All Rights Reserved.
Q & A
training@springpeople.com
+91 80 65679700
www.springpeople.com
SpringSource & MuleSoft Certified Partner
VMware Authorized Training Center

Mais conteúdo relacionado

Mais procurados

Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for HadoopHadoop User Group
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopCloudera, Inc.
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSpark Summit
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ ZooskCloudera, Inc.
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationmattlieber
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsairisData
 
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...In-Memory Computing Summit
 
Hive on kafka
Hive on kafkaHive on kafka
Hive on kafkaSzehon Ho
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...DataWorks Summit
 

Mais procurados (20)

Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for Hadoop
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ Zoosk
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analytics
 
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
 
Hive on kafka
Hive on kafkaHive on kafka
Hive on kafka
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
 

Semelhante a SpringPeople Introduction to Apache Hadoop

Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsNetajiGandi1
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopJeyamariappan Guru
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemMahabubur Rahaman
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 

Semelhante a SpringPeople Introduction to Apache Hadoop (20)

Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Hadoop
HadoopHadoop
Hadoop
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Unit 5
Unit  5Unit  5
Unit 5
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around Hadoop
 

Mais de SpringPeople

Growth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can tryGrowth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can trySpringPeople
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataSpringPeople
 
Introduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSIntroduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSSpringPeople
 
Introduction to Selenium WebDriver
Introduction to Selenium WebDriverIntroduction to Selenium WebDriver
Introduction to Selenium WebDriverSpringPeople
 
Introduction to Open stack - An Overview
Introduction to Open stack - An Overview Introduction to Open stack - An Overview
Introduction to Open stack - An Overview SpringPeople
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...SpringPeople
 
Why 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftWhy 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftSpringPeople
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsSpringPeople
 
Mastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullyMastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullySpringPeople
 
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...SpringPeople
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0SpringPeople
 
Introduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleIntroduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleSpringPeople
 
Introduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleIntroduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleSpringPeople
 
Introduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleIntroduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleSpringPeople
 
Introduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleIntroduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleSpringPeople
 
Introduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleIntroduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleSpringPeople
 

Mais de SpringPeople (20)

Growth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can tryGrowth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can try
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSIntroduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaS
 
Introduction to Selenium WebDriver
Introduction to Selenium WebDriverIntroduction to Selenium WebDriver
Introduction to Selenium WebDriver
 
Introduction to Open stack - An Overview
Introduction to Open stack - An Overview Introduction to Open stack - An Overview
Introduction to Open stack - An Overview
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
 
Why 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftWhy 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoft
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
 
Mastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullyMastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium Successfully
 
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0
 
Introduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleIntroduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeople
 
Introduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleIntroduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeople
 
Introduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleIntroduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeople
 
Introduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleIntroduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeople
 
Introduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleIntroduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeople
 

Último

GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 

Último (20)

GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 

SpringPeople Introduction to Apache Hadoop

  • 1. © SpringPeople Software Private Limited, All Rights Reserved.© SpringPeople Software Private Limited, All Rights Reserved. Introduction to Apache Hadoop
  • 2. © SpringPeople Software Private Limited, All Rights Reserved. What is Hadoop? • Hadoop is an open source project by Apache Software Foundation • Originally based on papers published by Google in 2003 and 2004 • A well known platform for distributed applications • Easy to scale-out • Works well with commodity hard wares(not entirely true) • Very good for background applications
  • 3. © SpringPeople Software Private Limited, All Rights Reserved. Hadoop Architecture Two Primary components • Distributed File System (HDFS): It deals with file operations like read, write, delete & etc • Map Reduce Engine: It deals with parallel computation
  • 4. © SpringPeople Software Private Limited, All Rights Reserved. Hadoop Components There are many other projects based around core Hadoop • Often referred to as a ‘Hadoop Ecosystem’ • Pig, Hive, HBase, Flume, Oozie, Sqoop, etc A set of machines running HDFS and MapReduce is known as a Hadoop Cluster • Individual machines are known as nodes • A cluster can have as few as one node, as many as several thousands • More nodes = better performance!
  • 5. © SpringPeople Software Private Limited, All Rights Reserved. Hadoop Server Roles
  • 6. © SpringPeople Software Private Limited, All Rights Reserved. Hadoop Cluster
  • 7. © SpringPeople Software Private Limited, All Rights Reserved. HDFS (Hadoop Distributed File System) • HDFS is responsible for storing data on the cluster • Sits on top of a native file system • Designed to handle very large files • Data is split into pre-defined equaled blocks and distributed across multiple nodes in the cluster • Each block is replicated multiple times • Provides redundant storage for massive amounts of data • HDFS performs best with a ‘modest’ number of large files • Files in HDFS are ‘write once’ • HDFS is optimized for large, streaming reads of files
  • 8. © SpringPeople Software Private Limited, All Rights Reserved. Writing Files to HDFS
  • 9. © SpringPeople Software Private Limited, All Rights Reserved. MapReduce • MapReduce is the system used to process data in the Hadoop cluster • Consists of two phases: Map, and then Reduce • Each Map task operates on a discrete portion of the overall dataset • After all Maps are complete, the MapReduce system distributes the intermediate data to nodes which perform the Reduce phase • Between the two is a stage known as the shuffle and sort
  • 10. © SpringPeople Software Private Limited, All Rights Reserved. Features of MapReduce • Automatic parallelization and distribution • Fault-tolerance • Status and monitoring tools • A clean abstraction for programmers – MapReduce programs are usually written in Java – Can be written in any language using Hadoop Streaming – All of Hadoop is written in Java • MapReduce abstracts all the ‘housekeeping’ away from the developer – Developer can concentrate simply on writing the Map and Reduce functions
  • 11. © SpringPeople Software Private Limited, All Rights Reserved. MapReduce: The Big Picture
  • 12. © SpringPeople Software Private Limited, All Rights Reserved. Data Processing: Map
  • 13. © SpringPeople Software Private Limited, All Rights Reserved. Data Processing: Reduce
  • 14. © SpringPeople Software Private Limited, All Rights Reserved. Installing a Hadoop Cluster • Cluster installation is usually performed by the system administrator • It is very useful to understand how the component parts of the Hadoop cluster work together • Typically, a developer will configure their machine to run in pseudo-distributed mode • Easiest way to download and install Hadoop, either for a full cluster or in pseudo- distributed mode, is by using Cloudera’s Distribution including Apache Hadoop (CDH)
  • 15. © SpringPeople Software Private Limited, All Rights Reserved. Basic Cluster Configuration
  • 16. © SpringPeople Software Private Limited, All Rights Reserved. The 5 Hadoop Daemons Hadoop is comprised of five separate daemons • NameNode – Holds the metadata for HDFS • Secondary NameNode – Performs housekeeping functions for the NameNode – Is not a backup or hot standby for the NameNode! • DataNode – Stores actual HDFS data blocks • JobTracker – Manages MapReduce jobs • TaskTracker – Instantiates and monitors individual Map and Reduce tasks
  • 17. © SpringPeople Software Private Limited, All Rights Reserved. Other Hadoop Ecosystem Projects: Hive • Hive is an abstraction on top of the MapReduce • Allows users to query data in the Hadoop cluster without knowing Java or MapReduce • Uses the HiveQL language – Very similar to SQL • The Hive Interpreter runs on a client machine – Turns HiveQL queries into MapReduce jobs – Submits those jobs to the cluster • Note: this does not turn the cluster into a relational database server! – It is still simply running Map Reduce jobs – Those jobs are created by the Hive Interpreter
  • 18. © SpringPeople Software Private Limited, All Rights Reserved. Pig • Pig is an alternative abstraction on top of MapReduce • Uses a dataflow scripting language – Called PigLatin • The Pig interpreter runs on the client machine – Takes the PigLatin script and turns it into a series of MapReduce jobs – Submits those jobs to the cluster • As with Hive, nothing ‘magical’ happens on the cluster – It is still simply running Map Reduce jobs
  • 19. © SpringPeople Software Private Limited, All Rights Reserved. Flume & Sqoop • Flume provides a method to import data into HDFS as it is generated – Rather than batch-processing the data later – For example, log files from a Web server • Sqoop provides a method to import data from tables in a relational database into HDFS – Does this very efficiently via a Map-only MR job – Can also go the other way – Populate database tables from files in HDFS
  • 20. © SpringPeople Software Private Limited, All Rights Reserved. Oozie • Oozie allows developers to create a workflow of MapReduce jobs – Including dependencies between jobs • The Oozie server submits the jobs to the server in the correct sequence
  • 21. © SpringPeople Software Private Limited, All Rights Reserved. Become a Hadoop Expert In 3 Days Flat Attend the 3-Days “Apache Hadoop Workshop” View Complete Details
  • 22. © SpringPeople Software Private Limited, All Rights Reserved. Who will benefit? Experienced developers who wish to write, maintain and/or optimize Apache Hadoop jobs View Complete Details
  • 23. © SpringPeople Software Private Limited, All Rights Reserved. Q & A training@springpeople.com +91 80 65679700 www.springpeople.com SpringSource & MuleSoft Certified Partner VMware Authorized Training Center