Copyright © 2017, edureka and/or its affiliates. All rights reserved.
HADOOP COMPONENTS
HADOOP CORE COMPONENTS
HADOOP ARCHITECTURE
www.edureka.co
WHAT IS HADOOP?
MAJOR HADOOP COMPONENTS
WHAT IS HADOOP?
HADOOP
Hadoop is an open-source distributed processing
framework that manages data processing and
storage for big data applications running in clustered
systems.
HADOOP CORE COMPONENTS
MAPREDUCE
COMMON UTILITIES
HDFS
YARN
HDFS and YARN daemons in a Hadoop cluster:
• MASTER: NameNode and Secondary NameNode (HDFS), ResourceManager (YARN)
• SLAVE: DataNode (HDFS), NodeManager (YARN)
HADOOP ARCHITECTURE
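Figure: checkpointing between the NAMENODE and the SECONDARY NAMENODE.
• The NameNode holds the FS-image and the Edit Log.
• The Secondary NameNode fetches both, while the NameNode starts writing to a new Edit Log.
• The Secondary NameNode merges the FS-image and the Edit Log into a final FS-image and returns it to the NameNode.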
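
The FS-image and Edit Log labels above describe the checkpoint performed by the Secondary NameNode. The following is a minimal Python sketch of that idea, not HDFS code: the dictionary-based namespace and the `apply_edits` and `checkpoint` helpers are hypothetical names chosen for illustration, and the replication values are made-up metadata.

```python
# Toy model of an HDFS-style checkpoint: replay an edit log onto a
# namespace snapshot (FS-image) to produce a new snapshot.
# All names here are illustrative assumptions; none are real HDFS APIs.

def apply_edits(fs_image, edit_log):
    """Return a new namespace with every logged operation applied in order."""
    merged = dict(fs_image)  # leave the original snapshot untouched
    for op, path, value in edit_log:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


def checkpoint(fs_image, edit_log):
    """Merge snapshot and log; return the final snapshot and a fresh, empty log."""
    return apply_edits(fs_image, edit_log), []


fs_image = {"/data/a.txt": 3}  # path -> replication factor (made-up metadata)
edit_log = [
    ("create", "/data/b.txt", 3),
    ("delete", "/data/a.txt", None),
]
final_image, new_log = checkpoint(fs_image, edit_log)
print(final_image)
```

The point of the design is that the merge happens away from the live snapshot: the original `fs_image` is never mutated, and the log that was merged can be discarded once the final image exists.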
Figure: YARN architecture, showing a CLIENT, the RESOURCE MANAGER, and three NODE MANAGERs, each with an APP MANAGER and a CONTAINER. Arrows are labelled Node Status, Resource Request and MapReduce Status.
MAJOR HADOOP COMPONENTS
Figure: the Hadoop ecosystem, built on Storage (HDFS) and Resource Management (YARN):
• Storage Managers
• General Purpose Execution Engines
• Database Management Engines
• Data Abstraction Engines
• Realtime Data Streaming Frameworks
• Graph Processing Frameworks
• Machine Learning Engines
• Hadoop Cluster Management Software
HADOOP STORAGE MANAGERS
HDFS
• Hadoop Distributed File System.
• Primary Data Storage Unit in Hadoop.
• Used in Distributed Data Processing environments.
HCATALOG
• Hadoop Storage Management layer.
• Exposes the tabular data of the Hive metastore to other
applications such as Pig and MapReduce.
ZOOKEEPER
• Centralized Open-source Server
• Used to provide a distributed configuration
service, synchronization service, and naming
registry for large distributed systems.
OOZIE
• Server-based workflow scheduling system
• Schedules Apache Hadoop jobs.
• Used to manage workflows defined as Directed Acyclic Graphs (DAGs) of actions.
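
To illustrate why a workflow is modelled as a DAG, the sketch below orders a set of dependent actions with Python's standard `graphlib` module. It does not use Oozie or its workflow format; the action names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def execution_order(dag):
    """Return the actions in an order that respects every dependency.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A graph with a cycle (for example, two actions that each depend on the other) has no valid execution order, which is why a scheduler of this kind requires the graph to be acyclic.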
GENERAL PURPOSE EXECUTION ENGINES
MAPREDUCE
• Software framework for distributed processing.
• Splits input data into chunks that are processed in parallel by map tasks; reduce tasks then combine the results.
• Based on the map and reduce operations of functional programming.
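
The map and reduce idea can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model, not the Hadoop MapReduce API; the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Each string stands in for one chunk (split) of a larger input.
chunks = ["Hadoop stores data", "Hadoop processes data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because `map_phase` only looks at its own chunk and `reduce_phase` only looks at one key, both steps can run on different machines in a real cluster.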
SPARK
• General Purpose Cluster Computing Framework.
• It can perform Real-time data streaming and ETL
• Used for Micro-Batch Processing.
TEZ
• High performance Data processing tool.
• Executes a series of MapReduce jobs as a single job.
• Used in batch processing environments.
HADOOP DATABASE MANAGEMENT ENGINES
HIVE
• Data Warehouse Software Project
• Enables SQL-like queries over data stored in Hadoop.
• Used for ETL, with Hive DDL and DML statements.
SPARK SQL
• Distributed SQL Query engine
• Enables Structured Data Processing.
• Used for importing data from RDDs, Hive, Parquet
files etc.
IMPALA
• In-Memory Processing Query engine
• Integrates with HIVE metastore to share the table
information between the components.
• Used to process data in Hadoop Clusters
APACHE DRILL
• Low Latency Distributed Query engine
• Combines a variety of data stores just by using a
single query.
• Used to support different kinds of NoSQL databases.
HBASE
• Open source, non-relational distributed database
• Runs on top of HDFS and provides random, real-time read/write access to large tables.
HADOOP DATA ABSTRACTION ENGINES
APACHE PIG
• High level scripting language
• Enables users to write complex data
transformations
• Performs ETL and analyses huge Datasets.
APACHE SQOOP
• Command-line interface application for
transferring data between relational databases
and Hadoop.
• Data ingestion tool.
• Enables import and export of structured data at
enterprise scale.
HADOOP REAL-TIME STREAMING FRAMEWORKS
SPARK STREAMING
• Spark Streaming is an extension of the core Spark API.
• Enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
• Provides a high-level abstraction called a discretized stream (DStream) for continuous data streaming.
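
A discretized stream treats a continuous stream as a sequence of small batches. The sketch below shows that idea in plain Python rather than with the Spark API; the `discretize` helper, the two-second interval and the sample events are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def discretize(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Simplification: intervals that received no events are skipped here.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_seconds)].append(value)
    return [batches[index] for index in sorted(batches)]


# Hypothetical live events: (seconds since start, payload).
events = [(0.5, "a"), (1.2, "b"), (2.1, "a"), (3.9, "c"), (4.0, "a")]

batches = discretize(events, batch_seconds=2)
# Each batch is then processed like a small, ordinary batch job.
per_batch_counts = [Counter(batch) for batch in batches]
print(per_batch_counts)
```

Processing each interval as an ordinary batch is what lets the same batch operations be reused for streaming data.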
APACHE KAFKA
• Open-source stream-processing platform.
• Ingests and moves large amounts of data very quickly.
• Lets applications publish and subscribe to streams of records.
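
The publish and subscribe model over a stream of records can be pictured as an append-only log that each subscriber reads at its own position. The following in-memory sketch illustrates only that concept; the `Topic` and `Subscriber` classes are hypothetical and are not the Kafka client API.

```python
class Topic:
    """Append-only list of records; a stand-in for a stream of records."""

    def __init__(self):
        self._log = []

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._log[offset:]


class Subscriber:
    """Tracks its own read position, so subscribers do not affect each other."""

    def __init__(self, topic):
        self._topic = topic
        self.offset = 0

    def poll(self):
        """Return the records published since the last poll."""
        records = self._topic.read_from(self.offset)
        self.offset += len(records)
        return records


clicks = Topic()
analytics = Subscriber(clicks)
clicks.publish("page=/home")
clicks.publish("page=/cart")
audit = Subscriber(clicks)  # joins later but still starts at offset 0

first_poll = analytics.poll()
second_poll = analytics.poll()
audit_poll = audit.poll()
print(first_poll, second_poll, audit_poll)
```

Records are not removed when they are read, so a subscriber that joins late can still replay the stream from the beginning.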
APACHE FLUME
• Open-source Distributed and Reliable software
• Architecture is based on Streaming Data Flows
• Used for collecting, aggregating and moving large amounts of log data.
HADOOP GRAPH PROCESSING FRAMEWORK
APACHE GIRAPH
• Iterative graph processing framework.
• Utilizes Apache Hadoop's MapReduce
implementation to process graphs.
• Used to analyse social media data
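
Iterative graph processing repeats the same step over every vertex until nothing changes. As a rough sketch of that style (plain Python, not the Giraph API), the hypothetical `connected_components` function below lets each vertex repeatedly adopt the smallest label seen among its neighbours; each pass of the loop plays the role of one iteration.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = {vertex: vertex for vertex in neighbours}
    changed = True
    while changed:  # one pass of this loop is one iteration over all vertices
        changed = False
        # Every vertex "receives" the current labels of its neighbours.
        inbox = {v: [labels[n] for n in neighbours[v]] for v in neighbours}
        for vertex, messages in inbox.items():
            best = min(messages + [labels[vertex]])
            if best < labels[vertex]:
                labels[vertex] = best
                changed = True
    return labels


# Hypothetical friendship graph with two separate groups.
labels = connected_components([(1, 2), (2, 3), (4, 5)])
print(labels)
```

Vertices that end up with the same label belong to the same group, which is the kind of question asked of social network data.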
APACHE GRAPHX
• GraphX is Apache Spark's API for graphs and
graph-parallel computation.
• Comparable performance to the fastest
specialized graph processing systems.
• Seamlessly work with both graphs and collections.
• Choose from a growing library of graph
algorithms.
HADOOP MACHINE LEARNING FRAMEWORKS
H2O
• H2O is open-source software for big-data analysis.
• H2O allows users to fit thousands of potential models as
part of discovering patterns in data.
• H2O uses iterative methods that provide quick
answers using all of the client's data.
ORYX
• A generic lambda architecture tier, providing
batch/speed/serving layers.
• Oryx is designed with specialization for real-time
large scale machine learning
• End-to-End implementation of the standard ML
algorithms as applications.
SPARK MLlib
• Spark MLlib is a scalable Machine Learning
Library.
• It enables us to perform Machine Learning
operations in Spark.
AVRO
• Avro is a row-oriented remote procedure call and data serialization framework.
• Supports dynamic typing and schema evolution.
• Avro is used in Data Serialization and RPC.
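
Schema evolution means that data written with an older schema can still be read after the schema changes. The sketch below illustrates the general rule with plain dictionaries; it does not use the Avro library, and the `resolve` helper, the `_REQUIRED` marker and the field names are hypothetical.

```python
_REQUIRED = object()  # marker for reader fields that have no default value


def resolve(record, reader_schema):
    """Read a record through a newer schema.

    reader_schema maps field name -> default value (or _REQUIRED).
    Fields missing from the record take the default; fields the reader
    does not know about are dropped.
    """
    resolved = {}
    for name, default in reader_schema.items():
        if name in record:
            resolved[name] = record[name]
        elif default is not _REQUIRED:
            resolved[name] = default
        else:
            raise KeyError(f"field {name!r} is missing and has no default")
    return resolved


# Record written with an older schema (made-up fields).
old_record = {"id": 1, "name": "ada", "legacy_flag": True}

# Newer schema: drops "legacy_flag" and adds "email" with a default.
reader_schema = {"id": _REQUIRED, "name": _REQUIRED, "email": "unknown"}

new_view = resolve(old_record, reader_schema)
print(new_view)
```

Giving every newly added field a default is what keeps old data readable in this scheme.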
THRIFT
• It is an Interface definition language and binary
communication protocol.
• It allows users to define data types and service
interfaces in a simple definition file
• Thrift is used in building RPC Clients and Servers.
MAHOUT
• Implementations of distributed machine learning
algorithms.
• Built on Hadoop, which stores and processes big data in a distributed
environment across clusters of computers
using simple programming models.
HADOOP CLUSTER MANAGEMENT SOFTWARE
AMBARI
• Hadoop Cluster Management Software.
• Ambari enables system administrators to
provision, manage and monitor a Hadoop cluster.
ZOOKEEPER
• Centralized Open-source Server
• Manage configuration across nodes
• Implement reliable messaging
• Implement redundant services
• Synchronize process execution
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
