APACHE HADOOP
All about it in a nutshell
HISTORY
★ Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
★ Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.
★ It was originally developed to support distribution for the Nutch search engine project.
HADOOP DEFINED
★ A batch-processing framework of tools
★ These tools support running applications on big data
★ It is open source
★ It is distributed under the Apache License
★ It addresses three big data problems: velocity, volume and variety
★ The traditional approach puts big data on a single high-powered machine, which is expensive and a single point of failure. Hadoop instead harnesses many low-cost computers as one powerful system, with redundancy built into its distributed design.
ARCHITECTURE
★ MapReduce
○ JobTracker and TaskTrackers
★ Filesystem: HDFS
○ NameNode and DataNodes
★ Nodes:
○ SLAVES -- computers running a DataNode and a TaskTracker
○ MASTER -- a computer running a DataNode, a TaskTracker, the NameNode and the JobTracker
★ Project tools:
○ Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop
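To make the MapReduce component concrete, here is a minimal in-memory sketch of the classic word-count job. It only imitates the map, shuffle and reduce phases inside a single Python process; it does not use Hadoop's Java API, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line stands in for one input split handled by a map task.
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

In a real cluster the map calls would run as tasks on the TaskTrackers, and the shuffle would move intermediate pairs across the network to the reducers.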
HADOOP MAP-REDUCE ENGINE
★ The JobTracker receives a job:
○ it splits the job into tasks and distributes them to the TaskTrackers on the slaves
○ it tracks the tasks until the whole job is done; the job's output is written back to HDFS
★ The NameNode indexes which DataNodes hold which data
○ for redundancy and fault tolerance, three copies of each data block are kept on different DataNodes
○ the NameNode's tables are backed up, and there is also a backup master
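The bookkeeping on this slide can be sketched in a few lines: a toy "NameNode" that records which DataNodes hold each block, placing three replicas on distinct nodes, and a toy "JobTracker" that hands one task per block to the TaskTrackers. This is a simplification under stated assumptions: real HDFS placement is rack-aware and real task scheduling is more involved, and the class, function and node names here are hypothetical.

```python
import itertools
from typing import Dict, List

REPLICATION = 3  # HDFS's default replication factor


class ToyNameNode:
    """Keeps a block -> DataNode index, like the NameNode's tables."""

    def __init__(self, datanodes: List[str], replication: int = REPLICATION) -> None:
        if replication > len(datanodes):
            raise ValueError("not enough DataNodes for the replication factor")
        self.datanodes = datanodes
        self.replication = replication
        self.block_map: Dict[str, List[str]] = {}
        self._cursor = 0

    def add_block(self, block_id: str) -> List[str]:
        # Round-robin placement: consecutive (therefore distinct) DataNodes.
        n = len(self.datanodes)
        replicas = [self.datanodes[(self._cursor + i) % n] for i in range(self.replication)]
        self._cursor = (self._cursor + 1) % n
        self.block_map[block_id] = replicas
        return replicas


def assign_tasks(blocks: List[str], tasktrackers: List[str]) -> Dict[str, str]:
    """Toy JobTracker: one map task per block, spread round-robin over TaskTrackers."""
    trackers = itertools.cycle(tasktrackers)
    return {block: next(trackers) for block in blocks}


if __name__ == "__main__":
    slaves = ["slave1", "slave2", "slave3", "slave4"]
    namenode = ToyNameNode(slaves)
    for block in ["blk_1", "blk_2"]:
        namenode.add_block(block)
    print(namenode.block_map)
    print(assign_tasks(list(namenode.block_map), slaves))
```

Because every block lives on three nodes, losing any single DataNode still leaves two readable copies.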
WHAT HADOOP DOESN'T WANT US TO BE WORRIED ABOUT
★ Which DataNode a file is located on
★ What happens if a node fails
★ How to share tasks among the DataNodes
★ Scalability: from 1 to 1,000 nodes
★ Scaling is roughly linear, i.e. the bigger the cluster, the higher the processing power (plotting processing speed y against the number of PCs x gives an approximately straight line)
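The first three worries above (where a file's blocks live, what happens when a node fails, and how tasks are shared) meet when a task is scheduled. A minimal sketch, assuming a block map like the NameNode's and a set of live nodes: each block's task goes to the least-loaded live node that already holds a replica (data locality), and a block is reported as unavailable only when every replica sits on a failed node. The `schedule` function and node names are hypothetical; Hadoop's real scheduler is considerably more involved.

```python
from typing import Dict, List, Optional, Set


def schedule(block_map: Dict[str, List[str]], live: Set[str]) -> Dict[str, Optional[str]]:
    """Assign each block's task to a live node that holds a replica of it.

    Among the live replica holders the least-loaded node is chosen, so work
    is shared out. None means every replica sits on a failed node.
    """
    load: Dict[str, int] = {node: 0 for node in live}
    plan: Dict[str, Optional[str]] = {}
    for block, replicas in block_map.items():
        candidates = [node for node in replicas if node in live]
        if not candidates:
            plan[block] = None  # all copies lost: the data is unavailable
            continue
        # Least-loaded live replica holder; ties go to the earlier replica.
        chosen = min(candidates, key=lambda node: load[node])
        load[chosen] += 1
        plan[block] = chosen
    return plan


if __name__ == "__main__":
    blocks = {"blk_1": ["slave1", "slave2", "slave3"], "blk_2": ["slave1", "slave2", "slave3"]}
    # slave1 has failed: its tasks move to surviving replica holders.
    print(schedule(blocks, {"slave2", "slave3"}))
```

The point of the sketch is that failover falls out of replication: as long as one copy survives, the job carries on without the user noticing.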
HADOOP BENEFICIARIES
★ Yahoo!
★ Facebook
★ Amazon
★ eBay
★ American Airlines
★ The New York Times
★ Federal Reserve Board
★ Chevron
★ IBM
★ Who's next? DreamOval products? DreamOval's business? It could even be you!
HADOOP APPLICATIONS
★ Advertising: mining user behaviour to generate recommendations
★ Search: grouping related documents
★ Security: searching for unusual patterns, e.g. for anti-money laundering (AML) and fraud detection
HADOOP USERS
★ Admins: install, manage and maintain Hadoop
★ Users: design applications, import and export data, work with the tools
... EVERY DOer COULD BE A USER :)
YAHOO PREDICTING HADOOP'S FUTURE
By 2015, 50% of enterprise data will be processed by Hadoop...
HADOOP INSTALLATION TYPES
1. Standalone mode: everything runs on one PC as a single Java Virtual Machine (JVM) process, with no separate daemons
2. Pseudo-distributed mode: all Hadoop daemons run on one PC, each in its own JVM process
3. Fully distributed mode: the Hadoop daemons run on different PCs, in separate JVM processes
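In Hadoop 1.x the three modes above differ mainly in configuration. The sketch below records, for each mode, two key properties and the daemons that get started. The standalone values are the Hadoop 1 defaults (`file:///` and `local`); the ports 9000 and 9001 follow the Hadoop 1 single-node setup guide and are example values only; the `master` hostname and the `config_for` helper are hypothetical.

```python
from typing import Dict, List, NamedTuple

# The Hadoop 1.x daemons, each of which runs in its own JVM.
HADOOP1_DAEMONS = ["NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"]


class ModeConfig(NamedTuple):
    properties: Dict[str, str]
    daemons: List[str]


def config_for(mode: str, master_host: str = "master") -> ModeConfig:
    """Return the key properties and the daemons for one Hadoop 1.x mode."""
    if mode == "standalone":
        # Defaults: local filesystem, in-process job runner, no daemons.
        props = {"fs.default.name": "file:///", "mapred.job.tracker": "local"}
        return ModeConfig(props, [])
    if mode == "pseudo":
        host = "localhost"  # every daemon on one PC
    elif mode == "fully":
        host = master_host  # daemons spread over the master and the slaves
    else:
        raise ValueError(f"unknown mode: {mode}")
    props = {
        "fs.default.name": f"hdfs://{host}:9000",  # where the NameNode listens
        "mapred.job.tracker": f"{host}:9001",      # where the JobTracker listens
    }
    return ModeConfig(props, list(HADOOP1_DAEMONS))


if __name__ == "__main__":
    for mode in ("standalone", "pseudo", "fully"):
        print(mode, config_for(mode))
```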
INSTALLATION OVERVIEW
★ An SSH server for master-slave communication
★ Java 6 or later
★ Download and install Hadoop
★ Add Hadoop's path to .bashrc
★ Configure the Hadoop environment
○ Edit hadoop-env.sh: set JAVA_HOME to the right path and disable IPv6
○ Configure the .xml files (core-site.xml: NameNode host and port; mapred-site.xml: JobTracker host and port)
★ Launch the Hadoop daemons
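The .xml configuration step above can be sketched as a small generator for Hadoop's `<configuration>`/`<property>`/`<name>`/`<value>` layout. This is an illustration under assumptions: `site_xml` is a hypothetical helper, the property names `fs.default.name` and `mapred.job.tracker` are the Hadoop 1.x ones, and the host/port values are examples for a pseudo-distributed setup. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # ElementTree escapes special chars
    ET.indent(root)  # pretty-print for human readers (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # core-site.xml: where the NameNode listens (example value)
    print(site_xml({"fs.default.name": "hdfs://localhost:9000"}))
    # mapred-site.xml: where the JobTracker listens (example value)
    print(site_xml({"mapred.job.tracker": "localhost:9001"}))
```

The printed documents would be saved into Hadoop's configuration directory as core-site.xml and mapred-site.xml respectively before launching the daemons.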
