Hadoop
                   Reliably store and process
                      gobs of information
              across many commodity computers
                              Edited by Oded Rotter
                              oded1233@gmail.com
Based On:
http://www.cloudera.com/resource/apache-hadoop-introduction-glue-2010
http://www.cloudera.com/what-is-hadoop/
http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/




                           Image: Yahoo! Hadoop cluster
What is Hadoop?
Hadoop is an open-source project administered by the
Apache Software Foundation.
Hadoop’s contributors work for some of the world’s biggest
technology companies. That diverse, motivated community
has produced an innovative platform for consolidating,
combining and analyzing large-scale data, helping
organizations make sense of the data deluge.

Enterprises today collect and generate more data than ever
before. Relational databases and data warehouses excel at
OLTP and OLAP workloads over structured data.
Hadoop, however, was designed to solve a different problem:
the fast, reliable analysis of both structured data and complex
(semi-structured or unstructured) data. As a result, many
enterprises deploy Hadoop alongside their legacy IT systems,
which lets them combine existing and new data sets in new ways.
Key Services
• Distributed File System (HDFS)
  Self-healing high-bandwidth clustered storage
• MapReduce
  High-performance parallel data processing
  Distributed computing
• Separation of distributed-system fault-tolerance code
  from application logic
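A minimal sketch of the MapReduce model on this slide, written as a local Python simulation of the classic word-count job. The mapper/reducer split mirrors Hadoop Streaming, where the two phases run as separate programs exchanging tab-separated key/value lines; here both run in one process, and `sorted()` stands in for the shuffle-and-sort step the framework performs between them. The function names are illustrative, not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce phase: pairs arrive grouped by key (the framework sorts them
    between the phases), so each word's counts can be summed in one pass."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_locally(lines):
    """Single-process stand-in for one job: map, shuffle/sort, reduce."""
    shuffled = sorted(mapper(lines))  # plays the role of Hadoop's shuffle and sort
    return dict(reducer(shuffled))


counts = run_locally(["the quick brown fox", "The lazy dog"])
print(counts["the"])  # 2
```

Neither `mapper` nor `reducer` contains any failure-handling code: on a cluster, rerunning failed tasks is the framework's job, which is the separation the last bullet describes.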
Infrastructure
• Runs on a collection of commodity/shared-nothing servers
• You can add or remove servers in a Hadoop cluster at will
• The system detects and compensates for hardware or system
  problems on any server (self-healing).
• It can deliver data and run large-scale, high-performance
  processing jobs in spite of system changes or failures.
• Originally developed and deployed at large Web companies
  such as Yahoo and Facebook, Hadoop is now widely used in
  finance, technology, telecom, media and entertainment,
  government, research institutions and other markets with
  significant data. With Hadoop, enterprises can explore
  complex data using custom analyses tailored to their
  information and questions.
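The self-healing behavior can be sketched in Python as follows. It assumes HDFS-style block replication with a replication factor of 3 (the HDFS default) and, for simplicity, picks replica locations at random; real HDFS uses rack-aware placement, and the helper names `place_replicas` and `heal` are hypothetical.

```python
import random

REPLICATION = 3  # HDFS default replication factor


def place_replicas(block_ids, nodes, rng, replication=REPLICATION):
    """Store each block on `replication` distinct, randomly chosen servers."""
    return {b: set(rng.sample(sorted(nodes), replication)) for b in block_ids}


def heal(placement, live_nodes, rng, replication=REPLICATION):
    """Forget replicas on failed servers, then copy each under-replicated
    block to other live servers until it is back to `replication` copies."""
    for block, holders in placement.items():
        holders &= live_nodes  # in-place: replicas on dead servers are gone
        if not holders:
            raise RuntimeError(f"{block} lost: no surviving replica")
        candidates = sorted(live_nodes - holders)
        missing = min(replication - len(holders), len(candidates))
        holders.update(rng.sample(candidates, missing))
    return placement


rng = random.Random(7)
nodes = {f"node{i}" for i in range(1, 7)}
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes, rng)
live = nodes - {"node1"}  # simulate one server failing
heal(placement, live, rng)
print(all(len(h) == REPLICATION and h <= live for h in placement.values()))  # True
```

With three copies of every block, any single server can fail in this model without data loss; a block is only lost if all of its replicas disappear before re-replication catches up.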
Key functions
•   NameNode (metadata server and database)
•   SecondaryNameNode (periodic checkpoints of NameNode metadata; not a standby)
•   JobTracker (scheduler)
•   DataNodes (block storage)
•   TaskTrackers (task execution)
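To make the split between the NameNode (metadata) and the DataNodes (block storage) concrete, here is a toy Python model. It assumes the Hadoop 1.x default block size of 64 MB (later releases default to 128 MB), a replication factor of 3 and simple round-robin placement; the `NameNode` class is purely illustrative and keeps only the file-to-block and block-to-DataNode maps, never file contents.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the Hadoop 1.x default


@dataclass
class NameNode:
    """Toy NameNode: metadata only; the bytes themselves live on DataNodes."""
    files: dict = field(default_factory=dict)      # path -> [block ids]
    locations: dict = field(default_factory=dict)  # block id -> [DataNode names]
    next_id: int = 0

    def add_file(self, path, size_bytes, datanodes, replication=3):
        """Split a file into fixed-size blocks and record where each one lives."""
        n_blocks = (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
        copies = min(replication, len(datanodes))
        block_ids = []
        for i in range(n_blocks):
            block_id = f"blk_{self.next_id}"
            self.next_id += 1
            # round-robin placement (an assumption; HDFS is rack-aware)
            self.locations[block_id] = [
                datanodes[(i + r) % len(datanodes)] for r in range(copies)
            ]
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids

    def lookup(self, path):
        """What a client asks before reading: which DataNodes hold each block."""
        return [(b, self.locations[b]) for b in self.files[path]]


nn = NameNode()
blocks = nn.add_file("/logs/day1.log", 200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks))                     # 4: three full 64 MB blocks plus an 8 MB tail
print(nn.lookup("/logs/day1.log")[0])  # ('blk_0', ['dn1', 'dn2', 'dn3'])
```

Once a client has this list it reads the blocks directly from the DataNodes, so the NameNode stays out of the data path.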
Now what?
• A Hadoop deployment has three major categories of machine roles:
  Client machines
  Master nodes
  Slave nodes
• The Master nodes oversee the two key functional pieces that make up
  Hadoop: storing lots of data (HDFS) and running parallel computations on
  all that data (MapReduce).
• The Name Node oversees and coordinates the data storage function
  (HDFS), while the Job Tracker oversees and coordinates the parallel
  processing of data using MapReduce.
• Slave nodes make up the vast majority of machines and do all the dirty
  work of storing the data and running the computations.
• Each slave runs both a Data Node and a Task Tracker daemon, which
  communicate with and receive instructions from their master nodes.
• The Task Tracker daemon is a slave to the Job Tracker, and the Data Node
  daemon is a slave to the Name Node.
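Because each slave runs both a Data Node and a Task Tracker, the Job Tracker can try to send a map task to a machine that already stores the task's input block. A minimal Python sketch of that preference, assuming the Data Node and Task Tracker on a slave share one hostname; the function `assign_task` is hypothetical, and Hadoop's real scheduler also considers rack-local placement, which is omitted here.

```python
def assign_task(block_id, block_locations, free_trackers):
    """Choose a Task Tracker for a map task that reads `block_id`.

    block_locations: block id -> hostnames whose Data Node stores a replica
    free_trackers:   hostnames whose Task Tracker has a free task slot
    """
    holders = set(block_locations.get(block_id, ()))
    for tracker in free_trackers:
        if tracker in holders:
            return tracker, "node-local"  # input is read from local disk
    if free_trackers:
        return free_trackers[0], "remote"  # input must cross the network
    return None, "queued"  # no free slot anywhere: the task waits


locations = {"blk_7": ["slave3", "slave5", "slave9"]}
print(assign_task("blk_7", locations, ["slave1", "slave5"]))  # ('slave5', 'node-local')
print(assign_task("blk_7", locations, ["slave1", "slave2"]))  # ('slave1', 'remote')
```

Moving the computation to the data, rather than the data to the computation, is what keeps network traffic manageable as the number of slaves grows.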
And …
• Client machines have Hadoop installed with all the cluster
  settings, but are neither a Master nor a Slave. Instead, the role of the
  Client machine is to load data into the cluster, submit MapReduce
  jobs describing how that data should be processed, and then
  retrieve or view the results of the job when it is finished.
• In smaller clusters (~40 nodes) you may have a single physical
  server playing multiple roles, such as both Job Tracker and Name
  Node.
• In medium to large clusters, each master role will often run on
  its own dedicated server.
• Real production clusters use no server virtualization and no
  hypervisor, which would only add overhead that impedes performance.
• Hadoop runs best on Linux machines, working directly with the
  underlying hardware.
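The client's three tasks (load data, submit a job, read the results) map onto ordinary Hadoop command-line calls. The Python sketch below only builds those commands, using `hadoop fs -put`, `hadoop jar` on the Hadoop Streaming jar, and `hadoop fs -cat`; the jar path, HDFS directory and script names are hypothetical placeholders, and it assumes the mapper and reducer scripts are already available on every slave node. By default it prints the commands rather than running them.

```python
import subprocess


def client_workflow(local_file, hdfs_dir, streaming_jar, mapper, reducer):
    """Return, in order, the commands a client machine would run."""
    in_path, out_path = f"{hdfs_dir}/input", f"{hdfs_dir}/output"
    return [
        ["hadoop", "fs", "-put", local_file, in_path],    # 1. load data into HDFS
        ["hadoop", "jar", streaming_jar,                  # 2. submit the job
         "-input", in_path, "-output", out_path,
         "-mapper", mapper, "-reducer", reducer],
        ["hadoop", "fs", "-cat", f"{out_path}/part-*"],   # 3. view the results
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


cmds = client_workflow("access.log", "/user/demo/wordcount",
                       "/path/to/hadoop-streaming.jar",  # hypothetical path
                       "mapper.py", "reducer.py")
run(cmds)
```

The `part-*` pattern is passed to Hadoop unexpanded, so it is matched against HDFS rather than the client's local disk. The job's output directory must not exist before submission, which is why the sketch derives a fresh `output` path instead of reusing the input location.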
The Hadoop Ecosystem
Real-life examples (2010)
• Yahoo! Hadoop clusters: >82 PB, >25k machines
 (Eric14, HadoopWorld NYC ’09)
• Facebook: 15 TB of new data per day; 10,000+ cores; 12+ PB
• Twitter: ~1 TB per day; ~80 nodes
• Lots of 5-40 node clusters at companies without petabytes
 of data (web, retail, finance, telecom, research)
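A quick back-of-the-envelope check puts these figures in perspective. Assuming HDFS's default replication factor of 3, data spread evenly across nodes and no compression (none of which the figures above state), Twitter's ~1 TB per day on ~80 nodes works out to roughly 37.5 GB of new raw disk usage per node per day. The helper name is hypothetical.

```python
def raw_gb_per_node_per_day(daily_tb, nodes, replication=3):
    """New raw disk usage per node per day, in GB (1 TB = 1,000 GB),
    assuming every byte is stored `replication` times and spread evenly."""
    return daily_tb * 1000 * replication / nodes


print(raw_gb_per_node_per_day(1, 80))  # 37.5 (Twitter, under the assumptions above)
```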
