Introduction to Hadoop and the Hadoop Distributed File System (HDFS)

Public 2009/5/13
• Hadoop
• Hadoop Distributed File System (HDFS)

                    Copyright 2009 - Trend Micro Inc.
What is Hadoop?

• An Apache top-level project
• Hadoop includes:
    – Hadoop Distributed File System (HDFS)
    – MapReduce
• Written in Java
• C++/Java/Shell/Command…
• Platforms:
    – Linux, Mac OS X, Windows, Solaris

[Figure: architecture stack, top to bottom: Cloud Applications; MapReduce and HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]

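Hadoop history

• 2003/2
  – Google writes its first MapReduce library
• 2003/10
  – Google publishes the Google File System (GFS) paper
• 2004/12
  – Google publishes the MapReduce paper
• 2005/7
  – Doug Cutting reports that Nutch now uses MapReduce
• 2006/2
  – Hadoop code moves out of Nutch into a new Lucene subproject
• 2006/11
  – Google publishes the Bigtable paper
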
Hadoop history (continued)

• 2007/2
  – Mike Cafarella releases the first HBase code
• 2007/4
  – Yahoo! runs Hadoop on a 1000-node cluster
• 2008/1
  – Hadoop becomes an Apache top-level project

Who uses Hadoop?
•   Yahoo!
    – Hadoop          2              CPU        10
•   Google
    –                 Hadoop
•   Amazon
    – Amazon          Hadoop
    –
•   IBM
    – Blue Cloud
•   Trend Micro
    –        Hadoop

•             Hadoop           …
    – http://wiki.apache.org/hadoop/PoweredBy



Hadoop Distributed File System (HDFS)

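HDFS

• Single namespace (Single Namespace) for the entire cluster
• Very large scale
    – 10K nodes, 100 million files, 10 petabytes
• Data coherency
    – Write-once-read-many access model
• Files are split into blocks (block)
    – Typically 128 MB per block
    – Each block is replicated (replica) across multiple DataNodes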
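
The block arithmetic behind this slide can be sketched in a few lines. This is only an illustration, not HDFS code: the function names are made up for the example, the 128 MB block size is the typical figure quoted on the slide, and the replica count of 3 is an assumed setting that is configurable in practice.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the typical block size quoted on the slide
REPLICAS = 3                    # assumed replica count; configurable in practice


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte length of each block for a file of `file_size` bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, tail = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full + ([tail] if tail else [])


def raw_storage(file_size, replicas=REPLICAS):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replicas


if __name__ == "__main__":
    one_gb = 1024 ** 3
    blocks = split_into_blocks(one_gb + 1)
    print(len(blocks), "blocks, last block holds", blocks[-1], "byte(s)")
    print("raw bytes stored with replication:", raw_storage(one_gb + 1))
```

Note that the last block only occupies as many bytes as the file tail needs, so a file slightly over 1 GB yields nine blocks rather than a padded tenth of a gigabyte.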




HDFS

• File replication (File replication): 3 replicas by default
• Optimized for batch processing (Batch processing), not for low-latency (low latency) access

NameNode

• The NameNode manages the HDFS file system namespace (File System Namespace)
   – Maps files to blocks
   – Maps blocks to DataNodes
• One NameNode per Hadoop cluster

NameNode Metadata

•   The NameNode keeps metadata in memory
     – The entire metadata is held in main memory

•   Types of metadata
     – List of files
     – List of blocks for each file
     – List of DataNodes for each block
     – File attributes
         • e.g., creation time, replication factor

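NameNode Metadata (continued)

•   Transaction log (the EditLog)
    – Records every change to the file system metadata

•   FsImage
    – Stored by the NameNode; contains:
         • The file system namespace (Name Space)
         • The mapping of blocks to files
         • File system properties
    – The NameNode stores both the FsImage and the EditLog

•   Checkpoint
    – Happens when the NameNode starts up
    – The NameNode reads the FsImage and the EditLog, applies the EditLog
      to the image, and writes out a new FsImage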
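
A checkpoint can be pictured as replaying a log onto a snapshot. The sketch below is a toy model under stated assumptions: the namespace is a plain dictionary, and the operation names (`create`, `add_block`, `delete`) are hypothetical stand-ins, not the real EditLog record types.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (a dict)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"replication": edit[2], "blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError("unknown operation: %r" % (op,))


def checkpoint(fsimage, editlog):
    """Replay the edit log onto a copy of the image.

    Returns the merged image and a fresh, empty edit log.
    """
    namespace = copy.deepcopy(fsimage)  # leave the old image untouched
    for edit in editlog:
        apply_edit(namespace, edit)
    return namespace, []


if __name__ == "__main__":
    image = {"/foodir/a.txt": {"replication": 3, "blocks": ["blk_1"]}}
    log = [
        ("create", "/foodir/b.txt", 3),
        ("add_block", "/foodir/b.txt", "blk_2"),
        ("delete", "/foodir/a.txt"),
    ]
    new_image, new_log = checkpoint(image, log)
    print(new_image)
    print(new_log)
```

The point of the design is that the log stays small and append-only between checkpoints, while the image only has to be rewritten when the two are merged.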



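Secondary NameNode

•    Copies the NameNode's FsImage and EditLog from the NameNode
•    Merges the FsImage and EditLog into a new FsImage
•    Uploads the new FsImage to the NameNode
    – The NameNode can then purge its EditLog
• The Secondary NameNode is not a failover (Fail over) standby for the NameNode
    – Hadoop does not provide automatic NameNode failover

[Figure: FsImage and EditLog are merged into FsImage (new)]
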
NameNode

•   The NameNode is a SPOF (single point of failure)
•   High Availability

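DataNode

•   Stores blocks (Blocks)
    – Stores data in the local file system (e.g., ext3)
    – Stores block metadata
        • e.g., checksum (CRC)
•   Block report
    – Periodically sends a list of its Blocks to the NameNode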
    –   NameNode
      block
            NameNode
      block



HDFS – Replication

• 3 replicas by default
• Block size (block size) and replication factor (replication factor) are configurable per file
• Replica placement is rack-aware

Block Placement

• Policy (v0.19)
    –
    –
    –
    –
•




Heartbeats

• DataNodes send Heartbeats to the NameNode
   – Once every 3 seconds
• The NameNode uses Heartbeats to detect DataNode failure
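
Failure detection from heartbeats amounts to tracking the last time each node reported in. The class below is a minimal sketch, not the NameNode's implementation: the class and method names are invented for the example, the interval of 3 is read as seconds, and the 30-second timeout is a hypothetical value picked for illustration.

```python
HEARTBEAT_INTERVAL = 3.0  # 3-second heartbeat interval, read from the slide
DEAD_AFTER = 30.0         # hypothetical timeout chosen for this sketch


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each DataNode."""

    def __init__(self, dead_after=DEAD_AFTER):
        self.dead_after = dead_after
        self.last_seen = {}

    def heartbeat(self, datanode, now):
        """Record that `datanode` reported in at time `now` (seconds)."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Return the DataNodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.dead_after
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.heartbeat("dn1", 0.0)
    monitor.heartbeat("dn2", 0.0)
    # dn1 keeps reporting every HEARTBEAT_INTERVAL seconds; dn2 goes silent.
    t = HEARTBEAT_INTERVAL
    while t <= 33.0:
        monitor.heartbeat("dn1", t)
        t += HEARTBEAT_INTERVAL
    print(monitor.dead_nodes(33.0))
```

Passing the clock in as `now` keeps the sketch deterministic; a real monitor would read a monotonic clock instead.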




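Data Correctness

•   Checksums are used to validate data
    – Cyclic Redundancy Check (CRC32)
•   File creation
    – The client computes a Checksum for every 512 bytes
    – The DataNode stores the Checksum
•   File access
    – The client retrieves the data and its Checksum from the DataNode
    – If validation fails, the client tries another replica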
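
Per-chunk checksumming can be sketched with the standard library's CRC32. This is an illustration of the idea only: the function names are made up for the example, and the on-disk checksum format HDFS actually uses is not modelled here.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size named on the slide


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per `chunk`-byte slice of `data`."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def find_corrupt_chunks(data, checksums, chunk=BYTES_PER_CHECKSUM):
    """Return indexes of chunks whose recomputed CRC32 differs from the stored one."""
    actual = compute_checksums(data, chunk)
    if len(actual) != len(checksums):
        raise ValueError("checksum count does not match data length")
    return [i for i, (a, b) in enumerate(zip(actual, checksums)) if a != b]


if __name__ == "__main__":
    original = bytes(range(256)) * 5          # 1280 bytes, so 3 chunks
    stored = compute_checksums(original)      # computed when the file is written

    damaged = bytearray(original)
    damaged[600] ^= 0xFF                      # flip one byte inside chunk 1
    print(find_corrupt_chunks(bytes(damaged), stored))
```

Because each 512-byte chunk has its own checksum, a reader can tell which part of a block is damaged and fetch only that data from another replica.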




User Interface

•   API
     – Java API
     – A C language wrapper for the Java API is also available

•   POSIX-like commands
     – hadoop dfs -mkdir /foodir
     – hadoop dfs -cat /foodir/myfile.txt
     – hadoop dfs -rm /foodir/myfile.txt

•   DFSAdmin
     – bin/hadoop dfsadmin -safemode
     – bin/hadoop dfsadmin -report
     – bin/hadoop dfsadmin -refreshNodes

•   Web
     – http://host:port/dfshealth.jsp
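
The shell commands above can also be driven from a script. The helper below is a hedged sketch: `dfs_command` and `run` are hypothetical names, the `hadoop` launcher is assumed to be on the PATH, and nothing is executed unless `dry_run` is switched off.

```python
import subprocess


def dfs_command(action, *paths, launcher="hadoop"):
    """Build the argument list for `hadoop dfs -<action> <paths...>`."""
    if not paths:
        raise ValueError("at least one path is required")
    return [launcher, "dfs", "-" + action, *paths]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if dry_run:
        return None
    # Assumes a working Hadoop installation; raises if the command fails.
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(dfs_command("mkdir", "/foodir"))
    run(dfs_command("cat", "/foodir/myfile.txt"))
    run(dfs_command("rm", "/foodir/myfile.txt"))
```

Passing the arguments as a list rather than one shell string avoids quoting problems when a path contains spaces.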


[Screenshot slides: Web; Web (http://172.16.203.136:50070); POSIX-like command; Java API; POSIX-like command]

• Hadoop document and installation
   – http://hadoop.apache.org/
• Hadoop Wiki
   – http://wiki.apache.org/hadoop/
• Google File System Paper
   – http://labs.google.com/papers/gfs.html




