Introduction to Hadoop and the Hadoop Distributed File System (HDFS)

Public 2009/5/13
• Hadoop
• Hadoop Distributed File System (HDFS)

                    Copyright 2009 - Trend Micro Inc.
What is Hadoop?

• Hadoop is an Apache top-level project
• Hadoop includes
    – Hadoop Distributed File System (HDFS)
    – MapReduce
• Written in Java
• Languages: C++/Java/Shell/Command…
• Platforms
    – Linux, Mac OS/X, Windows, Solaris

[Figure: layered stack, top to bottom: Cloud Applications; MapReduce and HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]

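Hadoop history

• February 2003
  – Google writes its first MapReduce library
• October 2003
  – Google publishes the Google File System (GFS) paper
• December 2004
  – Google publishes the MapReduce paper
• July 2005
  – Doug Cutting reports that Nutch now uses a new MapReduce implementation
• February 2006
  – Hadoop code moves out of Nutch into a new Lucene subproject
• November 2006
  – Google publishes the Bigtable paper
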
Hadoop history (continued)

• February 2007
  – Mike Cafarella releases the first HBase code
• April 2007
  – Yahoo! runs Hadoop on a 1000-node cluster
• January 2008
  – Hadoop becomes an Apache top-level project

Who uses Hadoop?

•   Yahoo!
    – Hadoop 2 CPU 10
•   Google
    – Hadoop
•   Amazon
    – Amazon Hadoop
•   IBM
    – Blue Cloud
•   Trend Micro
    – Hadoop
•   Other Hadoop users …
    – http://wiki.apache.org/hadoop/PoweredBy

Hadoop Distributed File System (HDFS)

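HDFS

• A single namespace for the entire cluster (Single Namespace)
• Very large scale
    – 10K nodes, 100 million files, 10 petabytes
• Data coherency
    – Write-once-read-many access model
• Files are split into blocks
    – Typically 128 MB per block
    – Each block has replicas stored on multiple DataNodes
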
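As a rough illustration of how a file maps onto 128 MB blocks, the Python sketch below cuts a file of a given size into block-sized ranges and totals the raw storage used once every block is replicated. The function names and the replica count of 3 are assumptions made for this example; they are not HDFS APIs.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted on the slide
REPLICATION = 3                 # assumed replica count for this illustration


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the (offset, length) of each block of a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster when every block is replicated."""
    return file_size * replication


one_gb = 1024 ** 3
print(len(split_into_blocks(one_gb)))  # number of blocks in a 1 GB file
print(raw_storage(one_gb))             # bytes stored with 3 replicas
```

Only the final block of a file can be shorter than the block size, so a file one byte larger than 128 MB occupies two blocks, the second holding a single byte.
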
HDFS

• File replication
    – 3 replicas by default
• Designed for batch processing rather than low-latency access

NameNode

• The NameNode manages the HDFS file system namespace
   – Maps a file to a list of blocks
   – Maps a block to the DataNodes that hold it
• One NameNode per Hadoop cluster

NameNode Metadata

•   The NameNode keeps metadata in memory
     – The entire metadata is held in main memory
•   Types of metadata
     – List of files
     – List of blocks for each file
     – List of DataNodes for each block
     – File attributes
         • e.g. creation time, replication factor

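The metadata kinds named on this slide (files, blocks, DataNodes, creation time, replication factor) can be pictured as two in-memory maps. The following Python sketch is a toy model under that assumption; the class, method, and block names are hypothetical and do not correspond to NameNode internals.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    # File attributes named on the slide
    creation_time: float
    replication_factor: int
    # Ordered list of block IDs that make up the file
    block_ids: list = field(default_factory=list)


class NameNodeMetadata:
    """Toy in-memory metadata store; all names here are hypothetical."""

    def __init__(self):
        self.files = {}            # path -> FileEntry
        self.block_locations = {}  # block ID -> set of DataNode names

    def add_file(self, path, block_ids, replication_factor=3):
        self.files[path] = FileEntry(time.time(), replication_factor, list(block_ids))
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def report_block(self, block_id, datanode):
        # Record that a DataNode holds a copy of the block
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return [(block ID, sorted DataNode names)] for a file."""
        entry = self.files[path]
        return [(b, sorted(self.block_locations[b])) for b in entry.block_ids]


meta = NameNodeMetadata()
meta.add_file("/foodir/myfile.txt", ["blk_1", "blk_2"])
meta.report_block("blk_1", "datanode-a")
meta.report_block("blk_1", "datanode-b")
meta.report_block("blk_2", "datanode-c")
print(meta.locate("/foodir/myfile.txt"))
```

A read then becomes two lookups: path to block list, then block to DataNodes.
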
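NameNode Metadata (continued)

•   Transaction log (EditLog)
    – Records changes to file system metadata, such as file creations and deletions
•   FsImage
    – Stored on the NameNode; it holds
         • The file system namespace
         • The mapping of blocks to files
         • File system properties
    – At startup the NameNode loads the FsImage and the EditLog
•   Checkpoint
    – Performed when the NameNode starts
    – The NameNode reads the FsImage and the EditLog, then applies the EditLog to the FsImage
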
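A checkpoint can be thought of as replaying a log of changes over a saved snapshot. The Python sketch below models the FsImage as a dictionary and the EditLog as a list of tuples; the operation names and the tuple layout are hypothetical, chosen only to make the replay step concrete.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> replication factor)."""
    op = edit[0]
    if op == "create":
        _, path, replication = edit
        namespace[path] = replication
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    elif op == "set_replication":
        _, path, replication = edit
        if path in namespace:
            namespace[path] = replication
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fs_image, edit_log):
    """Replay edit_log over a copy of fs_image; return (new image, empty log)."""
    new_image = dict(fs_image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    # Once merged, the old log entries are no longer needed
    return new_image, []


fs_image = {"/a.txt": 3}
edit_log = [
    ("create", "/b.txt", 3),
    ("set_replication", "/a.txt", 2),
    ("delete", "/b.txt"),
]
new_image, edit_log = checkpoint(fs_image, edit_log)
print(new_image)  # {'/a.txt': 2}
print(edit_log)   # []
```

Because the merge works on a copy, the previous image stays intact until the new one is complete.
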
Secondary NameNode

•    Copies the FsImage and EditLog from the NameNode
•    Merges the FsImage and EditLog into a new FsImage
•    Uploads the new FsImage to the NameNode
    – The NameNode then truncates its EditLog
• The Secondary NameNode is not a standby NameNode for failover
    – A Hadoop cluster has only one NameNode

[Figure: FsImage + EditLog → FsImage (new)]

NameNode

•   The NameNode is a SPOF (single point of failure)
•   High Availability

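DataNode

•   A block server that stores blocks
    – Stores data in the local file system (e.g. ext3)
    – Stores the metadata of each block
        • e.g. the checksum (CRC) of the block
•   Block report
    – Periodically sends a report of all existing blocks to the NameNode
    – The NameNode uses the report to track which blocks each DataNode holds
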
HDFS – Replication

• 3 replicas by default
• Block size and replication factor are configurable per file
• Replica placement is rack-aware

Block Placement

• Policy (v0.19)

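To make rack-aware placement concrete, here is a minimal Python sketch of one possible policy. The ordering it uses (first replica on the writer's node, second on a node in another rack, third on a different node in that same remote rack, any further replicas at random) is an assumption for illustration and may differ from the v0.19 policy. The function name and topology format are hypothetical.

```python
import random


def choose_targets(writer, topology, replication=3, rng=random):
    """Pick DataNodes for one block.

    topology maps rack name -> list of node names. Assumed policy:
    1st replica on the writer's node, 2nd on a node in another rack,
    3rd on a different node in that same remote rack, rest random.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    targets = [writer]

    remote_racks = [r for r in topology if r != rack_of[writer]]
    if replication >= 2 and remote_racks:
        remote_rack = rng.choice(remote_racks)
        targets.append(rng.choice(topology[remote_rack]))
        if replication >= 3:
            peers = [n for n in topology[remote_rack] if n not in targets]
            if peers:
                targets.append(rng.choice(peers))

    # Fill any remaining slots with random unused nodes
    unused = [n for n in rack_of if n not in targets]
    rng.shuffle(unused)
    while len(targets) < replication and unused:
        targets.append(unused.pop())
    return targets


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(choose_targets("n1", topology))
```

Whatever the exact ordering, the point of rack awareness is that the replicas of a block span more than one rack, so losing a single rack does not lose the block.
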
Heartbeats

• DataNodes send heartbeats to the NameNode
   – Once every 3 seconds
• The NameNode uses heartbeats to detect DataNode failure

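Heartbeat-based failure detection can be sketched as a table of last-seen times. The Python example below reads the slide's "3" as a 3-second interval; the timeout of ten missed intervals and the class name are hypothetical choices for the example, not values taken from Hadoop.

```python
HEARTBEAT_INTERVAL = 3.0              # seconds between heartbeats
DEAD_AFTER = 10 * HEARTBEAT_INTERVAL  # hypothetical timeout, not from the slides


class HeartbeatMonitor:
    """Tracks the last heartbeat per DataNode and flags silent ones."""

    def __init__(self, dead_after=DEAD_AFTER):
        self.dead_after = dead_after
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Return DataNodes whose last heartbeat is older than the timeout."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.dead_after
        )


monitor = HeartbeatMonitor()
monitor.heartbeat("datanode-a", now=0.0)
monitor.heartbeat("datanode-b", now=0.0)
monitor.heartbeat("datanode-a", now=30.0)  # datanode-b has gone silent
print(monitor.dead_nodes(now=31.0))
```

Passing the clock in as `now` keeps the sketch deterministic; a real monitor would read the system time instead.
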
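Data Correctness

•   Checksums validate data
    – Cyclic Redundancy Check (CRC32)
•   File creation
    – The client computes a checksum for every 512 bytes
    – The DataNode stores the checksums
•   File access
    – The client retrieves the data and checksums from the DataNode
    – If validation fails, the client tries another replica
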
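The per-chunk checksum idea can be tried out with Python's standard `zlib.crc32`. This is a minimal sketch assuming one CRC32 per 512 bytes of data; the function names are hypothetical and HDFS's on-disk checksum format is not modelled.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size from the slide


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per chunk of data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def verify(data, checksums, chunk=BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose CRC32 does not match."""
    actual = compute_checksums(data, chunk)
    if len(actual) != len(checksums):
        raise ValueError("checksum count does not match data length")
    return [i for i, (a, e) in enumerate(zip(actual, checksums)) if a != e]


data = bytes(range(256)) * 5          # 1280 bytes -> 3 chunks
stored = compute_checksums(data)      # what would be kept alongside the data

corrupted = bytearray(data)
corrupted[600] ^= 0xFF                # flip one byte inside the second chunk
print(verify(data, stored))           # []
print(verify(bytes(corrupted), stored))  # [1]
```

Checking per chunk rather than per file lets a reader pinpoint which part of a block is damaged and fetch only that block from another replica.
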
User Interface

•   API
     – Java API
     – A C language wrapper for the Java API is also available

•   POSIX-like commands
     – hadoop dfs -mkdir /foodir
     – hadoop dfs -cat /foodir/myfile.txt
     – hadoop dfs -rm /foodir/myfile.txt

•   DFSAdmin
     – bin/hadoop dfsadmin -safemode
     – bin/hadoop dfsadmin -report
     – bin/hadoop dfsadmin -refreshNodes

•   Web
     – http://host:port/dfshealth.jsp

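The shell commands above can also be driven from a script. The Python sketch below only builds the argument lists for the `hadoop dfs` commands shown on the slide and runs them if a `hadoop` executable happens to be on the PATH; the helper names are hypothetical, and no flags beyond those on the slide are assumed.

```python
import shutil
import subprocess


def dfs_command(action, *paths):
    """Build the argv for `hadoop dfs -<action> <paths...>`."""
    return ["hadoop", "dfs", f"-{action}", *paths]


def run(argv):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True)


print(dfs_command("mkdir", "/foodir"))
print(dfs_command("cat", "/foodir/myfile.txt"))
print(dfs_command("rm", "/foodir/myfile.txt"))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when a path contains spaces.
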
[Figure: Web interface]

[Figure: Web interface (http://172.16.203.136:50070)]

[Figure: POSIX-like commands]

[Figure: Java API]

[Figure: POSIX-like commands]

• Hadoop document and installation
   – http://hadoop.apache.org/
• Hadoop Wiki
   – http://wiki.apache.org/hadoop/
• Google File System Paper
   – http://labs.google.com/papers/gfs.html




