SlideShare uma empresa Scribd logo
1 de 9
Dr. C.V. Suresh Babu
DESIGN OF
HADOOP DISTRIBUTED FILE SYSTEM
(CentreforKnowledgeTransfer)
institute
DISCUSSION TOPICS
 Hadoop Distributed File System (HDFS)
 How does HDFS work?
 HDFS Architecture
 Features of HDFS
 Benefits of using HDFS
 Examples: Target Marketing
 HDFS data replication
(CentreforKnowledgeTransfer)
institute
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
 The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop
applications.
 HDFS employs a NameNode and DataNode architecture to implement a distributed file system that
provides high-performance access to data across highly scalable Hadoop clusters.
 Hadoop itself is an open source distributed processing framework that manages data processing and
storage for big data applications.
 HDFS is a key part of the many Hadoop ecosystem technologies.
 It provides a reliable means for managing pools of big data and supporting related big data analytics
applications.
(CentreforKnowledgeTransfer)
institute
HOW DOES HDFS WORK?
 HDFS enables the rapid transfer of data between compute nodes.
 It was closely coupled with MapReduce, a framework for data processing that filters and divides up
work among the nodes in a cluster, and it organizes and condenses the results into a cohesive answer
to a query.
 Similarly, when HDFS takes in data, it breaks the information down into separate blocks and distributes
them to different nodes in a cluster.
(CentreforKnowledgeTransfer)
institute
(CentreforKnowledgeTransfer)
institute
FEATURES OF HDFS
Data replication. This is used to ensure that the data is always available and prevents data loss. For
example, when a node crashes or there is a hardware failure, replicated data can be
pulled from elsewhere within a cluster, so processing continues while data is
recovered.
Fault tolerance and reliability. HDFS' ability to replicate file blocks and store them across nodes in a large
cluster ensures fault tolerance and reliability.
High availability. As mentioned earlier, because of replication across notes, data is available even if the
NameNode or a DataNode fails.
Scalability. Because HDFS stores data on various nodes in the cluster, as requirements increase, a cluster
can scale to hundreds of nodes.
High throughput. Because HDFS stores data in a distributed manner, the data can be processed in
parallel on a cluster of nodes. This, plus data locality (see next bullet), cut the
processing time and enable high throughput.
Data locality. With HDFS, computation happens on the DataNodes where the data resides, rather than
having the data move to where the computational unit is. By minimizing the distance
between the data and the computing process, this approach decreases network
congestion and boosts a system's overall throughput.
(CentreforKnowledgeTransfer)
institute
BENEFITS OF USING HDFS
 Cost effectiveness. The DataNodes that store the data rely on inexpensive off-the-shelf hardware,
which cuts storage costs. Also, because HDFS is open source, there's no licensing fee.
 Large data set storage. HDFS stores a variety of data of any size -- from megabytes to petabytes --
and in any format, including structured and unstructured data.
 Fast recovery from hardware failure. HDFS is designed to detect faults and automatically recover on its
own.
 Portability. HDFS is portable across all hardware platforms, and it is compatible with several operating
systems, including Windows, Linux and Mac OS/X.
 Streaming data access. HDFS is built for high data throughput, which is best for access to streaming
data.
(CentreforKnowledgeTransfer)
institute
EXAMPLES: TARGET MARKETING
 Targeted marketing campaigns depend on marketers knowing a lot about their
target audiences.
 Marketers can get this information from several sources, including CRM systems,
direct mail responses, point-of-sale systems, Facebook and Twitter.
 Because much of this data is unstructured, an HDFS cluster is the most cost-
effective place to put data before analyzing it.
(CentreforKnowledgeTransfer)
institute
HDFS DATA REPLICATION
 Data replication is an important part of the HDFS format as it ensures data
remains available if there's a node or hardware failure.
 As previously mentioned, the data is divided into blocks and replicated
across numerous nodes in the cluster.
 Therefore, when one node goes down, the user can access the data that
was on that node from other machines.
 HDFS maintains the replication process at regular intervals.
(CentreforKnowledgeTransfer)
institute

Mais conteúdo relacionado

Mais procurados

Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Simplilearn
 

Mais procurados (20)

Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Hadoop
HadoopHadoop
Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Hybrid Cloud and Its Implementation
Hybrid Cloud and Its ImplementationHybrid Cloud and Its Implementation
Hybrid Cloud and Its Implementation
 

Semelhante a Design of Hadoop Distributed File System

Hadoop Data Management (1).pdfhbjhkjkkmkm
Hadoop Data Management (1).pdfhbjhkjkkmkmHadoop Data Management (1).pdfhbjhkjkkmkm
Hadoop Data Management (1).pdfhbjhkjkkmkm
vineela19
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
preetik9044
 

Semelhante a Design of Hadoop Distributed File System (20)

Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Data Management (1).pdfhbjhkjkkmkm
Hadoop Data Management (1).pdfhbjhkjkkmkmHadoop Data Management (1).pdfhbjhkjkkmkm
Hadoop Data Management (1).pdfhbjhkjkkmkm
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
 
HDFS
HDFSHDFS
HDFS
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Hadoop
HadoopHadoop
Hadoop
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Hdfs design
Hdfs designHdfs design
Hdfs design
 
EMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data Storage
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
Hadoop paper
Hadoop paperHadoop paper
Hadoop paper
 
paper
paperpaper
paper
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
lec4_ref.pdf
lec4_ref.pdflec4_ref.pdf
lec4_ref.pdf
 

Mais de Dr. C.V. Suresh Babu

Mais de Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Clustering
ClusteringClustering
Clustering
 
Classification
ClassificationClassification
Classification
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
Introduction to regression
Introduction to regressionIntroduction to regression
Introduction to regression
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 

Último

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

Último (20)

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Design of Hadoop Distributed File System

  • 1. Dr. C.V. Suresh Babu DESIGN OF HADOOP DISTRIBUTED FILE SYSTEM (CentreforKnowledgeTransfer) institute
  • 2. DISCUSSION TOPICS  Hadoop Distributed File System (HDFS)  How does HDFS work?  HDFS Architecture  Features of HDFS  Benefits of using HDFS  Examples: Target Marketing  HDFS data replication (CentreforKnowledgeTransfer) institute
  • 3. HADOOP DISTRIBUTED FILE SYSTEM (HDFS)  The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.  HDFS employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.  Hadoop itself is an open source distributed processing framework that manages data processing and storage for big data applications.  HDFS is a key part of the many Hadoop ecosystem technologies.  It provides a reliable means for managing pools of big data and supporting related big data analytics applications. (CentreforKnowledgeTransfer) institute
  • 4. HOW DOES HDFS WORK?  HDFS enables the rapid transfer of data between compute nodes.  It was closely coupled with MapReduce, a framework for data processing that filters and divides up work among the nodes in a cluster, and it organizes and condenses the results into a cohesive answer to a query.  Similarly, when HDFS takes in data, it breaks the information down into separate blocks and distributes them to different nodes in a cluster. (CentreforKnowledgeTransfer) institute
  • 6. FEATURES OF HDFS Data replication. This is used to ensure that the data is always available and prevents data loss. For example, when a node crashes or there is a hardware failure, replicated data can be pulled from elsewhere within a cluster, so processing continues while data is recovered. Fault tolerance and reliability. HDFS' ability to replicate file blocks and store them across nodes in a large cluster ensures fault tolerance and reliability. High availability. As mentioned earlier, because of replication across notes, data is available even if the NameNode or a DataNode fails. Scalability. Because HDFS stores data on various nodes in the cluster, as requirements increase, a cluster can scale to hundreds of nodes. High throughput. Because HDFS stores data in a distributed manner, the data can be processed in parallel on a cluster of nodes. This, plus data locality (see next bullet), cut the processing time and enable high throughput. Data locality. With HDFS, computation happens on the DataNodes where the data resides, rather than having the data move to where the computational unit is. By minimizing the distance between the data and the computing process, this approach decreases network congestion and boosts a system's overall throughput. (CentreforKnowledgeTransfer) institute
  • 7. BENEFITS OF USING HDFS  Cost effectiveness. The DataNodes that store the data rely on inexpensive off-the-shelf hardware, which cuts storage costs. Also, because HDFS is open source, there's no licensing fee.  Large data set storage. HDFS stores a variety of data of any size -- from megabytes to petabytes -- and in any format, including structured and unstructured data.  Fast recovery from hardware failure. HDFS is designed to detect faults and automatically recover on its own.  Portability. HDFS is portable across all hardware platforms, and it is compatible with several operating systems, including Windows, Linux and Mac OS/X.  Streaming data access. HDFS is built for high data throughput, which is best for access to streaming data. (CentreforKnowledgeTransfer) institute
  • 8. EXAMPLES: TARGET MARKETING  Targeted marketing campaigns depend on marketers knowing a lot about their target audiences.  Marketers can get this information from several sources, including CRM systems, direct mail responses, point-of-sale systems, Facebook and Twitter.  Because much of this data is unstructured, an HDFS cluster is the most cost- effective place to put data before analyzing it. (CentreforKnowledgeTransfer) institute
  • 9. HDFS DATA REPLICATION  Data replication is an important part of the HDFS format as it ensures data remains available if there's a node or hardware failure.  As previously mentioned, the data is divided into blocks and replicated across numerous nodes in the cluster.  Therefore, when one node goes down, the user can access the data that was on that node from other machines.  HDFS maintains the replication process at regular intervals. (CentreforKnowledgeTransfer) institute