Hadoop
Distributed File System
(HDFS)
SEMINAR GUIDE
Mr. PRAMOD PAVITHRAN
HEAD OF DIVISION
COMPUTER SCIENCE & ENGINEERING
SCHOOL OF ENGINEERING, CUSAT
PRESENTED BY
VIJAY PRATAP SINGH
REG NO: 12110083
S7, CS-B
ROLL NO: 81
CONTENTS
WHAT IS HADOOP
PROJECT COMPONENTS IN HADOOP
MAP/REDUCE
HDFS
ARCHITECTURE
GOALS OF HADOOP
COMPARISON WITH OTHER SYSTEMS
CONCLUSION
REFERENCES
WHAT IS HADOOP…???
o Hadoop is an open-source software framework.
o The Hadoop framework consists of two main layers:
  o Distributed file system (HDFS)
  o Execution engine (MapReduce)
o Supports data-intensive distributed applications.
o Licensed under the Apache License, Version 2.0.
o It enables applications to work with thousands of computationally independent
computers and petabytes of data.
WHY HADOOP…???
PROJECT COMPONENTS IN HADOOP
MAP/REDUCE
o Hadoop is the most popular open-source implementation of MapReduce.
o MapReduce is a programming model for processing large data sets.
o MapReduce is typically used for distributed computing on clusters of computers.
o MapReduce can take advantage of data locality, processing data on or near the storage
assets to reduce data transfer over the network.
o The model is inspired by the map and reduce functions of functional programming.
o "Map" step: the master node takes the input, divides it into smaller sub-problems, and
distributes them to slave nodes. Each slave node processes its sub-problem and passes
the answer back to the master node.
o "Reduce" step: the master node then collects the answers to all the sub-problems and
combines them to form the final output.
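The map and reduce steps above can be illustrated with the classic word-count job. The sketch below is a single-process Python simulation, not Hadoop's Java MapReduce API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the explicit `shuffle` function stands in for the grouping by key that the framework performs between the two steps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line plays the role of one input split handled by a map task.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

On a real cluster each input split would be handled by a separate map task, and the grouped keys would be spread across several reduce tasks running in parallel.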
HDFS
Highly scalable file system
◦ 6k nodes and 120 PB
◦ Add commodity servers and disks to scale storage and I/O bandwidth
Supports parallel reading & processing of data
◦ Optimized for streaming reads/writes of large files
◦ Bandwidth scales linearly with the number of nodes and disks
Fault tolerant & easy management
◦ Built-in redundancy
◦ Tolerates disk and node failures
◦ Automatically manages addition/removal of nodes
◦ One operator per 3k nodes
Scalable, Reliable & Manageable
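Storage scales because HDFS stores each file as a sequence of large, fixed-size blocks that are spread and replicated across the cluster. The sketch below computes how a file would be cut into blocks and how much raw disk space its replicas use. It assumes the Hadoop 2.x default block size of 128 MB (Hadoop 1.x used 64 MB) and the default replication factor of 3; both are configurable (`dfs.blocksize`, `dfs.replication`), and the helper names are hypothetical.

```python
from typing import List, Tuple

MB = 1024 * 1024
# Assumption: 128 MB default dfs.blocksize (Hadoop 2.x and later).
DEFAULT_BLOCK_SIZE = 128 * MB
# Assumption: default dfs.replication of 3.
DEFAULT_REPLICATION = 3


def split_into_blocks(file_size: int,
                      block_size: int = DEFAULT_BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; only the last block may be short."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage(file_size: int, replication: int = DEFAULT_REPLICATION) -> int:
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * MB
    for off, length in split_into_blocks(size):
        print(f"offset={off // MB} MB length={length // MB} MB")
    print(f"raw storage: {raw_storage(size) // MB} MB")
```

A 300 MB file becomes two full 128 MB blocks plus one 44 MB block; the short final block only occupies its actual size on disk.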
ISSUES IN CURRENT SYSTEM
BIG DATA
INCREASING BIG DATA
HADOOP’S APPROACH
Figure: Big Data is divided among several parallel Computation units, whose outputs are merged into a Combined Result.
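The figure's pattern (divide the data, compute on each part in parallel, combine the partial results) can be sketched on a single machine. In this hypothetical example, threads stand in for cluster nodes and a sum of squares stands in for the real computation; unlike Hadoop, it does not move the computation to where the data is stored.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks = []
    start = 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def compute(chunk: Sequence[int]) -> int:
    """Per-chunk work; here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def split_compute_combine(data: Sequence[int], workers: int = 4) -> int:
    if workers < 1:
        raise ValueError("workers must be >= 1")
    chunks = partition(data, workers)
    # Threads stand in for cluster nodes in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(compute, chunks))
    return sum(partial_results)  # the "Combined Result"


if __name__ == "__main__":
    print(split_compute_combine(list(range(1, 101))))  # 338350
```

Because addition is associative, combining the partial sums gives the same answer as one sequential pass, which is what makes this kind of computation easy to distribute.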
ARCHITECTURE OF HADOOP
HADOOP MASTER/SLAVE ARCHITECTURE
MAP REDUCE ENGINE
ARCHITECTURE OF HDFS
CLIENT INTERACTION WITH HADOOP
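Figure: HDFS WRITE. The client, the NameNode and DataNodes 1, 7 and 9 are connected through rack switches and a core switch; the client holds a file split into blocks A, B and C, and the NameNode's rack-awareness table lists Rack 1: DN 1 and Rack 2: DN 7, 9. The client asks the NameNode, "I want to write file.txt block A", and the NameNode replies, "OK, write to DataNodes [1, 7, 9]". The client asks DN 1 whether it and DNs 7 and 9 are ready; the check is passed along from DN 1 to DN 7 to DN 9, and "Ready!" comes back before block A is sent.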
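The NameNode's choice of DataNodes [1, 7, 9] in the figure is consistent with HDFS's default rack-aware placement for three replicas: one replica on the writer's node (or another chosen node when the client is not a DataNode), one on a node in a different rack, and one on another node in that same remote rack. The sketch below is a simplified, deterministic version of that rule, assuming the figure's topology; the real policy adds randomness and further checks on the candidate nodes, and `choose_targets` is a hypothetical name.

```python
from typing import Dict, List, Optional

# Assumed topology from the figure: DataNode id -> rack.
TOPOLOGY: Dict[int, str] = {1: "rack1", 7: "rack2", 9: "rack2"}


def choose_targets(topology: Dict[int, str], writer: Optional[int] = None,
                   replication: int = 3) -> List[int]:
    """Simplified default placement: 1st replica on the writer's node (else the
    lowest-numbered node), 2nd on a different rack, 3rd on the 2nd's rack."""
    nodes = sorted(topology)
    first = writer if writer in topology else nodes[0]
    targets = [first]
    if replication >= 2:
        remote = [n for n in nodes if topology[n] != topology[first]]
        if remote:
            targets.append(remote[0])
    if replication >= 3 and len(targets) == 2:
        same_remote_rack = [n for n in nodes
                            if topology[n] == topology[targets[1]] and n not in targets]
        if same_remote_rack:
            targets.append(same_remote_rack[0])
    # Fill any remaining slots with unused nodes (e.g. single-rack clusters).
    for n in nodes:
        if len(targets) >= replication:
            break
        if n not in targets:
            targets.append(n)
    return targets


if __name__ == "__main__":
    # The client in the figure is not a DataNode, so no writer is given.
    print(choose_targets(TOPOLOGY))  # [1, 7, 9]
```

Putting two of the three replicas on one rack limits cross-rack write traffic, while the copy on the other rack keeps the block readable if a whole rack fails.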
Figure: HDFS WRITE (PIPELINED). Block A is streamed from the client to DN 1, which forwards it to DN 7, which forwards it to DN 9, so one copy of A ends up on each node. Each DataNode reports "Block received" to the NameNode, the client is told "Success", and the NameNode records the metadata File.txt = Blk A: DN 1, 7, 9.
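The pipelined write can be simulated to show the two directions of traffic: data packets flow down the pipeline from DataNode to DataNode, while acknowledgements flow back up to the client. This is a sequential sketch under simplifying assumptions (tiny 4-byte packets, no failures, no concurrency, and the NameNode metadata reduced to a dictionary); real DataNodes overlap storing and forwarding, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    node_id: int
    blocks: Dict[str, bytearray] = field(default_factory=dict)


def pipeline_write(block_id: str, data: bytes, pipeline: List[DataNode],
                   namenode_meta: Dict[str, List[int]],
                   packet_size: int = 4) -> List[str]:
    """Simulate one pipelined block write and return an event log."""
    log: List[str] = []
    for node in pipeline:
        node.blocks[block_id] = bytearray()
    for start in range(0, len(data), packet_size):
        packet = data[start:start + packet_size]
        # Downstream: each DataNode stores the packet, then forwards it on.
        for node in pipeline:
            node.blocks[block_id].extend(packet)
            log.append(f"DN {node.node_id} stored {len(packet)} bytes")
        # Upstream: the ack travels back from the last DataNode to the client.
        for node in reversed(pipeline):
            log.append(f"ack from DN {node.node_id}")
        log.append("client: packet acknowledged")
    # Once the block is complete, each DataNode reports it to the NameNode.
    for node in pipeline:
        log.append(f"DN {node.node_id} -> NameNode: block received")
    namenode_meta[block_id] = [node.node_id for node in pipeline]
    log.append("client: success")
    return log


if __name__ == "__main__":
    nodes = [DataNode(1), DataNode(7), DataNode(9)]
    meta: Dict[str, List[int]] = {}
    for event in pipeline_write("file.txt:blk_A", b"hello hdfs", nodes, meta):
        print(event)
    print(meta)  # {'file.txt:blk_A': [1, 7, 9]}
```

The client only sends data to the first DataNode, so its outgoing bandwidth is used once per block regardless of the replication factor.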
Figure: HDFS READ. The client asks the NameNode, "I want to read file.txt block A", and the NameNode replies that block A is available at DataNodes [1, 7, 9]; the client then reads block A directly from one of those DataNodes.
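When reading, the client fetches each block from the closest replica. Hadoop measures closeness with a tree-based network distance; the sketch below uses the commonly cited values of 0 for the same node, 2 for the same rack and 4 for a different rack in the same data center, and falls back to the next replica if a DataNode is unavailable. The topology and function names are assumptions for illustration, and ties are broken by node id only to keep the output deterministic.

```python
from typing import Dict, FrozenSet, List, Optional

# Assumed topology from the figure: DataNode id -> rack.
RACKS: Dict[int, str] = {1: "rack1", 7: "rack2", 9: "rack2"}


def network_distance(dn: int, racks: Dict[int, str], client_rack: str,
                     client_node: Optional[int] = None) -> int:
    """0 = same node, 2 = same rack, 4 = different rack (same data center)."""
    if dn == client_node:
        return 0
    if racks[dn] == client_rack:
        return 2
    return 4


def sort_replicas(replicas: List[int], racks: Dict[int, str], client_rack: str,
                  client_node: Optional[int] = None) -> List[int]:
    """Order replicas closest-first; ties broken by node id for determinism."""
    return sorted(replicas,
                  key=lambda dn: (network_distance(dn, racks, client_rack, client_node), dn))


def pick_replica(replicas: List[int], racks: Dict[int, str], client_rack: str,
                 dead: FrozenSet[int] = frozenset(),
                 client_node: Optional[int] = None) -> int:
    """Return the closest live replica, falling back to farther ones."""
    for dn in sort_replicas(replicas, racks, client_rack, client_node):
        if dn not in dead:
            return dn
    raise OSError("no live replica for this block")


if __name__ == "__main__":
    print(sort_replicas([1, 7, 9], RACKS, client_rack="rack2"))           # [7, 9, 1]
    print(pick_replica([1, 7, 9], RACKS, "rack2", dead=frozenset({7})))   # 9
```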
GOALS OF HDFS
Very Large Distributed File System
◦ 10K nodes, 100 million files, 10 PB
Assumes Commodity Hardware
◦ Files are replicated to handle hardware failure
◦ Detect failures and recover from them
Optimized for Batch Processing
◦ Data locations exposed so that computations can move to where data resides
◦ Provides very high aggregate bandwidth
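Failure detection and recovery rest on heartbeats: DataNodes report to the NameNode periodically (every 3 seconds by default), and a DataNode that stays silent past a timeout (on the order of ten minutes by default) is declared dead, after which the blocks it held are re-replicated from the surviving copies. The sketch below models that bookkeeping with hypothetical names and an arbitrary 30-second timeout; it ignores rack awareness when choosing new targets and simply picks the lowest-numbered live nodes.

```python
from typing import Dict, List, Set

# Arbitrary values for illustration only, not Hadoop defaults.
HEARTBEAT_TIMEOUT_S = 30.0
REPLICATION = 3


def find_dead_nodes(last_heartbeat: Dict[int, float], now: float,
                    timeout: float = HEARTBEAT_TIMEOUT_S) -> Set[int]:
    """DataNodes whose last heartbeat is older than `timeout` seconds."""
    return {dn for dn, seen in last_heartbeat.items() if now - seen > timeout}


def plan_rereplication(block_map: Dict[str, List[int]], live: Set[int],
                       replication: int = REPLICATION) -> Dict[str, List[int]]:
    """Map each under-replicated block to the live nodes that should get a new copy."""
    plan: Dict[str, List[int]] = {}
    for block, holders in block_map.items():
        surviving = [dn for dn in holders if dn in live]
        missing = replication - len(surviving)
        if missing <= 0 or not surviving:
            # Fully replicated, or no surviving copy to copy from (block lost).
            continue
        candidates = sorted(dn for dn in live if dn not in surviving)
        plan[block] = candidates[:missing]
    return plan


if __name__ == "__main__":
    heartbeats = {1: 100.0, 7: 95.0, 9: 20.0, 12: 99.0}
    dead = find_dead_nodes(heartbeats, now=100.0)
    live = set(heartbeats) - dead
    blocks = {"blk_A": [1, 7, 9], "blk_B": [1, 7, 12]}
    print(dead)                              # {9}
    print(plan_rereplication(blocks, live))  # {'blk_A': [12]}
```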
SCALABILITY OF HADOOP
EASE FOR PROGRAMMERS
HADOOP VS. OTHER SYSTEMS
HADOOP USERS
TO LEARN MORE
Source code
◦ http://hadoop.apache.org/version_control.html
◦ http://svn.apache.org/viewvc/hadoop/common/trunk/
Hadoop releases
◦ http://hadoop.apache.org/releases.html
Contribute to it
◦ http://wiki.apache.org/hadoop/HowToContribute
CONCLUSION
HDFS provides a reliable, scalable and manageable solution for
working with huge amounts of data
Future-proof
HDFS has been deployed in clusters of 10 to 4k DataNodes
◦ Used in production at companies such as Yahoo!, Facebook, Twitter and eBay
◦ Many enterprises, including financial companies, use Hadoop
REFERENCES
[1] M. Zukowski, S. Heman, N. Nes, and P. Boncz, "Cooperative Scans: Dynamic Bandwidth
Sharing in a DBMS," in VLDB '07: Proceedings of the 33rd International Conference on
Very Large Data Bases, pp. 23–34, 2007.
[2] Tom White, Hadoop: The Definitive Guide, 3rd ed., O'Reilly Media, May 2012.
[3] Jeffrey Shafer, Scott Rixner, and Alan L. Cox, "The Hadoop Distributed Filesystem: Balancing
Portability and Performance," Rice University, Houston, TX.
[4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, "The Hadoop Distributed
File System," Yahoo!, Sunnyvale, California, USA.
[5] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz, "Efficient Big Data Processing in Hadoop MapReduce,"
Information Systems Group, Saarland University.
Thank you…
Queries?
