Hadoop Distributed File System(HDFS) : Behind the scenes

This presentation walks you through the core concepts of HDFS architecture: where HDFS fits in Hadoop, what the HDFS architecture is, and what its role is...

LET’S FIRST UNDERSTAND BUZZWORDS IN THE HADOOP WORLD

HDFS ARCHITECTURE
• Name Node
• Data Node
• Task Tracker
• Job Tracker
• Image and Journal
• HDFS Client
• Checkpoint Node
• Backup Node

[Diagram: HDFS architecture, showing the Backup Node, the Name Node (Image, Journal, Checkpoint) with the Job Tracker, the HDFS Client, and Data Nodes 1 to N, each paired with a Task Tracker]

NAME NODE

[Diagram: the Name Node and its components (Inode, Image, Checkpoint, Journal), shown alongside the Job Tracker]

• Inode - Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, and namespace and disk space quotas.

• Image - The inode data and the list of blocks belonging to each file.

• Checkpoint - The persistent record of the image, stored in the local host’s native file system.

• Journal - Write-ahead commit log for changes to the file system that must be persistent.

DATA NODE
On Start Up…

[Diagram: the Data Node connecting to the Name Node]
DATA NODE

[Diagram: the Data Node reports Total Storage Capacity, Fraction of Storage, and #Data Transfers In Progress to the Name Node; the Name Node sends back Commands]

FILE I/O OPERATIONS
Single Writer, Multiple Reader

DATA WRITE OPERATION

[Diagram: write sequence among the Client, the Name Node, and Data Nodes DN1 to DN4: pipeline setup across DN1, DN2, and DN3, then packet1 through packet5, then close]

DATA WRITE/READ OPERATION

• Single Writer Multiple Reader Model
• Lease Management (Soft Limit and Hard Limit)
• Pipelining, Buffering and Hflush
• Checksum for data integrity
• Choosing nodes for read operation

[Diagram: Client, Name Node, DN1]

BLOCK PLACEMENT

[Diagram: the Client sends Add(data) to the Name Node (Inode, Image, Checkpoint, Journal) and receives the Data Nodes for each replica. Topology tree: root "/" with RACK1 (DN1 to DN5), RACK2 (DN6 to DN10), and RACK3 (DN11 to DN15)]

REPLICATION MANAGEMENT

[Diagram: the Name Node (Inode, Image, Checkpoint, Journal) above the topology tree: root "/" with RACK1 (DN1 to DN5), RACK2 (DN6 to DN10), and RACK3 (DN11 to DN15); blocks are marked Over Replicated and Under Replicated]

BALANCER
• Balances the disk space utilization on individual data nodes.
• Based on a utilization threshold.
• Utilization balancing follows the block placement policy.

SCANNER
• The scanner verifies data integrity based on checksums.

2. WHAT IS

4. WHERE HDFS FITS IN HADOOP?

10. CLUSTERING

11. IT’S TIME FOR DEEP DIVE…

18. HDFS CLIENT

19. IMAGE & JOURNAL Flush & Sync Operation

20. CHECKPOINT NODE

21. BACKUP NODE

Editor's Notes

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. An important characteristic of Hadoop is the partitioning of data and computation across many (thousands of) hosts, and the execution of application computations in parallel, close to their data.
Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, and namespace and disk space quotas. The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes (the physical location of file data). The inode data and the list of blocks belonging to each file comprise the metadata of the name system, called the Image. The persistent record of the image stored in the local host’s native file system is called a Checkpoint. The NameNode also stores the modification log of the image, called the Journal, in the local host’s native file system.
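
The relationship between inodes, the image, the checkpoint, and the journal can be sketched in a few lines of Python. This is a toy model and not Hadoop code: the class names `Inode` and `Namesystem`, their fields, and the tuple format of journal entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Attributes recorded for one file or directory (hypothetical fields)."""
    path: str
    is_dir: bool
    permissions: int = 0o644
    mtime: float = 0.0
    atime: float = 0.0
    blocks: list = field(default_factory=list)  # block IDs; empty for directories


class Namesystem:
    """Toy stand-in for the NameNode's metadata."""

    def __init__(self):
        self.image = {}    # path -> Inode: inode data plus each file's block list
        self.journal = []  # modification log of the image

    def create_file(self, path, block_ids):
        # Log the change first (write-ahead), then apply it to the image.
        self.journal.append(("create", path, list(block_ids)))
        self.image[path] = Inode(path, is_dir=False, blocks=list(block_ids))

    def checkpoint(self):
        # A checkpoint is a persistent record of the image; here it is just a
        # plain dict that could be written to the local file system.
        return {p: (i.is_dir, i.permissions, tuple(i.blocks))
                for p, i in self.image.items()}
```

Calling `create_file("/a.txt", [1, 2])` adds one journal entry and one inode; `checkpoint()` then returns a snapshot that reflects both blocks.
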
During startup, each DataNode connects to the NameNode and performs the following steps:
1. HANDSHAKE - The purpose of the handshake is to verify the namespace ID and the software version of the DataNode. The namespace ID is assigned when the file system is formatted. It is stored on every node in the cluster, and all nodes in a cluster share the same ID. A newly initialized DataNode without a namespace ID is permitted to join the cluster and receives the cluster’s namespace ID on startup.
2. REGISTRATION - DataNodes persistently store their unique storage IDs. The storage ID is an internal identifier of the DataNode, which makes it recognizable even if it is restarted with a different IP address or port. The storage ID is assigned to the DataNode when it registers with the NameNode for the first time and never changes after that.
3. BLOCK REPORT - A block report contains the block ID, the generation stamp, and the length of each block replica. The first block report is sent immediately after the DataNode registration. Subsequent block reports are sent every hour.
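
The three startup steps above can be sketched as a small simulation. This is an illustrative model only: the `NameNode` class, its method names, and the use of `uuid` to mint storage IDs are assumptions, not the real Hadoop protocol.

```python
import uuid


class NameNode:
    """Toy NameNode that handles the three DataNode startup steps."""

    def __init__(self, namespace_id, version):
        self.namespace_id = namespace_id
        self.version = version
        self.registered = {}  # storage ID -> current network address
        self.block_map = {}   # storage ID -> list of (block_id, gen_stamp, length)

    def handshake(self, dn_namespace_id, dn_version):
        # Step 1: verify software version and namespace ID.
        if dn_version != self.version:
            raise ValueError("software version mismatch")
        if dn_namespace_id is None:
            # A newly initialized DataNode adopts the cluster's namespace ID.
            return self.namespace_id
        if dn_namespace_id != self.namespace_id:
            raise ValueError("namespace ID mismatch")
        return self.namespace_id

    def register(self, storage_id, address):
        # Step 2: assign a storage ID on first registration only; afterwards
        # the same ID is kept even if the address changes.
        if storage_id is None:
            storage_id = uuid.uuid4().hex
        self.registered[storage_id] = address
        return storage_id

    def block_report(self, storage_id, replicas):
        # Step 3: record block ID, generation stamp, and length per replica.
        self.block_map[storage_id] = list(replicas)
```

A DataNode that restarts on a new address passes its stored storage ID to `register` and is recognized as the same node.
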
Heartbeat = total storage capacity + fraction of storage in use + number of data transfers in progress. During normal operation, DataNodes send heartbeats to the NameNode to confirm that the DataNode is operating and the block replicas it hosts are available. The default heartbeat interval is three seconds. If the NameNode does not receive a heartbeat from a DataNode in ten minutes, the NameNode considers the DataNode to be out of service and the block replicas hosted by that DataNode to be unavailable. The NameNode then schedules creation of new replicas of those blocks on other DataNodes. The NameNode does not directly call DataNodes. It uses replies to heartbeats to send instructions to the DataNodes. The instructions include commands to:
• replicate blocks to other nodes
• remove local block replicas
• re-register or shut down the node
• send an immediate block report
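
A minimal sketch of the heartbeat mechanism follows, assuming a simulated clock passed in as `now` (in seconds). The `HeartbeatMonitor` class and the command strings are hypothetical; only the three-second interval and ten-minute timeout come from the notes above.

```python
HEARTBEAT_INTERVAL_S = 3   # default heartbeat interval from the notes
DEAD_AFTER_S = 10 * 60     # ten minutes without a heartbeat


class HeartbeatMonitor:
    """Toy NameNode-side bookkeeping for DataNode heartbeats."""

    def __init__(self):
        self.last_seen = {}  # node -> time of last heartbeat
        self.stats = {}      # node -> (capacity, fraction_used, transfers)
        self.pending = {}    # node -> commands waiting for the next reply

    def queue_command(self, node, command):
        # The NameNode never calls a DataNode directly; it queues commands
        # and hands them over in the reply to the next heartbeat.
        self.pending.setdefault(node, []).append(command)

    def heartbeat(self, node, now, capacity, fraction_used, transfers):
        self.last_seen[node] = now
        self.stats[node] = (capacity, fraction_used, transfers)
        return self.pending.pop(node, [])

    def dead_nodes(self, now):
        # Nodes whose last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > DEAD_AFTER_S)
```

Replicas hosted on any node returned by `dead_nodes` would be treated as unavailable and scheduled for re-replication.
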
User applications access the file system using the HDFS client, a code library that exports the HDFS file system interface. HDFS provides an API that exposes the locations of a file's blocks. This allows applications like the MapReduce framework to schedule a task where the data are located, thus improving read performance. It also allows an application to set the replication factor of a file.
The journal is a write-ahead commit log for changes to the file system that must be persistent. For each client-initiated transaction, the change is recorded in the journal, and the journal file is flushed and synced before the change is committed to the HDFS client. The checkpoint file is never changed by the NameNode; it is replaced in its entirety when a new checkpoint is created during restart. During startup the NameNode initializes the namespace image from the checkpoint, and then replays changes from the journal until the image is up to date with the last state of the file system. A new checkpoint and an empty journal are written back to the storage directories before the NameNode starts serving clients. The NameNode is a multithreaded system and processes requests simultaneously from multiple clients. Saving a transaction to disk becomes a bottleneck, since all other threads need to wait until the synchronous flush-and-sync procedure initiated by one of them is complete. To optimize this process, the NameNode batches multiple transactions initiated by different clients. When one of the NameNode’s threads initiates a flush-and-sync operation, all transactions batched at that time are committed together. The remaining threads only need to check that their transactions have been saved and do not need to initiate a flush-and-sync operation.
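
The batching idea can be illustrated with a single-threaded sketch. The real NameNode does this across threads with an actual disk sync; here the `Journal` class, its transaction IDs, and the in-memory `durable` list standing in for the on-disk file are all assumptions for illustration.

```python
class Journal:
    """Toy journal showing how one flush-and-sync covers a whole batch."""

    def __init__(self):
        self.buffer = []      # transactions logged but not yet durable
        self.durable = []     # stands in for the on-disk journal file
        self.next_txid = 1
        self.synced_txid = 0  # highest transaction ID known to be durable
        self.sync_calls = 0   # how many real flush-and-sync operations ran

    def log(self, op):
        txid = self.next_txid
        self.next_txid += 1
        self.buffer.append((txid, op))
        return txid

    def sync(self, txid):
        # If an earlier sync already covered this transaction, there is
        # nothing to do: the caller only had to check, not flush.
        if txid <= self.synced_txid:
            return False
        # Otherwise flush everything batched so far in one operation.
        self.durable.extend(self.buffer)
        self.synced_txid = self.buffer[-1][0]
        self.buffer.clear()
        self.sync_calls += 1
        return True
```

If three transactions are logged and the second one triggers `sync`, all three become durable in one operation, and the later `sync` calls for the first and third return without flushing.
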
The Checkpoint Node periodically combines the existing checkpoint and journal to create a new checkpoint and an empty journal. Creating periodic checkpoints is one way to protect the file system metadata. Creating a checkpoint lets the NameNode truncate the tail of the journal when the new checkpoint is uploaded to the NameNode.
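
Combining a checkpoint with a journal amounts to replaying the logged operations on top of the saved image. The sketch below assumes a deliberately tiny operation set (`create` and `delete`) and a dict-based image; the function names and the journal entry format are hypothetical.

```python
def apply(image, op):
    """Apply one journal entry to an in-memory image (path -> block list)."""
    kind, path = op[0], op[1]
    if kind == "create":
        image[path] = list(op[2])
    elif kind == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown journal op: {kind}")


def make_checkpoint(checkpoint, journal):
    """Replay the journal over a copy of the old checkpoint.

    Returns the new checkpoint and an empty journal; the old checkpoint
    is left untouched, since it is replaced rather than modified.
    """
    image = {path: list(blocks) for path, blocks in checkpoint.items()}
    for op in journal:
        apply(image, op)
    return image, []
```

For example, replaying `[("create", "/b", [2, 3]), ("delete", "/a")]` over `{"/a": [1]}` yields a new checkpoint containing only `/b`.
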
Like a CheckpointNode, the BackupNode is capable of creating periodic checkpoints, but in addition it maintains an in-memory, up-to-date image of the file system namespace that is always synchronized with the state of the NameNode. If the NameNode fails, the BackupNode’s image in memory and the checkpoint on disk are a record of the latest namespace state. The BackupNode can be viewed as a read-only NameNode.
Suppose a client has to write a block of data. It asks the NameNode where to write the block. Based on the placement and replication policy, the NameNode determines the list of DataNodes that will hold the data and its replicas. The list is ordered so that a pipeline can be set up between the DataNodes with minimum total length. Once the acknowledgement of the pipeline setup is received, the client pushes the first packet of data to the first node in the pipeline. Once the data is written to the first node, it is transmitted along the pipeline to the remaining nodes. When the packet has been written to all the nodes, an acknowledgement is sent back to the HDFS client. The client does not wait for the acknowledgement before writing the next packet, as long as there is room in the outstanding window. Each outgoing packet reduces the room left in the outstanding window, and each incoming acknowledgement increases it.
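
The outstanding-window behaviour described above can be sketched as a small class. This is a simplified model, assuming acknowledgements arrive in the same order the packets were sent; `PacketWindow` and its methods are hypothetical names, not part of the HDFS client.

```python
from collections import deque


class PacketWindow:
    """Toy model of the client's window of unacknowledged packets."""

    def __init__(self, size):
        self.free = size          # remaining room in the window
        self.in_flight = deque()  # sequence numbers awaiting an ack

    def can_send(self):
        return self.free > 0

    def send(self, seqno):
        if not self.can_send():
            raise RuntimeError("window full; wait for an acknowledgement")
        self.free -= 1            # each outgoing packet uses up room
        self.in_flight.append(seqno)

    def ack(self):
        # Assumption: acks come back in send order.
        seqno = self.in_flight.popleft()
        self.free += 1            # each incoming ack frees room again
        return seqno
```

With a window of two, the client can send packets 1 and 2 back to back, must then wait, and can send packet 3 as soon as the acknowledgement for packet 1 arrives.
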
Let's dive further into the details of the read and write operations and how they are handled at the DataNode level. The write operation follows the single-writer, multiple-reader model: while one client is writing a file, no other client is allowed to write to it, but other clients may read it. A client that opens a file for writing is granted a lease on that file by HDFS. The lease has a soft limit and a hard limit. The client keeps renewing the lease as it writes. If the soft limit expires and the client has not renewed the lease, another client can preempt the lease. If the hard limit expires, HDFS reclaims the lease and can grant it to another client. The DataNodes that host the replicas form a pipeline, ordered so as to minimize the total distance to the last node in the list. A data block is written to the pipeline in the form of packets. Packets are first buffered at the client; once the buffer is full, the packet is pushed to the pipeline. HDFS does not guarantee that data will be visible to other clients until the file being written is closed. To make the data visible earlier, the client can invoke the hflush operation, which pushes the current packet to the pipeline without waiting for the buffer to fill. To ensure data integrity, checksums are calculated and stored on the DataNode in a metadata file separate from the block's data. When a client creates an HDFS file, it computes the checksum sequence for each block and sends it to the DataNodes along with the data. When a client reads a block from a DataNode, it recomputes the checksums and compares them with the checksums stored on the DataNode. If there is a mismatch, the integrity check fails and the client reads the data from another DataNode. When a client opens a file for reading, it obtains the list of DataNodes that hold replicas of the file's blocks, ordered by their distance from the client. It first tries to read from the closest replica. If that fails for any reason (data integrity or node down), it tries the next replica. We will see next how the NameNode identifies the node closest to the client.
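
A checksum-verified read with fallback to the next replica might look like the sketch below. It is illustrative only: `zlib.crc32` over the whole block stands in for whatever checksum scheme the cluster uses, and the tuple layout of `replicas` is an assumption.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in checksum; a single CRC over the whole block for simplicity.
    return zlib.crc32(data)


def read_block(replicas):
    """Try replicas in order of distance and return the first valid one.

    `replicas` is a list of (node, data, stored_checksum) tuples, closest
    first. `data` is None when the node is down.
    """
    for node, data, stored in replicas:
        if data is None:
            continue              # node down: try the next replica
        if checksum(data) == stored:
            return node, data     # checksum matches: accept this replica
        # Checksum mismatch: the replica is corrupt, so fall through.
    raise IOError("no valid replica found")
```

If the closest node is down and the second holds corrupted bytes, the read succeeds from the third replica.
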
In general, clusters do not have a flat topology. They follow a rack-based layout with multiple racks. All the nodes in a rack are connected through a switch, and the racks are connected through another switch, so the network forms a tree hierarchy. Given the address of a DataNode, the NameNode can identify the rack to which it belongs. The distance between two nodes is the sum of their distances to their closest common ancestor. The replica placement policy is important for reliability and for read/write performance, and it is configurable. Under the default policy, HDFS places the first replica on the node where the writer is located, the second and third replicas on two different nodes in a different rack, and any others on random nodes. The policy ensures that no DataNode contains more than one replica of a block and that no rack contains more than two replicas of a block (provided that there are enough racks and nodes).
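
The tree distance described above is easy to compute if each node is named by its path in the topology. The sketch below assumes paths of the form `/rack/node`; the function names are hypothetical.

```python
def distance(a, b):
    """Sum of both nodes' distances to their closest common ancestor.

    Nodes are given as topology paths such as '/rack1/dn1' (an assumed
    naming scheme for this sketch).
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_by_distance(reader, replicas):
    """Order replica locations from closest to farthest from the reader."""
    return sorted(replicas, key=lambda r: distance(reader, r))
```

Under this scheme the same node is at distance 0, two nodes in one rack are at distance 2, and nodes in different racks are at distance 4.
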
A block can become over-replicated or under-replicated (because of a node failure, or recovery after a node failure). If a block becomes over-replicated, the NameNode chooses the DataNode from which to remove a replica. In doing so, it makes sure that removing the replica does not reduce the number of racks on which the block is replicated. When a block becomes under-replicated, the NameNode places it on a priority queue of blocks waiting for replication. This queue is prioritized by replication level: blocks with only one replica have the highest priority, and blocks with more than two thirds of their replication factor have lower priority. A background thread repeatedly checks the head of the queue to decide where to place new replicas. Replica placement here follows the same policy as block placement.
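
The replication queue can be sketched with a heap. The notes specify only the highest tier (one replica or fewer) and the lowest tier (more than two thirds of the replication factor); the middle tier in this sketch, and the names `priority` and `ReplicationQueue`, are assumptions made for illustration.

```python
import heapq


def priority(live, target):
    """Lower number = more urgent."""
    if live <= 1:
        return 0                 # one replica or fewer: highest priority
    if live * 3 > target * 2:
        return 2                 # more than 2/3 of the replication factor
    return 1                     # assumed middle tier for everything else


class ReplicationQueue:
    """Toy priority queue of under-replicated blocks."""

    def __init__(self):
        self._heap = []
        self._seq = 0            # tie-breaker keeps insertion order stable

    def add(self, block_id, live, target):
        if live >= target:
            return               # not under-replicated; nothing to queue
        heapq.heappush(self._heap, (priority(live, target), self._seq, block_id))
        self._seq += 1

    def next_block(self):
        # The background thread would call this to pick the next block.
        return heapq.heappop(self._heap)[2] if self._heap else None
```

A block down to a single replica is dequeued before a block that still has eight of its ten replicas.
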
So far we have seen that the block placement policy does not take the disk usage of a particular DataNode into account. As a result, some DataNodes can become heavily utilized while others remain underutilized. To correct this, a balancer runs in the background and computes the utilization ratio of each node and of the entire cluster. If the utilization ratio of a DataNode exceeds that of the entire cluster by more than a certain threshold, the balancer moves blocks from that node to other nodes in the cluster. While moving a block, it consults the NameNode again for the block's new location. To maximize throughput, it applies policies that minimize inter-rack data copying.
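
The threshold test the balancer applies can be sketched as follows. This shows only the classification step, not the actual block movement; the function name `plan` and the `(used, capacity)` input format are assumptions for this illustration.

```python
def plan(nodes, threshold):
    """Classify nodes relative to cluster-wide utilization.

    `nodes` maps a node name to (used, capacity). Returns the cluster
    utilization plus the names of over- and under-utilized nodes, that is,
    nodes whose utilization differs from the cluster's by more than
    `threshold` (a fraction, e.g. 0.1 for ten percentage points).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster = total_used / total_capacity

    over = sorted(n for n, (u, c) in nodes.items() if u / c - cluster > threshold)
    under = sorted(n for n, (u, c) in nodes.items() if cluster - u / c > threshold)
    return cluster, over, under
```

With nodes at 90%, 50%, and 10% utilization and a threshold of 0.1, the cluster sits at 50%, the first node is a source of blocks to move, and the third is a candidate destination.
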
