GOOGLE FILE SYSTEM
INTRODUCTION
Designed by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung of Google in 2002-03.
Provides fault tolerance while serving a large number of clients with high aggregate performance.
Google's workloads extend well beyond search.
Google stores its data on more than 15,000 commodity machines.
Handles failures and other Google-specific challenges within the distributed file system itself.
DESIGN OVERVIEW
Assumptions
The system is built from many inexpensive commodity components that often fail.
It stores a modest number of large files.
Workloads consist of large streaming reads and small random reads.
Workloads also have many large, sequential writes that append data to files.
The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
High sustained bandwidth is more important than low latency.
GOOGLE FILE SYSTEM ARCHITECTURE
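A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.
The three basic components of GFS are the master, the clients, and the chunkservers.
Files are divided into fixed-size chunks.
Chunkservers store chunks on local disks as Linux files.
The master maintains all file system metadata.
This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
Clients interact with the master for metadata operations; file data is exchanged directly with the chunkservers.
Chunkservers need not cache file data, because chunks are stored as local files and the Linux buffer cache already keeps frequently accessed data in memory.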
Chunk
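A chunk is similar to the concept of a block in conventional file systems.
The chunk size is 64 MB, much larger than typical file system block sizes.
A large chunk size means fewer chunks and less chunk metadata on the master.
A drawback of this chunk size is that a chunkserver can become a hot spot when many clients access the same small file.
Each chunk is stored on a chunkserver as a plain file and is identified by its chunk handle, which acts as the chunk's file name.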
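The arithmetic behind fixed-size chunks can be sketched in a few lines. This is only an illustration that assumes the 64 MB chunk size stated above; the function names `chunk_index` and `chunk_count` are hypothetical and not part of GFS.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that holds the given byte offset of a file."""
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Number of chunks needed to store a file of file_size bytes."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(chunk_count(ten_gib))   # number of chunks the master must track
    print(chunk_index(ten_gib - 1))  # index of the last chunk
```

With a large chunk size, even a multi-gigabyte file maps to a small number of chunks, which is why the master has little metadata to keep per file.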

Metadata
The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas.
The first two types are also kept persistent by logging mutations to an operation log stored on the master’s local disk.
Because metadata is stored in memory, master operations are fast.
It is also easy and efficient for the master to periodically scan its entire state in the background.
This periodic scanning is used to implement chunk garbage collection, re-replication, and chunk migration.
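A minimal sketch of how the three metadata types and the operation log might relate, under simplifying assumptions: the log is an in-memory list rather than a file on disk, the namespace and the file-to-chunk mapping share one dictionary, and all names (`MasterMetadata`, `report_replica`, `recover`) are hypothetical. Replica locations are deliberately left out of the log, matching the statement that only the first two types are kept persistent.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # Namespace plus file -> chunk mapping (both logged).
    files: dict = field(default_factory=dict)            # path -> [chunk handles]
    # Replica locations (kept in memory only, never logged).
    chunk_locations: dict = field(default_factory=dict)  # handle -> {chunkserver ids}
    op_log: list = field(default_factory=list)           # stand-in for the on-disk log

    def create_file(self, path):
        self.op_log.append(("create", path))  # log first, then apply
        self.files[path] = []

    def add_chunk(self, path, handle):
        self.op_log.append(("add_chunk", path, handle))
        self.files[path].append(handle)

    def report_replica(self, handle, server):
        # Learned from chunkservers; not written to the operation log.
        self.chunk_locations.setdefault(handle, set()).add(server)

    @classmethod
    def recover(cls, op_log):
        """Rebuild the logged metadata by replaying the operation log."""
        master = cls()
        for record in op_log:
            if record[0] == "create":
                master.create_file(record[1])
            elif record[0] == "add_chunk":
                master.add_chunk(record[1], record[2])
        return master
```

After `recover`, the namespace and chunk mapping are restored while `chunk_locations` starts empty, so in this sketch the master would have to learn replica locations again from the chunkservers.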

Master
The master is a single process, running on a separate machine, that stores all metadata.
Clients contact the master to get the metadata they need to contact the chunkservers.
SYSTEM INTERACTION
Read Algorithm
1. Application originates the read request.
2. GFS client translates the request from (filename, byte range) -> (filename, chunk index), and sends it to the master.
3. Master responds with the chunk handle and replica locations (i.e., the chunkservers where the replicas are stored).
4. Client picks a location and sends the (chunk handle, byte range) request to that location.
5. Chunkserver sends the requested data to the client.
6. Client forwards the data to the application.
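The read steps above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the `Master` and `Chunkserver` classes, the `gfs_read` function, the tiny 8-byte chunk size, and the choice of always picking the first replica are hypothetical and do not reflect the real GFS client library.

```python
CHUNK_SIZE = 8  # tiny value for the demo; GFS uses 64 MB


class Master:
    def __init__(self):
        self.file_chunks = {}  # filename -> list of chunk handles
        self.locations = {}    # chunk handle -> list of chunkserver names

    def lookup(self, filename, index):
        """Step 3: return the chunk handle and its replica locations."""
        handle = self.file_chunks[filename][index]
        return handle, self.locations[handle]


class Chunkserver:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, start, length):
        """Step 5: return the requested byte range of one chunk."""
        return self.chunks[handle][start:start + length]


def gfs_read(master, servers, filename, offset, length):
    data = b""
    while length > 0:
        index = offset // CHUNK_SIZE              # step 2: byte offset -> chunk index
        start = offset % CHUNK_SIZE
        n = min(length, CHUNK_SIZE - start)       # bytes to read from this chunk
        handle, replicas = master.lookup(filename, index)
        piece = servers[replicas[0]].read(handle, start, n)  # step 4: first replica (assumption)
        data += piece
        if len(piece) < n:                        # short read: end of file reached
            break
        offset += n
        length -= n
    return data                                   # step 6: hand the data to the application
```

Note that the master is consulted only for the handle and locations; the bytes themselves come from a chunkserver, which keeps the master off the data path.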

Write Algorithm
1. Application originates the request.
2. GFS client translates the request from (filename, data) -> (filename, chunk index), and sends it to the master.
3. Master responds with the chunk handle and the (primary + secondary) replica locations.
4. Client pushes the write data to all locations. The data is stored in the chunkservers’ internal buffers.
5. Client sends the write command to the primary.
6. Primary determines a serial order for the data instances stored in its buffer and writes the instances in that order to the chunk.
7. Primary sends the serial order to the secondaries and tells them to perform the write.
8. Secondaries respond to the primary.
9. Primary responds back to the client.
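A rough sketch of steps 4 to 9, showing why a serial order chosen by the primary keeps all replicas identical even when writes overlap. The classes `Replica` and `Primary`, the `commit` method, and the use of arrival order as the serial order are assumptions for illustration only.

```python
class Replica:
    def __init__(self, name, size=8):
        self.name = name
        self.buffer = {}              # data_id -> (offset, data), filled in step 4
        self.chunk = bytearray(size)  # contents of the chunk on this replica

    def push(self, data_id, offset, data):
        """Step 4: buffer pushed data without writing it to the chunk yet."""
        self.buffer[data_id] = (offset, data)

    def apply(self, order):
        """Write buffered data to the chunk in the given serial order."""
        for data_id in order:
            offset, data = self.buffer.pop(data_id)
            self.chunk[offset:offset + len(data)] = data
        return True  # acknowledgement (step 8 for secondaries)


class Primary(Replica):
    def __init__(self, name, secondaries, size=8):
        super().__init__(name, size)
        self.secondaries = secondaries

    def commit(self, write_commands):
        """Steps 6-9: choose an order, apply it locally, forward it, collect replies."""
        order = list(write_commands)  # arrival order used as the serial order (assumption)
        self.apply(order)
        acks = [s.apply(order) for s in self.secondaries]
        return all(acks)              # step 9: reply to the client
```

Because every replica applies the same writes in the same order, two overlapping writes leave the same bytes on the primary and on each secondary.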
Record Append Algorithm
1. Application originates the record append request.
2. GFS client translates the request and sends it to the master.
3. Master responds with the chunk handle and the (primary + secondary) replica locations.
4. Client pushes the write data to all replicas of the last chunk of the file.
5. Primary checks whether the record fits in the specified chunk.
6. If the record doesn’t fit, then the primary:
   Pads the chunk.
   Tells the secondaries to do the same.
   Informs the client.
   The client then retries the append with the next chunk.
7. If the record fits, then the primary:
   Appends the record.
   Tells the secondaries to write the data at the exact same offset.
   Receives responses from the secondaries.
   Sends the final response to the client.
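Steps 5 to 7 can be sketched as follows. This is a simplified model under stated assumptions: a 16-byte chunk size for the demo, zero bytes as padding, and hypothetical names (`ChunkReplica`, `primary_append`). A return value of `None` stands for "chunk padded, retry on the next chunk".

```python
CHUNK_SIZE = 16  # tiny value for the demo; GFS uses 64 MB


class ChunkReplica:
    def __init__(self):
        self.data = bytearray()

    def pad(self):
        """Fill the rest of the chunk so no further record can land in it."""
        self.data.extend(b"\0" * (CHUNK_SIZE - len(self.data)))

    def write_at(self, offset, record):
        """Write the record at an exact offset chosen by the primary."""
        if len(self.data) < offset:
            self.data.extend(b"\0" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record


def primary_append(primary, secondaries, record):
    """Return the offset of the appended record, or None if the client must retry."""
    offset = len(primary.data)
    if offset + len(record) > CHUNK_SIZE:  # step 6: record does not fit
        primary.pad()
        for s in secondaries:
            s.pad()
        return None
    primary.write_at(offset, record)       # step 7: record fits
    for s in secondaries:
        s.write_at(offset, record)         # same offset on every replica
    return offset
```

The primary, not the client, picks the offset, which is what lets many clients append to the same file concurrently without coordinating with each other.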
MASTER OPERATION
Namespace management and locking
Multiple master operations can be active at the same time; they use locks over regions of the namespace to ensure proper serialization.
GFS does not have a per-directory data structure.
GFS logically represents its namespace as a lookup table mapping full pathnames to metadata.
Each master operation acquires a set of locks before it runs.
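One way to picture the lock set of an operation, assuming the scheme described in the GFS paper: read locks on every ancestor path and a read or write lock on the full pathname. The helper names `locks_for` and `conflicts` are hypothetical, and the sketch only computes lock sets; it does not implement real locking.

```python
def locks_for(path, write_leaf):
    """Lock set for an operation on path: read locks on ancestors, one lock on the leaf."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    locks = [(p, "read") for p in prefixes[:-1]]
    locks.append((prefixes[-1], "write" if write_leaf else "read"))
    return locks


def conflicts(locks_a, locks_b):
    """Two lock sets conflict if they share a path and at least one side wants to write it."""
    held = dict(locks_a)
    for path, mode in locks_b:
        if path in held and "write" in (mode, held[path]):
            return True
    return False


if __name__ == "__main__":
    create_a = locks_for("/home/user/a", write_leaf=True)
    create_b = locks_for("/home/user/b", write_leaf=True)
    print(conflicts(create_a, create_b))  # two creations in one directory
```

Because there is no per-directory structure to protect, creating two files in the same directory needs only read locks on the directory name, so the two operations do not conflict in this model.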

Replica placement
A GFS cluster is highly distributed at more than one level.
The chunk replica placement policy serves two purposes: maximize data reliability and availability, and maximize network bandwidth utilization.
Chunk replicas are therefore spread across racks as well as across machines.
Creation, Re-replication, and Balancing of Chunks
Factors for choosing where to place the initially empty replicas:
(1) Place new replicas on chunkservers with below-average disk space utilization.
(2) Limit the number of “recent” creations on each chunkserver.
(3) Spread replicas of a chunk across racks.
The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal.
Each chunk that needs to be re-replicated is prioritized based on how far it is from its replication goal.
Finally, the master rebalances replicas periodically.
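The three placement factors and the re-replication priority can be sketched as below. The scoring is an assumption made for illustration: servers are simply sorted by disk utilization (so less-used servers are preferred), a hypothetical `max_recent` threshold caps recent creations, and one replica per rack is tried first. The names `Server`, `place_replicas`, and `rereplication_order` are not from GFS.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    rack: str
    disk_used: float        # fraction of disk space in use, 0.0 to 1.0
    recent_creations: int   # replicas created on this server recently


def place_replicas(servers, n, max_recent=2):
    """Pick n chunkservers for new replicas of one chunk."""
    # Factor (2): skip servers with too many recent creations.
    # Factor (1): prefer servers with low disk space utilization.
    candidates = sorted(
        (s for s in servers if s.recent_creations < max_recent),
        key=lambda s: s.disk_used,
    )
    chosen, racks = [], set()
    # Factor (3): first take at most one server per rack.
    for s in candidates:
        if len(chosen) < n and s.rack not in racks:
            chosen.append(s)
            racks.add(s.rack)
    # If there are fewer racks than replicas, fill up with remaining servers.
    for s in candidates:
        if len(chosen) < n and s not in chosen:
            chosen.append(s)
    return [s.name for s in chosen]


def rereplication_order(chunks):
    """chunks: handle -> (live replicas, goal). Most under-replicated chunks first."""
    missing = {h: goal - live for h, (live, goal) in chunks.items() if live < goal}
    return sorted(missing, key=lambda h: -missing[h])
```

A chunk that has lost two replicas is returned before one that has lost a single replica, mirroring the "how far it is from its replication goal" rule.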
GARBAGE COLLECTION
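Garbage collection is performed lazily at both the file and chunk levels.
When a file is deleted by the application, the master logs the deletion immediately.
The file is not reclaimed right away; it is just renamed to a hidden name.
Until it is reclaimed, the file can still be read under the new, special name and can be undeleted.
When the hidden file is finally removed from the namespace, its in-memory metadata is erased.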
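The lazy deletion scheme can be sketched as a rename followed by a later scan. The hidden-name format, the class name `Namespace`, and the three-day default grace period used here are assumptions for illustration.

```python
class Namespace:
    def __init__(self, grace_seconds=3 * 24 * 3600):
        self.files = {}    # name -> list of chunk handles
        self.hidden = {}   # hidden name -> (original name, deletion time)
        self.log = []      # stand-in for the master's operation log
        self.grace = grace_seconds

    def delete(self, name, now):
        """Log the deletion, then rename the file to a hidden name."""
        self.log.append(("delete", name, now))
        hidden = f".deleted.{now}.{name}"  # hypothetical naming scheme
        self.files[hidden] = self.files.pop(name)
        self.hidden[hidden] = (name, now)
        return hidden

    def undelete(self, hidden):
        """Rename a hidden file back to its original name."""
        name, _ = self.hidden.pop(hidden)
        self.files[name] = self.files.pop(hidden)

    def scan(self, now):
        """Periodic scan: drop expired hidden files and return their orphaned chunks."""
        orphaned = []
        for hidden, (_, deleted_at) in list(self.hidden.items()):
            if now - deleted_at > self.grace:
                orphaned.extend(self.files.pop(hidden))  # in-memory metadata erased
                del self.hidden[hidden]
        return orphaned
```

Until `scan` runs after the grace period, the data remains readable under the hidden name and `undelete` restores it; only afterwards do the chunks become garbage.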
FAULT TOLERANCE
High Availability
Fast Recovery
Chunk Replication
Master Replication

Data Integrity
Each chunkserver uses checksumming to detect corruption of stored data.
Each chunk is broken up into 64 KB blocks, and each block has its own checksum.
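A small sketch of per-block checksumming over 64 KB blocks. The use of CRC-32 from Python's `zlib` module is an assumption, since the slides do not name a checksum algorithm; the function names `block_checksums` and `verify_read` are also hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as stated on the slide


def block_checksums(chunk):
    """Compute one checksum per 64 KB block of a chunk."""
    return [
        zlib.crc32(chunk[i:i + BLOCK_SIZE])
        for i in range(0, len(chunk), BLOCK_SIZE)
    ]


def verify_read(chunk, checksums, offset, length):
    """Verify every block overlapping the requested range before returning data."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

Only the blocks that overlap the read range are verified, so a corrupted block elsewhere in the chunk does not affect an unrelated read in this sketch.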
CHALLENGES
Storage size.
Bottleneck for the clients.
Time.
CONCLUSION
Supports large-scale data processing workloads.
Provides fault tolerance.
Tolerates chunkserver failures.
Delivers high throughput.
Storage platform for research and development.
THANK YOU
QUESTIONS
