Avinash Kumar
BE Computer-2
  Roll No-40
Contents
• Introduction to GFS
• System Architecture
• System Features
• Working of GFS
• Latest Advancement
• Conclusion
• Questions
Introduction
• More than 15,000 commodity-class PCs.
• Multiple clusters distributed worldwide.
• Thousands of queries served per second.
• One query reads hundreds of MB of data.
• One query consumes tens of billions of CPU cycles.
• Google stores dozens of copies of the entire Web!

• Conclusion: Google needs a large, distributed, highly fault-tolerant file system.
System Architecture
• A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients.
Large Chunk
• GFS uses a large chunk size: 64 MB (1 GB = 1024 MB = 16 chunks)
    • Each chunk is stored as a plain Linux file, which is lazily extended up to 64 MB.
• Optimized for many reads and writes on a given chunk
    • Reduces network overhead by keeping a persistent connection to the chunkserver.
    • See also MapReduce, Bigtable.
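The chunk arithmetic on this slide can be sketched in a few lines. This is only an illustration: the names CHUNK_SIZE, chunk_count and locate are made up for the example and are not part of any GFS API.

```python
# Illustrative sketch only; the function names are assumptions, not a GFS API.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated on the slide


def chunk_count(file_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def locate(offset):
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)


if __name__ == "__main__":
    one_gb = 1024 * 1024 * 1024
    print(chunk_count(one_gb))      # a 1 GB file needs 16 chunks
    print(locate(CHUNK_SIZE + 5))   # second chunk (index 1), byte 5
```

Because chunks are extended lazily, a small file still occupies one chunk handle but not 64 MB of disk.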
Architecture (cont’d)
• Chunkserver
   • Files are divided into fixed-size chunks (64 MB).
   • Each chunk is identified by an immutable and globally unique 64-bit chunk handle assigned by the master at the time of chunk creation.
   • Chunkservers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle.
   • For reliability, each chunk is replicated on multiple chunkservers (3 replicas by default).

• GFS Client
   • GFS client code linked into each application implements the file system API and communicates with the master and chunkservers to read or write data on behalf of the application.
System Metadata
• The master stores three major types of metadata:
     • The file and chunk namespaces
     • The mapping from files to chunks
     • The locations of each chunk’s replicas
• All metadata is kept in the master’s memory.
• The first two types are also kept persistent by logging mutations to an operation log stored on the master’s local disk and replicated on remote machines.
• The master does not store the third type persistently. Instead, it asks each chunkserver about its chunks at master startup.
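One way to picture the three metadata types is as in-memory dictionaries, with only the first two backed by a log. The class below is a hypothetical sketch: MasterMetadata, create_chunk and report_chunks are invented names, and a Python list stands in for the replicated on-disk operation log.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Hypothetical in-memory model of the master's metadata (not GFS code)."""
    # Types 1 and 2 (namespace and file-to-chunk mapping): logged before use.
    file_to_chunks: dict = field(default_factory=dict)   # path -> [chunk handle]
    op_log: list = field(default_factory=list)           # stand-in for the disk log
    # Type 3 (replica locations): memory only, rebuilt from chunkserver reports.
    chunk_locations: dict = field(default_factory=dict)  # handle -> {chunkserver}

    def create_chunk(self, path, handle):
        # Record the mutation in the log first, then update memory.
        self.op_log.append(("add_chunk", path, handle))
        self.file_to_chunks.setdefault(path, []).append(handle)

    def report_chunks(self, server, handles):
        # A chunkserver tells the master which chunks it holds.
        for handle in handles:
            self.chunk_locations.setdefault(handle, set()).add(server)


if __name__ == "__main__":
    master = MasterMetadata()
    master.create_chunk("/logs/web.0", 101)
    master.report_chunks("cs1", [101])
    master.report_chunks("cs2", [101])
    print(master.file_to_chunks, master.chunk_locations)
```

Keeping locations out of the log means the master never has to keep them in sync with chunkservers that join, leave, or lose disks; it simply asks again.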
System Features
• PageRank: the probability that a random surfer visits the page
    • Based on citations (back links)
    • How is PageRank calculated?

      PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
      where
      PR(A)     -> PageRank of page A
      T1 ... Tn -> pages that point to page A (citations)
      d         -> damping factor (0 < d < 1)
      C(Ti)     -> number of links going out from page Ti
    • The PageRank of a page depends on:
        • the number of pages pointing to it
        • the PageRank of the pages that point to it
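The formula can be evaluated by repeated substitution until the values settle. The sketch below is a minimal illustration under stated assumptions: the damping factor d = 0.85 and the fixed iteration count are choices made for the example (the slide only requires 0 < d < 1), and the function name pagerank is not from the slides.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    d=0.85 and iterations=50 are assumptions for this sketch.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T whose out-links include `page`.
            incoming = sum(pr[t] / len(out)
                           for t, out in links.items() if page in out)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(web).items()):
        print(page, round(score, 3))
```

In this small graph C is linked from both A and B, so it ends up with a higher score than B, which is linked only from A.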
System Features
• Anchor text: the text associated with a link
    • Associated with the page the link is on
    • Also associated with the page the link points to (unique to Google)
    Advantages:
    • Anchors often describe a page more accurately than the page itself
    • Documents that cannot be indexed can still be returned

• Other features:
    • Location information for all hits, used for proximity in search
    • Keeps track of visual presentation details
System Anatomy
Working Of GFS
Google Query Evaluation
1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the search terms.
5. Compute the rank of that document for the query.
6. If in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist, go to step 4.
8. Sort the documents that have matched by rank and return the top k.
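The steps above can be sketched roughly as follows. This is a simplification, not Google's implementation: the barrels are modelled as dictionaries from wordID to a list of docIDs, the sequential doclist scan is replaced by a set intersection, and the names evaluate, lexicon and rank are hypothetical.

```python
import heapq


def evaluate(query, lexicon, short_barrel, full_barrel, rank, k=10):
    """Return the top-k docIDs that contain every query word.

    Assumed shapes: lexicon maps word -> wordID; each barrel maps
    wordID -> list of docIDs; rank(doc, word_ids) returns a number.
    """
    words = query.lower().split()                          # step 1: parse
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]                 # step 2: wordIDs
    matches = {}
    for barrel in (short_barrel, full_barrel):             # steps 3 and 6
        doclists = [set(barrel.get(w, ())) for w in word_ids]
        for doc in set.intersection(*doclists):            # step 4 (simplified)
            if doc not in matches:
                matches[doc] = rank(doc, word_ids)         # step 5
    return heapq.nlargest(k, matches, key=matches.get)     # step 8


if __name__ == "__main__":
    lexicon = {"google": 1, "file": 2}
    short = {1: [10, 20], 2: [20]}
    full = {1: [10, 20, 30], 2: [20, 30, 40]}
    print(evaluate("Google file", lexicon, short, full, lambda d, ids: d, k=2))
```

The sketch always consults both barrels; the procedure on the slide only falls through to the full barrels once a short-barrel doclist is exhausted.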
Client Read
• Client sends master:
   • read(file name, chunk index)
• Master’s reply:
   • chunk ID, chunk version number, locations of replicas
• Client sends “closest” chunkserver with a replica:
   • read(chunk ID, byte range)
   • “Closest” is determined by IP address on a simple rack-based network topology
• Chunkserver replies with data
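The read path can be sketched with plain dictionaries standing in for the master and the chunkservers. Everything here is an assumption made for illustration: the function name read, the dictionary layouts, and the distance callable (a stand-in for the IP-based notion of "closest"). Reads that span a chunk boundary are not handled.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def read(master, chunkservers, filename, offset, length, distance):
    """Hypothetical client read; returns (chosen chunkserver, data).

    master:       (file name, chunk index) -> (chunk ID, version, [replica names])
    chunkservers: replica name -> {chunk ID: bytes}
    distance:     replica name -> number, smaller means closer (assumed)
    """
    # The client turns the byte offset into a chunk index itself.
    index, start = divmod(offset, CHUNK_SIZE)
    # Master's reply: chunk ID, version number, replica locations.
    chunk_id, _version, replicas = master[(filename, index)]
    # Ask the closest replica for the byte range.
    closest = min(replicas, key=distance)
    return closest, chunkservers[closest][chunk_id][start:start + length]


if __name__ == "__main__":
    master = {("/f", 0): (7, 1, ["cs1", "cs2"])}
    servers = {"cs1": {7: b"hello world"}, "cs2": {7: b"hello world"}}
    hops = {"cs1": 2, "cs2": 1}
    print(read(master, servers, "/f", 6, 5, hops.get))
```

Note that file data never passes through the master; it only answers the small lookup.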
Client Write
• Some chunkserver is primary for each chunk
   • Master grants a lease to the primary (typically for 60 seconds)
   • Leases are renewed using periodic heartbeat messages between master and chunkservers
• Client asks the master for the primary and secondary replicas of each chunk
• Client sends data to replicas in a daisy chain
   • Pipelined: each replica forwards data as it receives it
   • Takes advantage of full-duplex Ethernet links
Client Write (2)
• All replicas acknowledge the data write to the client
• Client sends write request to primary
• Primary assigns a serial number to the write request, providing ordering
• Primary forwards the write request with the same serial number to the secondaries
• Secondaries all reply to the primary after completing the write
• Primary replies to client
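The separation between pushing data and ordering writes can be sketched as below. This is a toy model under loud assumptions: Replica and Primary are invented classes, writes are modelled as appends to a byte buffer rather than writes at an offset, and the daisy-chain forwarding of data is left out.

```python
class Replica:
    """Toy replica: buffers pushed data, applies it when told to (assumed model)."""

    def __init__(self):
        self.buffered = {}        # data id -> bytes pushed but not yet applied
        self.chunk = bytearray()  # the chunk contents

    def push(self, data_id, data):
        self.buffered[data_id] = data
        return True               # acknowledgement back to the client

    def apply(self, serial, data_id):
        self.chunk += self.buffered.pop(data_id)
        return serial             # echo the serial number as the reply


class Primary(Replica):
    """Toy primary: picks the order and forwards it to the secondaries."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        serial = self.next_serial             # the primary assigns the order
        self.next_serial += 1
        self.apply(serial, data_id)
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        return all(a == serial for a in acks)  # reply to the client


if __name__ == "__main__":
    s1, s2 = Replica(), Replica()
    primary = Primary([s1, s2])
    for replica in (primary, s1, s2):
        replica.push("a", b"AA")
        replica.push("b", b"BB")
    primary.write("b")
    primary.write("a")
    print(bytes(primary.chunk), bytes(s1.chunk), bytes(s2.chunk))
```

Data can arrive at the replicas in any order; only the primary's serial numbers decide the order in which it is applied, so all replicas end up identical.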
Client Write (3)
What Happens If the Master Reboots?
• Replays log from disk
   • Recovers namespace (directory) information
   • Recovers file-to-chunk-ID mapping
• Asks chunkservers which chunks they hold
   • Recovers chunk-ID-to-chunkserver mapping
• If a chunkserver has an older chunk version, that replica is stale
   • The chunkserver was down at lease renewal
• If a chunkserver has a newer chunk version, the master adopts its version number
   • The master may have failed while granting the lease
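The version comparison during recovery can be sketched as two passes over the chunkserver reports. The function name rebuild_locations and the dictionary shapes are assumptions for this example; replaying the operation log is taken as already done.

```python
def rebuild_locations(versions, reports):
    """Rebuild the chunk-ID-to-chunkserver mapping after a master restart.

    versions: chunk ID -> version from the replayed log (updated in place)
    reports:  chunkserver -> {chunk ID: version that server holds}
    Both shapes are assumptions made for this sketch.
    """
    # Pass 1: adopt any newer version number a chunkserver reports.
    for held in reports.values():
        for chunk, version in held.items():
            if version > versions.get(chunk, 0):
                versions[chunk] = version
    # Pass 2: only replicas at the current version count; older ones are stale.
    locations = {}
    for server, held in reports.items():
        for chunk, version in held.items():
            if version == versions[chunk]:
                locations.setdefault(chunk, set()).add(server)
    return locations


if __name__ == "__main__":
    versions = {1: 3, 2: 5}
    reports = {"cs1": {1: 3, 2: 4}, "cs2": {1: 4, 2: 5}}
    print(rebuild_locations(versions, reports), versions)
```

In the example, cs1's copy of chunk 2 is stale (version 4 against 5), and the master adopts version 4 for chunk 1 because cs2 reports it.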
What Happens If a Chunkserver Fails?

• Master notices missing heartbeats
• Master decrements the replica count for all chunks on the dead chunkserver
• Master re-replicates chunks that are missing replicas in the background
   • Highest priority goes to chunks missing the greatest number of replicas
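The prioritisation rule can be sketched as a sort on the number of missing replicas. The function name handle_failure and the target of 3 replicas (the default mentioned earlier in the deck) are the assumptions here; heartbeat detection itself is not modelled.

```python
def handle_failure(chunk_locations, dead_server, target=3):
    """Drop a dead chunkserver and return chunks to re-replicate, most urgent first.

    chunk_locations: chunk ID -> set of chunkservers holding a replica
    target:          desired replica count (3 is the default from the slides)
    """
    # Decrement the replica count of every chunk the dead server held.
    for servers in chunk_locations.values():
        servers.discard(dead_server)
    # Chunks below the target, ordered by how many replicas they are missing.
    missing = {chunk: target - len(servers)
               for chunk, servers in chunk_locations.items()
               if len(servers) < target}
    return sorted(missing, key=missing.get, reverse=True)


if __name__ == "__main__":
    locations = {
        "a": {"cs1", "cs2", "cs3"},
        "b": {"cs1", "cs2"},
        "c": {"cs2", "cs3", "cs4"},
    }
    print(handle_failure(locations, "cs1"))
```

Chunk "b" is down to a single replica after the failure, so it is copied before chunk "a", which still has two.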
Latest Advancement
• Gmail: an easily configurable email service with 1 GB of web space.
• Blogger: a free web-based service that helps consumers publish on the web without writing code or installing software.
• Google “next generation corporate s/w”: a smaller version of the Google software, modified for private use.
Conclusion
• Success: used actively by Google to support its search service and other applications
   • Availability and recoverability on cheap hardware
   • High throughput by decoupling control and data
   • Supports massive data sets and concurrent appends
• Semantics not transparent to apps
   • Apps must verify file contents to handle inconsistent regions and repeated appends (at-least-once semantics)
• Performance not good for all apps
   • Assumes read-once, write-once workload (no client caching!)
Any Questions?

Google File System
