2. Google File System
Google File System is a proprietary distributed file system
developed by Google to provide efficient, reliable access to data
using large clusters of commodity hardware.
A new version of Google File System, code-named Colossus, was
released in 2010.
3. Google File System
Google organized the GFS into clusters of computers. A cluster is
simply a network of computers. Each cluster might contain
hundreds or even thousands of machines. Within GFS clusters there
are three kinds of entities: clients, master servers and chunkservers.
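
The three kinds of entities can be pictured with a minimal sketch. All class and field names below (Master, ChunkServer, Client, file_chunks, chunk_locations) are hypothetical and chosen only for illustration; the sketch assumes the master keeps metadata only and the client fetches data directly from a chunkserver.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    """Stores chunk contents, keyed by a chunk handle (illustrative)."""
    name: str
    chunks: dict = field(default_factory=dict)  # handle -> bytes


@dataclass
class Master:
    """Holds metadata only: which chunks form a file and where replicas live."""
    file_chunks: dict = field(default_factory=dict)      # path -> [handle, ...]
    chunk_locations: dict = field(default_factory=dict)  # handle -> [server name, ...]

    def locate(self, path, chunk_index):
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]


class Client:
    """Asks the master for locations, then reads from a chunkserver directly."""

    def __init__(self, master, servers):
        self.master = master
        self.servers = servers  # server name -> ChunkServer

    def read_chunk(self, path, chunk_index):
        handle, locations = self.master.locate(path, chunk_index)
        # Any replica would do; this sketch simply takes the first one.
        return self.servers[locations[0]].chunks[handle]


servers = {name: ChunkServer(name) for name in ("cs1", "cs2", "cs3")}
master = Master()
master.file_chunks["/logs/a"] = ["h1"]
master.chunk_locations["h1"] = ["cs1", "cs2", "cs3"]
for server in servers.values():
    server.chunks["h1"] = b"hello"

client = Client(master, servers)
print(client.read_chunk("/logs/a", 0))  # b'hello'
```

The master never touches file data in this sketch; it only answers the question "which servers hold this chunk?".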
4. Design considerations
1. Built from cheap commodity hardware
2. Expect large files: 100 MB to many GB
3. Support large streaming reads and small random reads
4. Support large, sequential file appends
5. Sustain high bandwidth by writing data in bulk
5. Interface
1. The interface resembles a standard file system, with
hierarchical directories and path names
2. Usual operations:
create, delete, open, close, read, and write
3. Multiple clients can append data to the same file
concurrently
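
The operations listed above can be illustrated with a toy, in-memory stand-in. This is only a sketch: ToyFileSystem and its method names are hypothetical, open and close are omitted for brevity, and a single lock stands in for whatever coordination the real system performs. The point is that each appender gets its own offset, so concurrent appends do not overwrite each other.

```python
import threading


class ToyFileSystem:
    """In-memory illustration of create/delete/read/write/append."""

    def __init__(self):
        self._files = {}                # path -> bytearray
        self._lock = threading.Lock()   # serializes all operations

    def create(self, path):
        with self._lock:
            if path in self._files:
                raise FileExistsError(path)
            self._files[path] = bytearray()

    def delete(self, path):
        with self._lock:
            del self._files[path]

    def read(self, path, offset, length):
        with self._lock:
            return bytes(self._files[path][offset:offset + length])

    def write(self, path, offset, data):
        with self._lock:
            buf = self._files[path]
            if len(buf) < offset:
                buf.extend(b"\0" * (offset - len(buf)))  # pad any gap
            buf[offset:offset + len(data)] = data

    def append(self, path, data):
        """Append atomically; the file system, not the caller, picks the offset."""
        with self._lock:
            buf = self._files[path]
            offset = len(buf)
            buf.extend(data)
            return offset


fs = ToyFileSystem()
fs.create("/logs/events")
threads = [
    threading.Thread(target=fs.append, args=("/logs/events", b"rec%d;" % i))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

data = fs.read("/logs/events", 0, 100)
records = sorted(data.split(b";")[:-1])
print(records)  # [b'rec0', b'rec1', b'rec2', b'rec3']
```

The order of the records in the file depends on thread scheduling, which is why the example sorts them before printing.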
6. Google File System (GFS)
▰ Google File System (GFS) is a scalable
distributed file system (DFS) created
by Google
▰ GFS provides fault tolerance, reliability,
scalability, availability and performance to
large networks and connected nodes
8. Write & control operation
1. The client asks the master where the file's chunk replicas are located.
2. The master returns the file location information.
3. The client sends write requests and pushes the data
to all the replicas.
4. The chunkservers acknowledge receipt to the client.
5. The client sends a write execution request to the
primary chunkserver.
6. The primary chunkserver forwards the write request to the
secondary chunkservers. All the secondary chunkservers reply to the primary.
7. The primary chunkserver returns the result to the client.
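
The seven steps above can be traced in a small simulation. This is a sketch under stated assumptions, not the actual protocol code: Replica, Primary, Master, client_write and the data_id parameter are all hypothetical names, everything runs in one process, and failures are not modeled.

```python
class Replica:
    """A chunkserver holding one replica of a chunk (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}          # data_id -> bytes pushed but not yet written
        self.chunk = bytearray()  # the replica's stored data

    def push(self, data_id, data):
        self.buffer[data_id] = data
        return True  # step 4: acknowledge receipt of the data

    def apply(self, data_id):
        self.chunk.extend(self.buffer.pop(data_id))
        return True  # reply that the write was applied


class Primary(Replica):
    """The replica that orders writes and forwards them to the secondaries."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries

    def execute_write(self, data_id):
        self.apply(data_id)
        replies = [s.apply(data_id) for s in self.secondaries]  # step 6
        return all(replies)                                     # step 7


class Master:
    """Knows only where the replicas are, not what they contain."""

    def __init__(self, primary):
        self.primary = primary

    def lookup(self, path):
        return self.primary, self.primary.secondaries


def client_write(master, path, data, data_id="w1"):
    primary, secondaries = master.lookup(path)          # steps 1-2
    replicas = [primary, *secondaries]
    acks = [r.push(data_id, data) for r in replicas]    # steps 3-4
    if not all(acks):
        return False
    return primary.execute_write(data_id)               # steps 5-7


secondaries = [Replica("s1"), Replica("s2")]
primary = Primary("p", secondaries)
ok = client_write(Master(primary), "/f", b"abc")
print(ok, [bytes(r.chunk) for r in [primary, *secondaries]])
# True [b'abc', b'abc', b'abc']
```

Note that the data (step 3) and the instruction to write it (step 5) travel separately: every replica buffers the data first, and only then does the primary tell them to apply it.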
10. ChunkServer
▰ Reduces interaction between the client and the master
▰ Chunkservers store data as Linux files on local disks
▰ A client can perform many operations on a given chunk
▰ Reduces network overhead by keeping a persistent TCP connection
▰ Reduces the size of the metadata stored on the master
▰ Metadata is data that provides information about other data
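
A little arithmetic shows why fewer, larger chunks mean less metadata on the master. The slides do not give numbers, so the figures here are assumptions taken from the published GFS design: a 64 MB chunk size and roughly 64 bytes of master metadata per chunk. The function names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed 64 MB chunk size


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    """Which chunk of a file holds the given byte offset."""
    return byte_offset // chunk_size


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for a file (ceiling division)."""
    return -(-file_size // chunk_size)


def metadata_bytes(file_size, per_chunk=64, chunk_size=CHUNK_SIZE):
    """Rough master-side metadata, assuming per_chunk bytes per chunk."""
    return chunk_count(file_size, chunk_size) * per_chunk


one_tb = 1024 ** 4
print(chunk_index(200 * 1024 * 1024))        # 3
print(chunk_count(one_tb))                   # 16384
print(metadata_bytes(one_tb))                # 1048576
print(chunk_count(one_tb, 1024 * 1024))      # 1048576
```

Under these assumptions, 1 TB of data needs about 1 MB of chunk metadata with 64 MB chunks, whereas 1 MB chunks would need 64 times as many entries.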
11. Conclusion
▰ GFS is a distributed file system that supports large-scale
data processing workloads on commodity hardware
▰ GFS explores different points in the design space than traditional file systems
▰ GFS provides fault tolerance
○ Replicating data
○ Fast and automatic recovery
○ Chunk replication
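
How replication supports automatic recovery can be sketched as a planning step: after a chunkserver fails, find the chunks with too few live replicas and pick new servers for them. This is a hypothetical illustration only. The function name re_replicate, the replication goal of 3, and the alphabetical choice of targets are assumptions, not the system's actual placement policy.

```python
REPLICATION_GOAL = 3  # assumed number of replicas per chunk


def re_replicate(chunk_locations, live_servers, goal=REPLICATION_GOAL):
    """Return {chunk handle: [new target servers]} for under-replicated chunks."""
    plan = {}
    for handle, holders in chunk_locations.items():
        alive = [s for s in holders if s in live_servers]
        missing = goal - len(alive)
        # A chunk can only be copied if at least one replica survives.
        if missing > 0 and alive:
            candidates = sorted(live_servers - set(alive))
            plan[handle] = candidates[:missing]
    return plan


locations = {
    "h1": ["cs1", "cs2", "cs3"],
    "h2": ["cs2", "cs3", "cs4"],
}
live = {"cs1", "cs2", "cs4", "cs5"}  # cs3 has failed
print(re_replicate(locations, live))
# {'h1': ['cs4'], 'h2': ['cs1']}
```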