Operating Systems - Advanced File Systems
1. Operating Systems
CMPSCI 377
Distributed File Systems
Emery Berger
University of Massachusetts Amherst
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
2. Distributed File Systems
Numerous drawbacks of local file systems:
Inconvenient
Administrative overhead
Single point-of-failure
Solution: distributed file systems
FS appears to be local, but data is remote
Two major implementations:
Windows
NFS (Sun’s Network File System)
3. Complications
Distributed file systems add complexity
& many design tradeoffs
Naming – absolute vs. relative (to server)
Remote access vs. caching
Stateless or stateful server
Single image or replication
4. Naming & Transparency
Issues
How are files named?
Do filenames reveal location?
Do filenames change if file moves?
Do filenames change if user moves?
5. Location naming
Location transparency:
filename does not reveal
physical storage location
Normal in Unix
Compare to Windows - C:\foo\bar
Provides location independence:
no change if file’s storage location changes
6. Windows: Absolute Names
\\loki\home\emery
\\machine name\remote pathname
Advantages:
Easy to find fully specified filename
Easy to add & delete new names
No global state
Scales easily
Disadvantages:
User must know complete name (local & remote names are different)
Location dependent (cannot move file)
Makes sharing harder
Not fault-tolerant
7. NFS: Relative Names
/nfs/sting/users1/emery
Advantages:
Location transparent
Remote name can change across reboots
Disadvantages:
Admin overhead
8. NFS: Relative Names
/courses/cs300/cs377
Implemented via mount points
one level of indirection!
Each host: local names → remote locations
Mount table (/etc/fstab)
<remote pathname @ machine, local pathname>
% cat /etc/fstab
elsrv4:/courses /courses nfs intr,hard,rw 0 0
elsrv4:/courses/cs100_200 /courses/cs100_200 nfs intr,hard,rw 0 0
elsrv4:/courses/cs300 /courses/cs300 nfs intr,hard,rw 0 0
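The mount-table indirection can be sketched in a few lines of Python. This is only an illustration of the idea, not how any kernel implements it: the `Mount` type and the `parse_fstab` and `resolve` helpers are hypothetical names, and the sketch assumes the most specific (longest) mount point wins when several match.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mount(NamedTuple):
    server: str       # machine that exports the directory
    remote_path: str  # pathname on that server
    local_path: str   # mount point on this host


def parse_fstab(text: str) -> List[Mount]:
    """Keep only the NFS lines of an fstab-style table."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#") or fields[2] != "nfs":
            continue
        server, remote_path = fields[0].split(":", 1)
        mounts.append(Mount(server, remote_path, fields[1]))
    return mounts


def resolve(path: str, mounts: List[Mount]) -> Optional[Tuple[str, str]]:
    """Map a local name to (server, remote pathname), or None if it is local."""
    best = None
    for m in mounts:
        prefix = m.local_path.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            # Assumption: the longest matching mount point takes precedence.
            if best is None or len(prefix) > len(best.local_path.rstrip("/")):
                best = m
    if best is None:
        return None
    suffix = path[len(best.local_path.rstrip("/")):]
    return best.server, best.remote_path.rstrip("/") + suffix


FSTAB = """\
elsrv4:/courses /courses nfs intr,hard,rw 0 0
elsrv4:/courses/cs100_200 /courses/cs100_200 nfs intr,hard,rw 0 0
elsrv4:/courses/cs300 /courses/cs300 nfs intr,hard,rw 0 0
"""

mounts = parse_fstab(FSTAB)
print(resolve("/courses/cs300/cs377", mounts))
```

The user only ever types the local name; changing the left-hand column of the table moves the data to another server without changing that name.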
9. NFS Example
10. URLs Viewed as File System
Uniform Resource Locator names
increasingly standard way to access data
protocol://machine/path/to/file
Good? Bad?
Looks like Windows… same?
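To make the comparison concrete, a URL can be split into its parts with Python's standard `urllib.parse`. The `split_name` helper is a hypothetical name used only for this sketch; it shows that the machine name is embedded in the file's name, as in the Windows absolute naming scheme.

```python
from urllib.parse import urlsplit


def split_name(url):
    """Return (protocol, machine, path) for a URL-style name."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


protocol, machine, path = split_name("protocol://machine/path/to/file")
print(protocol, machine, path)
```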
11. Distributed File Systems: Issues
Naming & transparency
Remote file access & caching
Server with state or without
Replication
12. Remote File Access & Caching
Can access files two ways
Remotely: returns results using RPC
Locally: transfer part of file = caching
Caching issues:
Performance: where & when to cache file blocks?
Correctness:
When to propagate updates back to remote file?
What happens when multiple clients cache same file?
13. Remote File Caching
Local disk:
+ Reduces access time (compared to remote)
+ Safe if node fails
– Difficult to keep copy consistent with remote file
– Requires client to have disk (…)
Local memory:
+ Quick
+ Works without disks
– Difficult to keep copy consistent with remote file
– Smaller cache size
– Not fault-tolerant
14. Cache Update Policies
Write-through: always write to remote disk
+ Reliable
– Low performance: remote service for all writes
Write-back: write only to cache
Write to disk on evictions, periodic sync
+ Quick
+ Reduces network traffic (n writes to same block)
– User machine crashes ⇒ data loss
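The difference between the two policies can be shown with a toy model. This is a minimal sketch under stated assumptions: `RemoteDisk`, `WriteThroughCache`, and `WriteBackCache` are hypothetical classes, the "network" is just a counter, and eviction is not modeled (only an explicit `sync`).

```python
class RemoteDisk:
    """Stand-in for the server's disk; counts writes that cross the network."""

    def __init__(self):
        self.blocks = {}
        self.remote_writes = 0

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.remote_writes += 1


class WriteThroughCache:
    def __init__(self, remote):
        self.remote = remote
        self.cache = {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.remote.write(block_no, data)  # every write goes to the server


class WriteBackCache:
    def __init__(self, remote):
        self.remote = remote
        self.cache = {}
        self.dirty = set()  # blocks changed locally but not yet on the server

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.dirty.add(block_no)

    def sync(self):
        """Periodic sync: push each dirty block to the server once."""
        for block_no in sorted(self.dirty):
            self.remote.write(block_no, self.cache[block_no])
        self.dirty.clear()


wt_disk, wb_disk = RemoteDisk(), RemoteDisk()
wt, wb = WriteThroughCache(wt_disk), WriteBackCache(wb_disk)

for i in range(5):  # n = 5 writes to the same block
    wt.write(7, f"v{i}")
    wb.write(7, f"v{i}")

# A client crash at this point would lose the write-back data.
lost_if_crash_now = 7 not in wb_disk.blocks

wb.sync()
print(wt_disk.remote_writes, wb_disk.remote_writes, lost_if_crash_now)
```

In this model the five writes cost five remote writes under write-through and one under write-back, at the price of a window in which the server has none of the data.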
15. Cache Consistency
Client-initiated consistency:
client contacts server and checks consistency
every access
at given intervals
only upon opening a file
Server-initiated consistency:
server detects potential conflicts,
invalidates caches
Server needs to know:
which clients have cached which parts of which files, plus
which clients are readers & which are writers
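Server-initiated consistency can be sketched as bookkeeping on the server. The `ConsistencyServer` class below is hypothetical and heavily simplified: it tracks whole files rather than parts of files, treats any reader/writer or writer/writer overlap as a potential conflict, and "invalidates" by dropping the other client's entry and logging the event instead of sending a message.

```python
from collections import defaultdict


class ConsistencyServer:
    def __init__(self):
        self.readers = defaultdict(set)  # filename -> clients caching it for reading
        self.writers = defaultdict(set)  # filename -> clients caching it for writing
        self.invalidations = []          # log of (client, filename) invalidations

    def open(self, client, filename, mode):
        """Record who caches what; invalidate conflicting cached copies."""
        if mode == "w":
            # A new writer conflicts with every other cached copy.
            others = (self.readers[filename] | self.writers[filename]) - {client}
            for other in others:
                self._invalidate(other, filename)
            self.writers[filename].add(client)
        else:
            # A new reader conflicts only with existing writers.
            for other in self.writers[filename] - {client}:
                self._invalidate(other, filename)
            self.readers[filename].add(client)

    def _invalidate(self, client, filename):
        self.readers[filename].discard(client)
        self.writers[filename].discard(client)
        self.invalidations.append((client, filename))


server = ConsistencyServer()
server.open("A", "notes.txt", "r")
server.open("B", "notes.txt", "r")   # two readers: no conflict
server.open("C", "notes.txt", "w")   # writer arrives: A and B are invalidated
print(sorted(server.invalidations))
```

The state kept in `readers` and `writers` is exactly what a stateless server cannot hold, which is one reason the choice between stateful and stateless servers interacts with the choice of consistency scheme.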
16. Case Study: Network File System
NFS: standard for distributed UNIX file access
Designed to run on LANs
Nodes: both servers & clients
Servers have no state = no info about clients
Uses mount protocol to make global name local
/etc/exports
local names server willing to export
/etc/fstab
global names that local nodes import
global name must be in /etc/exports on server
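The relationship between the two files can be expressed as a simple check. This is a sketch only: `refused_imports` is a hypothetical helper, the export list is modeled as a plain set of pathnames per server, `/scratch` is an invented entry for illustration, and a real server applies further rules (such as per-client access lists) that are not modeled here.

```python
def refused_imports(fstab_entries, exports_by_server):
    """Return the (server, remote path) imports that no server exports."""
    refused = []
    for server, remote_path, _local_path in fstab_entries:
        exported = exports_by_server.get(server, set())
        if remote_path not in exported:
            refused.append((server, remote_path))
    return refused


# What the server is willing to export (its /etc/exports, simplified).
exports_by_server = {"elsrv4": {"/courses", "/courses/cs300"}}

# What this client tries to import (its /etc/fstab, simplified).
fstab_entries = [
    ("elsrv4", "/courses", "/courses"),
    ("elsrv4", "/courses/cs300", "/courses/cs300"),
    ("elsrv4", "/scratch", "/scratch"),  # hypothetical entry, not exported
]

print(refused_imports(fstab_entries, exports_by_server))
```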
17. NFS Implementation
Set of RPC operations for remote file access:
Directory search, reading directory entries
Manipulating links & directories
Accessing file attributes
Reading/writing files
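Because the server keeps no information about clients, each read request has to carry everything needed to serve it. The sketch below assumes a request made of a file handle, an offset, and a byte count; `StatelessFileServer` and `read_all` are hypothetical names, and RPC is replaced by a direct method call. To mimic a server reboot, the client talks to a brand-new server object on every request.

```python
class StatelessFileServer:
    """Serves reads without remembering anything about its clients."""

    def __init__(self, files):
        self.files = files  # file handle -> file contents (the "disk")

    def read(self, handle, offset, count):
        # No open-file table and no per-client position: the request says it all.
        return self.files[handle][offset:offset + count]


def read_all(make_server, handle, chunk=4):
    """Client-side loop: the client, not the server, tracks the offset."""
    offset, out = 0, b""
    while True:
        server = make_server()  # simulate a server restart between requests
        piece = server.read(handle, offset, chunk)
        if not piece:
            break
        out += piece
        offset += len(piece)
    return out


disk = {"fh1": b"hello, nfs"}
data = read_all(lambda: StatelessFileServer(disk), "fh1")
print(data)
```

The read still completes even though no server object survives from one request to the next, which is the practical payoff of keeping the server stateless.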
18. NFS Implementation
19. The End
20. Global Name Space
Single name space:
Examples:
AFS (CMU’s Andrew File System)
Sprite (Berkeley)
No matter which node you are on,
filenames remain the same
Client: gets filename structure from server(s)
When users access files, server sends copies
to workstation, where they are cached
21. Global Name Space: Pros & Cons
Advantages:
Naming – consistent
Ensures all files are the same regardless of where you log in
Late binding of names ⇒ moving them is easier
Disadvantages:
Difficult for OS to keep files consistent (caching)
Global name space may limit flexibility
Performance issues