3. Synchronization
• Is an issue only if files are shared.
• Sharing in a distributed system is often necessary, and at the same time can affect performance in various ways.
• In the following discussion we assume file sharing takes place in the absence of process-implemented synchronization operations such as mutual exclusion.
4. UNIX File Semantics
• In a single-processor (or SMP) system, any file read operation returns the result of the most recent write operation.
• Even if two writes occur very close together, the next read returns the result of the last write.
• It is as if all reads and writes are time-stamped from the same clock. Operation order is based on strict time ordering.
5. UNIX Semantics in DFS
• Possible to (almost) achieve IF…
– There is only one server
– There is NO caching at the client
• In this case every read and write goes directly to the server, which processes them in sequential order.
– Even so, the order of simultaneous writes depends on network transmission times.
• Any file read operation returns the result of the most recent write operation as seen by the server.
6. Caching and UNIX Semantics
• Single-server + no client caching leads to poor performance, so most file systems allow users to make local copies of files (or file blocks) that are currently in use.
• Now UNIX semantics are problematic: a write executed only on a local copy will not be seen by another client that reads the file from the server, or by other clients that have the file cached.
7. Write-Through
• A partial solution is to require all changes to local copies to be immediately written to the server. Now a new user will see all the changes.
– Still inefficient: caching is no longer as useful
– Not a total solution: what happens when two users have the same file cached?
8. Consistency Models
• Recall the discussion of consistency models in Chapter 7
• Realistically, strict consistency can't be achieved without synchronization techniques such as transactions or locks
• Here we consider what the file system can do in the absence of user-enabled methods.
9. Session Semantics
• Instead of trying to implement UNIX semantics where it is not possible, define new semantics:
– Local changes to a file are not made permanent until the file is closed. In the meantime, if another user opens the file, she gets the original version.
– This approach is common in DFSs.
10. Simultaneous Caching
• What if two users concurrently cache and modify the same file? How do we determine the "new" state of the file?
• Most likely option:
– The most recently closed file becomes the new "official" version
11. File foo
[Diagram: file foo initially contains x = 10; two horizontal timelines show each process's operations in real time]
Process 1: open foo; write x = x+6 to foo; close foo
Process 2: open foo; write x = 4*x to foo; close foo
Using session semantics, what is the final value of x in file foo after both processes close the file?
Assume the horizontal lines represent real time and that close operations are seen at the server in the order shown.
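The last-close-wins rule above can be modeled in a few lines. This is an illustrative sketch, not real file-system code; the class names are invented, and the final value depends entirely on which close the server sees last (here, process 2's).

```python
# Illustrative model of session semantics: each process works on a private
# copy taken at open(); the server's copy is replaced only at close(), so
# the last close "wins" and earlier concurrent updates are lost.

class SessionFile:
    def __init__(self, value):
        self.value = value          # the server's "official" copy

    def open(self):
        return Session(self)

class Session:
    def __init__(self, f):
        self.f = f
        self.local = f.value        # private snapshot taken at open time

    def write(self, new_value):
        self.local = new_value      # changes stay local until close

    def close(self):
        self.f.value = self.local   # whole-file replace at close

foo = SessionFile(10)
p1 = foo.open()                     # both processes open before either closes
p2 = foo.open()
p1.write(p1.local + 6)              # process 1: x = x + 6  -> 16
p2.write(4 * p2.local)              # process 2: x = 4 * x  -> 40
p1.close()                          # server now holds 16
p2.close()                          # last close wins: server now holds 40
print(foo.value)                    # 40 under this close ordering
```

Note that if process 1's close were seen last, the file would instead hold 16: one update is silently lost either way.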
12. Immutable Files
• The only operations on a file are, effectively, create, read, and replace.
– Once a file is created it can be read but not changed.
– A new file (incorporating changes to a current file) can be created and placed in the directory in place of the original version.
• If several users try to replace an existing file at the same time, one is chosen: either the last to close, or non-deterministically.
13. Review: File System Semantics
• UNIX semantics – Every file operation is instantly visible to all processes
• Session semantics – No changes are visible until the file is closed
• Immutable files – No updates are possible; files can only be replaced
14. Transaction Semantics
• Transactions are a way of grouping several file operations together and ensuring that they are either all executed or none is executed.
– We say they are atomic.
• The transaction system is responsible for ensuring that all of the operations are carried out in order, without any interference from concurrent transactions.
15. The Transaction Model
• Transaction: a set of operations which must be executed entirely, or not at all.
• Processes in a transaction can fail at random
– Failure causes: hardware or software problems, network problems, lost messages, etc.
• Transactions will either commit or abort:
– Commit => successful completion (All)
– Abort => partial results are undone (Nothing)
16. Transaction Model
• Transactions are delimited by two special primitives:
Begin_transaction // or something similar
  transaction operations (read, write, open, close, etc.)
End_transaction
• If the transaction successfully reaches the end statement, it "commits" and all changes become permanent; otherwise it aborts.
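The Begin/End pattern can be sketched as follows. This is a minimal, hypothetical model (the class and method names are mine, not from any real transaction system): writes are buffered and become visible all at once on commit, or vanish on abort.

```python
# A minimal sketch of the Begin/End transaction pattern: operations are
# buffered privately, then applied all at once at End_transaction (commit)
# or discarded entirely if anything fails (abort).

class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}           # buffered writes, invisible to others

    def write(self, key, value):
        self.pending[key] = value

    def read(self, key):
        # reads see this transaction's own pending writes first
        return self.pending.get(key, self.store.get(key))

    def end(self, fail=False):
        if fail:
            self.pending.clear()    # abort: partial results are undone
            return False
        self.store.update(self.pending)   # commit: all changes at once
        return True

store = {"balance": 100}
t = Transaction(store)              # Begin_transaction
t.write("balance", t.read("balance") - 30)
t.end()                             # End_transaction -> commit
print(store["balance"])             # 70

t2 = Transaction(store)
t2.write("balance", 0)
t2.end(fail=True)                   # abort: the change never reaches the store
print(store["balance"])             # still 70
```

Real systems add logging and concurrency control on top of this buffering idea, but the all-or-nothing visibility is the same.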
17. ACID Properties of Transactions
• Atomic: either all or none of the operations in a transaction are performed
• Consistent: the transaction doesn't affect system invariants; e.g., no money is "lost" in a banking system
• Isolated (serializable): one transaction can't affect others until it completes
• Durable: changes made by a committed transaction are permanent, even if the process or server fails.
18. Atomicity
• An atomic action is one that appears to be "indivisible and instantaneous" to the rest of the system; for example, machine language instructions.
• Transactions support the execution of multiple instructions as if they were a single atomic instruction.
19. Consistent
• A state is consistent if invariants hold
• An invariant is a predicate which states a condition that must be true.
• Invariants for the airline ticket example:
– seatsLeft = seatsTotal – seatsSold
– seatsLeft >= 0
• In the bank case (simplified)
– balance_final = balance_original – withdrawals + deposits
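The invariants above are just predicates, so they can be written directly as boolean functions. The numbers below are made up purely for illustration.

```python
# The slide's invariants expressed as predicates: any state for which a
# predicate is False is inconsistent.

def seats_ok(total, sold, left):
    # seatsLeft = seatsTotal - seatsSold, and seatsLeft >= 0
    return left == total - sold and left >= 0

def balance_ok(original, withdrawals, deposits, final):
    # balance_final = balance_original - withdrawals + deposits
    return final == original - withdrawals + deposits

assert seats_ok(total=100, sold=60, left=40)
assert not seats_ok(total=100, sold=60, left=50)   # invariant violated
assert balance_ok(original=500, withdrawals=200, deposits=50, final=350)
```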
20. Isolated
• No other transaction will see the intermediate results of a transaction.
• Concurrent transactions have the same effect on the database as if they had run serially. Notice the similarity to critical sections, which do run serially.
• This characteristic is enforced through special concurrency control measures.
21. AD Properties
• ACID is a commonly used term, but somewhat redundant.
• Transactions that execute atomically will be consistent and isolated.
• Atomicity and durability capture the essential qualities.
22. Semantics of File Sharing in Distributed Systems
• UNIX semantics – Every file operation is instantly visible to all processes
• Session semantics – No changes are visible until the file is closed
• Immutable files – No updates are possible; files can only be replaced
• Transactions – All changes occur and are visible atomically, or not at all
23. File Locking
• UNIX file semantics are not possible in a DFS
• Session semantics and immutable files do not always support the kind of sharing processes need.
• Transactions have a heavy overhead.
• Thus some additional form of synchronization is desirable to enable the server to enforce mutual exclusion on writes.
24. Locking in NFS
• Early versions of NFS (through v3) were stateless, and so could not implement locks.
• An add-on, NLM (Network Lock Manager), worked in the NFS environment to enforce advisory locking
– If one process has locked a byte sequence, any other process requesting a lock on that sequence will be denied.
25. File Locking in NFSv4
• NFSv4 added a similar locking discipline to the basic protocols.
• Lock managers in NFS, as in other file systems, are based on the centralized scheme discussed in Chapter 6
– Client requests lock
– Lock manager grants lock (if it is free)
– Client releases lock (or it expires after a time)
• In NFS, if a client requests a lock which cannot be granted, the client is not blocked; it must try again later.
26. Denied Requests
• If a client's request for a lock is denied, it receives an error message.
– Poll the server later for lock availability
• Clients can request to be put on a FIFO queue; when a lock is released it is reserved for the first process on the queue; if that process polls within a certain amount of time it gets the lock.
– How is this different from the centralized mutual exclusion algorithm in the textbook?
27. File Locking in NFS
• Two types of locks:
– Reader locks, which can be held simultaneously
– Writer locks, which guarantee exclusive access
• The lock operation is applied to consecutive byte sequences in the file, rather than to the whole file.
28. NFSv4 Lock-Related Operations
• Lock – Create a lock for a range of bytes
• Lockt – Test whether a conflicting lock has been granted
• Locku – Remove a lock from a range of bytes
• Renew – Renew the lease on a lock
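The reader/writer byte-range rules behind these operations can be sketched as a toy lock manager. This is a simplified model of the idea, not the NFSv4 protocol itself: half-open ranges, no leases, and the class and method names are invented for illustration.

```python
# A sketch of byte-range locking in the NFSv4 style: reader locks on
# overlapping ranges may coexist; a writer lock conflicts with any
# overlapping lock held by another client.

class RangeLockManager:
    def __init__(self):
        self.locks = []             # (client, start, end, mode) tuples

    def _conflicts(self, client, start, end, mode):
        for c, s, e, m in self.locks:
            overlap = start < e and s < end        # half-open [start, end)
            if overlap and c != client and (mode == "write" or m == "write"):
                return True
        return False

    def lock(self, client, start, end, mode):      # cf. the Lock operation
        if self._conflicts(client, start, end, mode):
            return False            # denied: the client must try again later
        self.locks.append((client, start, end, mode))
        return True

    def unlock(self, client, start, end):          # cf. the Locku operation
        self.locks = [l for l in self.locks
                      if not (l[0] == client and l[1] == start and l[2] == end)]

mgr = RangeLockManager()
assert mgr.lock("A", 0, 100, "read")       # reader locks can be shared
assert mgr.lock("B", 50, 150, "read")
assert not mgr.lock("C", 60, 70, "write")  # overlapping writer: denied
mgr.unlock("A", 0, 100)
mgr.unlock("B", 50, 150)
assert mgr.lock("C", 60, 70, "write")      # now exclusive access succeeds
```

A Lockt-style test would call `_conflicts` without appending, and Renew would extend a lease timestamp that this sketch omits.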
29. Leases
• Locks are granted for a specific time interval.
– What problem does this address?
• At the end of that interval the lock is removed unless the client has requested an extension.
30. Share Reservations in NFS
• An open request specifies the kind of access the application requires: READ, WRITE, BOTH
• It also specifies the kind of access that should be denied to other clients: NONE, READ, WRITE, BOTH
• If the requirements can't be met, the open fails
• Share reservations = implicit locking
• Used in NFS for Windows-based systems
31. Share Reservations – Example
• A client tries to open a file for reading and writing, and to deny concurrent write access.
– If no other client has the file open, the request succeeds.
– If another client has opened the file for reading (and hasn't blocked write access), the request succeeds.
– If another client has opened the file for writing, the request fails.
– If another client has the file open and has denied read or write access, the request fails.
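The four cases above follow from one symmetric rule, sketched below under my own (hypothetical) encoding of access and deny sets: an open succeeds only if no existing open denies what it requests, and it denies nothing an existing open already holds.

```python
# A sketch of the share-reservation compatibility check. Access and deny
# modes are modeled as sets of "r"/"w" flags; this is an illustration of
# the rule, not the NFS wire format.

def open_allowed(req_access, req_deny, existing):
    # existing: list of (access, deny) pairs for clients with the file open
    for access, deny in existing:
        if req_access & deny:       # someone denies what we request
            return False
        if req_deny & access:       # we'd deny what someone already has
            return False
    return True

READ, WRITE = {"r"}, {"w"}
BOTH, NONE = {"r", "w"}, set()

# The slide's example: open for read+write, denying concurrent writes.
assert open_allowed(BOTH, WRITE, [])                    # no one else: succeeds
assert open_allowed(BOTH, WRITE, [(READ, NONE)])        # reader, no denial: succeeds
assert not open_allowed(BOTH, WRITE, [(WRITE, NONE)])   # writer present: fails
assert not open_allowed(BOTH, WRITE, [(READ, BOTH)])    # denies read/write: fails
```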
32. 11.5 Summary
• UNIX semantics are not possible in a DFS
• Session semantics are a common choice
• File locking (usually advisory) can provide additional protection if needed.
• In parallel programs, mutual exclusion techniques can be used to protect file operations.
• In database systems, transactions are used.
33. 11.6: Consistency and Replication
• Client-Side Caching
• Server-Side Replication
• Replication in P2P Systems
34. Introduction
• Replication (and caching) => multiple copies of something
• Two reasons for replication:
– Reliability (protection against failure, corruption)
– Performance (size of user base, geographical extent of system)
• Replication can cause inconsistency: at least one copy is different from the rest.
35. Caching in a DFS
• Caching in any DFS reduces access delays due to disk access times or network latency.
• Caches can be located in the main memory of either the server or the client, and/or on the client's disk.
36. Caching in a DFS
• Client-side caching (memory or disk) offers the most performance benefits, but also leads to potential inconsistencies.
• However, because in practice file sharing is relatively rare, client-side caching remains a popular way to improve performance in a DFS.
37. Cache Consistency Measures
• Server-initiated consistency: the server notifies a client if its data becomes stale
– e.g., another client closes its copy of the file, which was opened for writing.
• Client-initiated consistency: the client is responsible for the consistency of its data
– e.g., client-side software can periodically check with the server to see if the file has been modified.
38. Caching in NFS
• NFSv3 did not define a caching protocol.
– Individual implementations made their own decisions
• "Stale" data could exist for periods ranging from a few seconds to half a minute
• NFSv4 made some improvements, but many details are still implementation dependent.
• The general structure of the NFS cache model follows.
39. Client-Side Caching in NFS
[Figure 11-21: the client application accesses a memory cache and a disk cache on the client machine, which communicate with the NFS server over the network.]
40. What Do Clients Cache?
• File data blocks
• File handles – for future reference
• Directories
• Two approaches to caching in NFS:
– Caching with server control
– Caching with open delegation
41. Caching Data with Server Control
• The simplest approach to caching allows the server to retain control over the file.
• Procedure:
– Client opens file
– Data blocks are transferred to the client (by read ops)
– Client can read and write data in the cache
– When the file closes, flush changes back to the server
• Session semantics & NFS: the last (most recent) process to close a file has its changes become permanent. Changes made by processes that ran concurrently but closed earlier are lost.
42. Caching with Server Control
• In caching with server control
– All clients on a single machine may read and write the same cached data if they have access rights
– Data remaining in the cache after a file closes doesn't need to be removed, although changes must be sent to the server.
• If a new client on the same machine opens a file after it has been closed, the client cache manager usually must validate the local cached data with the server
– If the data is stale, replace it.
43. Caching with Open Delegation
• Allows a client machine to handle some local open and close operations from other clients on the same machine.
– Normally the server decides if a client can open a file
• Delegation can improve performance by limiting contact with the server
• The client machine gets a copy of the entire file, not just certain blocks.
44. Open Delegation – Examples*
• Suppose a client machine has opened a file for writing, and has been delegated rights to control the file locally.
– If another local client tries to lock the file, the local machine can decide whether or not to grant the lock
– If a remote client tries to lock the file (at the server), the server will deny file access
• If a client has opened the file for reading only, local clients desiring write privileges must still contact the server.
45. Delegation and Callbacks
• The server may need to "undelegate" the file, perhaps when another client needs to obtain access.
• This can be done with a callback, which is essentially an RPC from server to client.
• Callbacks require the server to maintain state (knowledge) about clients – a reason for NFS to be stateful.
46. Caching Attributes*
• Clients can cache attributes as well as data.
– (size of file, number of links, last date modified, etc.)
• Cached attributes are kept consistent by the client, if at all
– No guarantee that the same file cached at two sites will have the same attributes at both sites
• Attribute modifications should be written through to the server (write-through cache coherence policy), although there's no requirement to do so.
47. Leases*
• Lease: cached data is automatically invalidated after a certain period of time.
– Applies to file attributes, file handles (mapping of name to file handle), directories, and sometimes data.
– When the lease expires, the client must renew the data from the server
– Helps with consistency and protects against errors.
48. An Implementation of Leases*
• Data blocks have time-stamps applied by the server that indicate when they were last modified.
• When a block is cached at a client, the server's time-stamp is also cached.
• After a period of time, the client confirms the validity of the data
– Compare the timestamp at the client to the timestamp at the server
– If the server timestamp is more recent, invalidate the client data
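The timestamp comparison above can be sketched in a few lines. This is a toy model of the scheme on this slide, with invented class names, integer "clocks" instead of real time, and an arbitrary 30-second lease.

```python
# A sketch of lease-based validation: each cached block carries the server
# timestamp it was fetched with; once the lease period has passed, the
# client revalidates by comparing its timestamp with the server's.

class Server:
    def __init__(self):
        self.mtime = {}             # block id -> last-modified timestamp

    def write(self, block, now):
        self.mtime[block] = now

class ClientCache:
    LEASE = 30                      # seconds; an illustrative value

    def __init__(self, server):
        self.server = server
        self.cache = {}             # block -> (data, server_ts, fetched_at)

    def fetch(self, block, data, now):
        self.cache[block] = (data, self.server.mtime.get(block, 0), now)

    def read(self, block, now):
        data, ts, fetched = self.cache[block]
        if now - fetched > self.LEASE:               # lease expired
            if self.server.mtime.get(block, 0) > ts: # server copy is newer
                del self.cache[block]                # invalidate stale data
                return None                          # caller must re-fetch
            self.cache[block] = (data, ts, now)      # still valid: renew
        return data

srv = Server()
srv.write("b1", now=0)
c = ClientCache(srv)
c.fetch("b1", "hello", now=1)
assert c.read("b1", now=10) == "hello"   # within lease: served from cache
srv.write("b1", now=20)                  # another client modifies the block
assert c.read("b1", now=40) is None      # lease expired, stale: invalidated
```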
49. Coda
A Prototype Distributed File System
• Developed at CMU – M. Satyanarayanan
• Started in 1987 as an improvement on the Andrew file system (a classic research FS)
– Andrew strongly influenced NFSv4 and some versions of Linux
• The most recent version of Coda (6.9.4) was released 1/05/2009 (http://www.coda.cs.cmu.edu/news.html)
50. Objectives of Coda
• Support disconnected operation (server goes down, laptop is disconnected from the network, etc.)
• Client-side caching is extensive
– Uses a client disk cache
• Replication contributes to availability, fault tolerance, scalability
51. Caching in Coda
• Critical, because of Coda's objectives
• Caching achieves scalability and provides more fault tolerance for the client in case it is disconnected from the server.
• When a client opens a file, the entire file is downloaded. This is true for both reads and writes.
52. Concurrent Access
• In Coda, many clients may have a file open for reading, but only one for writing.
– Multiple readers and a single writer may exist concurrently
– In NFS and most other file systems, multiple readers and multiple writers can exist concurrently unless locks are used to prohibit sharing.
53. Callbacks/Server-Initiated Cache Consistency
• A Coda callback is an agreement between the server and a client. The server agrees to notify the client when a file has been modified by another client, closed, and written back to the server.
• At this time, the client may purge the file from its cache, but it may also continue reading the outdated copy.
• This is a blend of session and transaction semantics.
54. Coda Callbacks
• Callback promise: the server's commitment to notify the client when the file changes
• Callback break: notice from the server that the client's file is stale; called a "break" because it terminates the agreement. There will be no further callbacks unless the client renews it.
55. Figure 11-23, page 523
• Local copies of files can be used as long as the client still has an outstanding callback promise
– No other client has closed a modified file.
56. [Diagram: clients 1 and 2 each hold a cached copy of the same file from the server]
Suppose clients 1 & 2 have cached the same file, and client 1 modifies the file.
• How/when does client 2 know?
• What role, if any, does the server have?
• Are Coda and NFS different in this respect?
57. 11.6.2: Server-Side Replication
• Caching: replication at the client side.
– Initiated implicitly by a client request
– Cached data is temporary
– Unit of caching = a file, or less (usually)
– Purpose: improved performance
• Server replication
– Mainly for fault tolerance & availability
– May actually degrade performance (overhead)
– Replicated data is permanent
58. Caching & Replication in Coda
• Unit of replication = volume (a group of related files)
• Each volume is stored on several servers, its Volume Storage Group (VSG)
• The Available Volume Storage Group (AVSG) is the set of servers a client can actually reach
• Contact one server to get permission to R/W; contact all when closing an updated file.
59. [Figure 11-24: Two clients with a different AVSG for the same file. Servers S1, S2, and S3 replicate the file; a broken network partitions S3 from S1 and S2, so client A's Open(f) reaches S1 and S2 while client B's Open(f) reaches only S3.]
60. Writing in Disconnected Systems
• Each file has a Coda version vector (CVV), analogous to a vector timestamp, with one component per server. It starts at (1, 1, 1).
• Each server updates its own component after the file is updated.
• As long as all servers get all updates, all version vectors will be equal.
61. Detecting Inconsistencies
• In the previous example, both A and B will be allowed to open the file for writing.
• When A closes, it will update S1 and S2, but not S3; B will update S3, but not S1 or S2.
• The version vector at S1 and S2 will be [2, 2, 1].
• The version vector at S3 will be [1, 1, 2].
• It is easy to detect the inconsistency, but knowing how to resolve it is application dependent.
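The detection step above is just a component-wise comparison of version vectors. The sketch below shows the standard rule (my own function names, not Coda's API): one vector supersedes another only if it is at least as large in every component; otherwise the writes were concurrent and there is a conflict.

```python
# A sketch of conflict detection with version vectors: v1 dominates v2 if
# every component of v1 is >= the corresponding component of v2; if neither
# vector dominates, the replicas diverged (a write/write conflict).

def compare(v1, v2):
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "v1 newer"
    if le:
        return "v2 newer"
    return "conflict"     # concurrent updates: resolution is app dependent

# This slide's scenario: A updated S1 and S2; B updated only S3.
assert compare([2, 2, 1], [1, 1, 1]) == "v1 newer"   # a normal update
assert compare([2, 2, 1], [1, 1, 2]) == "conflict"   # partitioned writes
```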
62. Replication in P2P Systems
• In P2P systems replication is more important because
– P2P members are less reliable: they may leave the system or remove files
– Load balance is important since there are no designated servers
• File usage in P2P is different: most files are read-only and updates consist of adding new files, so consistency is less of an issue.
63. Unstructured P2P Systems (each node knows n neighbors)
• Look-up = search (in structured systems, lookup is directed by some algorithm)
• Replication speeds up the process
• How to allocate files to nodes (it may not be possible to force a node to store files):
– Uniformly distribute n copies across the network
– Allocate more replicas for popular files
– Users who download files are responsible for sharing them with others (as in BitTorrent)
64. Structured P2P Systems
• Replication is used primarily for load balance
• Possible approaches:
– Store a replica at each node in the search path (concentrates replicas near the prime copy, but may unbalance some nodes)
– Store replicas at nodes that request a file, and store pointers to it at nodes along the way.
65. 11.7: Fault Tolerance in DFS*
• Review of Fault Tolerance
• Handling Byzantine Failures
• High Availability in P2P Systems
66. Basic Concepts – Review
• Distributed systems may experience partial failure
• Build systems to automatically recover from crashes.
• Continue to operate normally while failures are being repaired; i.e., be fault tolerant.
• Fault-tolerant systems exhibit dependability.
– Availability: the system is immediately ready to use
– Reliability: the system can run continuously without failing.
• (remember the availability/reliability example)
– Safety: system failure doesn't have disastrous consequences
– Maintainability: easy to repair
67. Failure Models
• Failure may be due to an error at any place in the system:
– The server crashes
– The network goes down
– A disk crashes
– Security violations occur
• Crash failure, omission failure, Byzantine failure:
– Incorrect, but undetectable;
– malicious servers produce deliberately wrong results;
– ...
68. Handling Byzantine Failures in Distributed File Systems
• Replication handles many errors in a DFS, but Byzantine errors are harder to solve.
• The text presents an algorithm by Castro and Liskov that works as long as at most k of the 3k+1 nodes (fewer than one third) are faulty at any moment.
• Clients must get the same answer from k+1 servers (in a system with 3k+1) to be sure the answer is correct.
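The client-side acceptance rule is simple to sketch: with at most k faulty servers, any answer vouched for by k+1 servers includes at least one correct one. This is only the voting step of the Castro-Liskov scheme, with an invented function name; the full protocol involves much more.

```python
# A sketch of the k+1 matching-replies rule: accept an answer only if at
# least k+1 replicas agree on it, since at most k replicas can be Byzantine.

from collections import Counter

def accept(replies, k):
    # replies: answers received from replica servers (some may be wrong)
    answer, votes = Counter(replies).most_common(1)[0]
    return answer if votes >= k + 1 else None

# k = 1, so the system has 3k+1 = 4 servers; one of them lies:
assert accept(["42", "42", "42", "99"], k=1) == "42"
assert accept(["42", "99"], k=1) is None     # not enough matching replies yet
```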
69. Availability in P2P Systems
• Possible approaches:
– Replication (although it must be at very high levels due to the unreliability of nodes)
– Erasure coding: divide a file into m fragments, recode them into n > m fragments such that any set of m fragments can be used to reconstruct the entire file. Distribute fragments, rather than entire file replicas.
• Requires less redundancy than full replication.
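The simplest possible erasure code illustrates the m-of-n idea: split the file into m = 2 halves and add an XOR parity fragment, giving n = 3; any 2 of the 3 fragments reconstruct the file. Real P2P systems use stronger codes (e.g., Reed-Solomon) for larger n and m; this sketch and its function names are purely illustrative.

```python
# A sketch of erasure coding with m = 2 data fragments and one XOR parity
# fragment (n = 3): losing any single fragment is recoverable.

def encode(data):
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")   # pad to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))          # a XOR b
    return a, b, parity, len(data)

def decode(a, b, parity, length):
    # any one fragment may be missing (None); XOR recovers it from the others
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:length]                              # drop the padding

a, b, p, n = encode(b"hello world")
assert decode(None, b, p, n) == b"hello world"   # fragment a lost
assert decode(a, None, p, n) == b"hello world"   # fragment b lost
assert decode(a, b, p, n) == b"hello world"      # nothing lost
```

The redundancy here is 1.5x the file size, versus 2x for even a single full replica, which is the saving the slide refers to.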