3. Synchronization
• Is an issue only if files are shared.
• Sharing in a distributed system is often necessary, and at the same time can affect performance in various ways.
• In the following discussion we assume file sharing takes place in the absence of process-implemented synchronization operations such as mutual exclusion.
4. UNIX File Semantics
• In a single-processor (or SMP) system, any file read operation returns the result of the most recent write operation.
• Even if two writes occur very close together, the next read returns the result of the last write.
• It is as if all reads and writes are time-stamped from the same clock. Operation order is based on strict time ordering.
5. UNIX Semantics in DFS
• Possible to (almost) achieve IF…
– There is only one server
– There is NO caching at the client
• In this case every read and write goes directly to the server, which processes them in sequential order.
– Even so, the order of simultaneous writes depends on network transmission times.
• Any file read operation returns the result of the most recent write operation as seen by the server.
6. Caching and UNIX Semantics
• Single-server + no client caching leads to poor performance, so most file systems allow users to make local copies of files (or file blocks) that are currently in use.
• Now UNIX semantics are problematic: a write executed only on a local copy will not be seen by another client that reads the file from the server, or by other clients that have the file cached.
7. Write-Through
• A partial solution is to require all changes to local copies to be immediately written to the server. Now a new user will see all the changes.
– Still inefficient: caching is no longer as useful
– Not a total solution: what happens when two users have the same file cached?
8. Consistency Models
• Recall the discussion of consistency models in Chapter 7
• Realistically, strict consistency can't be achieved without synchronization techniques such as transactions or locks
• Here we consider what the file system can do in the absence of user-enabled methods.
9. Session Semantics
• Instead of trying to implement UNIX semantics where it is not possible, define new semantics:
– Local changes to a file are not made permanent until the file is closed. In the meantime, if another user opens the file, she gets the original version.
– This approach is common in DFSs.
10. Simultaneous Caching
• What if two users concurrently cache and modify the same file? How do we determine the "new" state of the file?
• Most likely option:
– The most recently closed file becomes the new "official" version
11. File foo
[Diagram: file foo initially contains x = 10; two horizontal timelines show each process's operations in real time]
Process 1: open foo; write x = x+6 to foo; close foo
Process 2: open foo; write x = 4*x to foo; close foo
Using session semantics, what is the final value of x in file foo after both processes close the file?
Assume the horizontal lines represent real time and that close operations are seen at the server in the order shown.
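The last-close-wins rule above can be modeled in a few lines. This is an illustrative sketch, not real file-system code; the class names are invented, and the final value depends entirely on which close the server sees last (here, process 2's).

```python
# Illustrative model of session semantics: each process works on a private
# copy taken at open(); the server's copy is replaced only at close(), so
# the last close "wins" and earlier concurrent updates are lost.

class SessionFile:
    def __init__(self, value):
        self.value = value          # the server's "official" copy

    def open(self):
        return Session(self)

class Session:
    def __init__(self, f):
        self.f = f
        self.local = f.value        # private snapshot taken at open time

    def write(self, new_value):
        self.local = new_value      # changes stay local until close

    def close(self):
        self.f.value = self.local   # whole-file replace at close

foo = SessionFile(10)
p1 = foo.open()                     # both processes open before either closes
p2 = foo.open()
p1.write(p1.local + 6)              # process 1: x = x + 6  -> 16
p2.write(4 * p2.local)              # process 2: x = 4 * x  -> 40
p1.close()                          # server now holds 16
p2.close()                          # last close wins: server now holds 40
print(foo.value)                    # 40 under this close ordering
```

Note that if process 1's close were seen last, the file would instead hold 16: one update is silently lost either way.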
12. Immutable Files
• The only operations on a file are, effectively, create, read, and replace.
– Once a file is created it can be read but not changed.
– A new file (incorporating changes to a current file) can be created and placed in the directory in place of the original version.
• If several users try to replace an existing file at the same time, one is chosen: either the last to close, or non-deterministically.
13. Review: File System Semantics
• UNIX semantics – Every file operation is instantly visible to all processes
• Session semantics – No changes are visible until the file is closed
• Immutable files – No updates are possible; files can only be replaced
14. Transaction Semantics
• Transactions are a way of grouping several file operations together and ensuring that they are either all executed or none is executed.
– We say they are atomic.
• The transaction system is responsible for ensuring that all of the operations are carried out in order, without any interference from concurrent transactions.
15. The Transaction Model
• Transaction: a set of operations which must be executed entirely, or not at all.
• Processes in a transaction can fail at random
– Failure causes: hardware or software problems, network problems, lost messages, etc.
• Transactions will either commit or abort:
– Commit => successful completion (All)
– Abort => partial results are undone (Nothing)
16. Transaction Model
• Transactions are delimited by two special primitives:
Begin_transaction // or something similar
  transaction operations (read, write, open, close, etc.)
End_transaction
• If the transaction successfully reaches the end statement, it "commits" and all changes become permanent; otherwise it aborts.
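The Begin/End pattern can be sketched as follows. This is a minimal, hypothetical model (the class and method names are mine, not from any real transaction system): writes are buffered and become visible all at once on commit, or vanish on abort.

```python
# A minimal sketch of the Begin/End transaction pattern: operations are
# buffered privately, then applied all at once at End_transaction (commit)
# or discarded entirely if anything fails (abort).

class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}           # buffered writes, invisible to others

    def write(self, key, value):
        self.pending[key] = value

    def read(self, key):
        # reads see this transaction's own pending writes first
        return self.pending.get(key, self.store.get(key))

    def end(self, fail=False):
        if fail:
            self.pending.clear()    # abort: partial results are undone
            return False
        self.store.update(self.pending)   # commit: all changes at once
        return True

store = {"balance": 100}
t = Transaction(store)              # Begin_transaction
t.write("balance", t.read("balance") - 30)
t.end()                             # End_transaction -> commit
print(store["balance"])             # 70

t2 = Transaction(store)
t2.write("balance", 0)
t2.end(fail=True)                   # abort: the change never reaches the store
print(store["balance"])             # still 70
```

Real systems add logging and concurrency control on top of this buffering idea, but the all-or-nothing visibility is the same.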
17. ACID Properties of Transactions
• Atomic: either all or none of the operations in a transaction are performed
• Consistent: the transaction doesn't affect system invariants; e.g., no money is "lost" in a banking system
• Isolated (serializable): one transaction can't affect others until it completes
• Durable: changes made by a committed transaction are permanent, even if the process or server fails.
18. Atomicity
• An atomic action is one that appears to be "indivisible and instantaneous" to the rest of the system; for example, machine language instructions.
• Transactions support the execution of multiple instructions as if they were a single atomic instruction.
19. Consistent
• A state is consistent if invariants hold
• An invariant is a predicate which states a condition that must be true.
• Invariants for the airline ticket example:
– seatsLeft = seatsTotal – seatsSold
– seatsLeft >= 0
• In the bank case (simplified)
– balance_final = balance_original – withdrawals + deposits
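The invariants above are just predicates, so they can be written directly as boolean functions. The numbers below are made up purely for illustration.

```python
# The slide's invariants expressed as predicates: any state for which a
# predicate is False is inconsistent.

def seats_ok(total, sold, left):
    # seatsLeft = seatsTotal - seatsSold, and seatsLeft >= 0
    return left == total - sold and left >= 0

def balance_ok(original, withdrawals, deposits, final):
    # balance_final = balance_original - withdrawals + deposits
    return final == original - withdrawals + deposits

assert seats_ok(total=100, sold=60, left=40)
assert not seats_ok(total=100, sold=60, left=50)   # invariant violated
assert balance_ok(original=500, withdrawals=200, deposits=50, final=350)
```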
20. Isolated
• No other transaction will see the intermediate results of a transaction.
• Concurrent transactions have the same effect on the database as if they had run serially. Notice the similarity to critical sections, which do run serially.
• This characteristic is enforced through special concurrency control measures.
21. AD Properties
• ACID is a commonly used term, but somewhat redundant.
• Transactions that execute atomically will be consistent and isolated.
• Atomicity and durability capture the essential qualities.
22. Semantics of File Sharing in Distributed Systems
• UNIX semantics – Every file operation is instantly visible to all processes
• Session semantics – No changes are visible until the file is closed
• Immutable files – No updates are possible; files can only be replaced
• Transactions – All changes occur and are visible atomically, or not at all
23. File Locking
• UNIX file semantics are not possible in a DFS
• Session semantics and immutable files do not always support the kind of sharing processes need.
• Transactions have a heavy overhead.
• Thus some additional form of synchronization is desirable to enable the server to enforce mutual exclusion on writes.
24. Locking in NFS
• Early versions of NFS (through v3) were stateless, and so could not implement locks.
• An add-on, NLM (Network Lock Manager), worked in the NFS environment to enforce advisory locking
– If one process has locked a byte sequence, any other process requesting a lock on that sequence will be denied.
25. File Locking in NFSv4
• NFSv4 added a similar locking discipline to the basic protocols.
• Lock managers in NFS, as in other file systems, are based on the centralized scheme discussed in Chapter 6
– Client requests lock
– Lock manager grants lock (if it is free)
– Client releases lock (or it expires after a time)
• In NFS, if a client requests a lock which cannot be granted, the client is not blocked; it must try again later.
26. Denied Requests
• If a client's request for a lock is denied, it receives an error message.
– Poll the server later for lock availability
• Clients can request to be put on a FIFO queue; when a lock is released it is reserved for the first process on the queue; if that process polls within a certain amount of time it gets the lock.
– How is this different from the centralized mutual exclusion algorithm in the textbook?
27. File Locking in NFS
• Two types of locks:
– Reader locks, which can be held simultaneously
– Writer locks, which guarantee exclusive access
• The lock operation is applied to consecutive byte sequences in the file, rather than to the whole file.
28. NFSv4 Lock-Related Operations
• Lock – Create a lock for a range of bytes
• Lockt – Test whether a conflicting lock has been granted
• Locku – Remove a lock from a range of bytes
• Renew – Renew the lease on a lock
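The reader/writer byte-range rules behind these operations can be sketched as a toy lock manager. This is a simplified model of the idea, not the NFSv4 protocol itself: half-open ranges, no leases, and the class and method names are invented for illustration.

```python
# A sketch of byte-range locking in the NFSv4 style: reader locks on
# overlapping ranges may coexist; a writer lock conflicts with any
# overlapping lock held by another client.

class RangeLockManager:
    def __init__(self):
        self.locks = []             # (client, start, end, mode) tuples

    def _conflicts(self, client, start, end, mode):
        for c, s, e, m in self.locks:
            overlap = start < e and s < end        # half-open [start, end)
            if overlap and c != client and (mode == "write" or m == "write"):
                return True
        return False

    def lock(self, client, start, end, mode):      # cf. the Lock operation
        if self._conflicts(client, start, end, mode):
            return False            # denied: the client must try again later
        self.locks.append((client, start, end, mode))
        return True

    def unlock(self, client, start, end):          # cf. the Locku operation
        self.locks = [l for l in self.locks
                      if not (l[0] == client and l[1] == start and l[2] == end)]

mgr = RangeLockManager()
assert mgr.lock("A", 0, 100, "read")       # reader locks can be shared
assert mgr.lock("B", 50, 150, "read")
assert not mgr.lock("C", 60, 70, "write")  # overlapping writer: denied
mgr.unlock("A", 0, 100)
mgr.unlock("B", 50, 150)
assert mgr.lock("C", 60, 70, "write")      # now exclusive access succeeds
```

A Lockt-style test would call `_conflicts` without appending, and Renew would extend a lease timestamp that this sketch omits.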
29. Leases
• Locks are granted for a specific time interval.
– What problem does this address?
• At the end of that interval the lock is removed unless the client has requested an extension.
30. Share Reservations in NFS
• An open request specifies the kind of access the application requires: READ, WRITE, BOTH
• It also specifies the kind of access that should be denied to other clients: NONE, READ, WRITE, BOTH
• If the requirements can't be met, the open fails
• Share reservations = implicit locking
• Used in NFS for Windows-based systems
31. Share Reservations – Example
• A client tries to open a file for reading and writing, and to deny concurrent write access.
– If no other client has the file open, the request succeeds.
– If another client has opened the file for reading (and hasn't blocked write access), the request succeeds.
– If another client has opened the file for writing, the request fails.
– If another client has the file open and has denied read or write access, the request fails.
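The four cases above follow from one symmetric rule, sketched below under my own (hypothetical) encoding of access and deny sets: an open succeeds only if no existing open denies what it requests, and it denies nothing an existing open already holds.

```python
# A sketch of the share-reservation compatibility check. Access and deny
# modes are modeled as sets of "r"/"w" flags; this is an illustration of
# the rule, not the NFS wire format.

def open_allowed(req_access, req_deny, existing):
    # existing: list of (access, deny) pairs for clients with the file open
    for access, deny in existing:
        if req_access & deny:       # someone denies what we request
            return False
        if req_deny & access:       # we'd deny what someone already has
            return False
    return True

READ, WRITE = {"r"}, {"w"}
BOTH, NONE = {"r", "w"}, set()

# The slide's example: open for read+write, denying concurrent writes.
assert open_allowed(BOTH, WRITE, [])                    # no one else: succeeds
assert open_allowed(BOTH, WRITE, [(READ, NONE)])        # reader, no denial: succeeds
assert not open_allowed(BOTH, WRITE, [(WRITE, NONE)])   # writer present: fails
assert not open_allowed(BOTH, WRITE, [(READ, BOTH)])    # denies read/write: fails
```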
32. 11.5 Summary
• UNIX semantics are not possible in a DFS
• Session semantics are a common choice
• File locking (usually advisory) can provide additional protection if needed.
• In parallel programs, mutual exclusion techniques can be used to protect file operations.
• In database systems, transactions are used.
33. 11.6: Consistency and Replication
• Client-Side Caching
• Server-Side Replication
• Replication in P2P Systems
34. Introduction
• Replication (and caching) => multiple copies of something
• Two reasons for replication:
– Reliability (protection against failure, corruption)
– Performance (size of user base, geographical extent of system)
• Replication can cause inconsistency: at least one copy is different from the rest.
35. Caching in a DFS
• Caching in any DFS reduces access delays due to disk access times or network latency.
• Caches can be located in the main memory of either the server or the client, and/or on the client's disk.
36. Caching in a DFS
• Client-side caching (memory or disk) offers the most performance benefits, but also leads to potential inconsistencies.
• However, because in practice file sharing is relatively rare, client-side caching remains a popular way to improve performance in a DFS.
37. Cache Consistency Measures
• Server-initiated consistency: the server notifies a client if its data becomes stale
– e.g., another client closes its copy of the file, which was opened for writing.
• Client-initiated consistency: the client is responsible for the consistency of its data
– e.g., client-side software can periodically check with the server to see if the file has been modified.
38. Caching in NFS
• NFSv3 did not define a caching protocol.
– Individual implementations made their own decisions
• "Stale" data could exist for periods ranging from a few seconds to half a minute
• NFSv4 made some improvements, but many details are still implementation dependent.
• The general structure of the NFS cache model follows.
39. Client-Side Caching in NFS
[Figure 11-21: the client application accesses a memory cache and a disk cache on the client machine, which communicate with the NFS server over the network.]
40. What Do Clients Cache?
• File data blocks
• File handles – for future reference
• Directories
• Two approaches to caching in NFS:
– Caching with server control
– Caching with open delegation
41. Caching Data with Server Control
• The simplest approach to caching allows the server to retain control over the file.
• Procedure:
– Client opens file
– Data blocks are transferred to the client (by read ops)
– Client can read and write data in the cache
– When the file closes, flush changes back to the server
• Session semantics & NFS: the last (most recent) process to close a file has its changes become permanent. Changes made by processes that ran concurrently but closed earlier are lost.
42. Caching with Server Control
• In caching with server control
– All clients on a single machine may read and write the same cached data if they have access rights
– Data remaining in the cache after a file closes doesn't need to be removed, although changes must be sent to the server.
• If a new client on the same machine opens a file after it has been closed, the client cache manager usually must validate the local cached data with the server
– If the data is stale, replace it.
43. Caching with Open Delegation
• Allows a client machine to handle some local open and close operations from other clients on the same machine.
– Normally the server decides if a client can open a file
• Delegation can improve performance by limiting contact with the server
• The client machine gets a copy of the entire file, not just certain blocks.
44. Open Delegation – Examples*
• Suppose a client machine has opened a file for writing, and has been delegated rights to control the file locally.
– If another local client tries to lock the file, the local machine can decide whether or not to grant the lock
– If a remote client tries to lock the file (at the server), the server will deny file access
• If a client has opened the file for reading only, local clients desiring write privileges must still contact the server.
45. Delegation and Callbacks
• The server may need to "undelegate" the file, perhaps when another client needs to obtain access.
• This can be done with a callback, which is essentially an RPC from server to client.
• Callbacks require the server to maintain state (knowledge) about clients – a reason for NFS to be stateful.
46. Caching Attributes*
• Clients can cache attributes as well as data.
– (size of file, number of links, last date modified, etc.)
• Cached attributes are kept consistent by the client, if at all
– No guarantee that the same file cached at two sites will have the same attributes at both sites
• Attribute modifications should be written through to the server (write-through cache coherence policy), although there's no requirement to do so.
47. Leases*
• Lease: cached data is automatically invalidated after a certain period of time.
– Applies to file attributes, file handles (mapping of name to file handle), directories, and sometimes data.
– When the lease expires, the client must renew the data from the server
– Helps with consistency and protects against errors.
48. An Implementation of Leases*
• Data blocks have time-stamps applied by the server that indicate when they were last modified.
• When a block is cached at a client, the server's time-stamp is also cached.
• After a period of time, the client confirms the validity of the data
– Compare the timestamp at the client to the timestamp at the server
– If the server timestamp is more recent, invalidate the client data
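The timestamp comparison above can be sketched in a few lines. This is a toy model of the scheme on this slide, with invented class names, integer "clocks" instead of real time, and an arbitrary 30-second lease.

```python
# A sketch of lease-based validation: each cached block carries the server
# timestamp it was fetched with; once the lease period has passed, the
# client revalidates by comparing its timestamp with the server's.

class Server:
    def __init__(self):
        self.mtime = {}             # block id -> last-modified timestamp

    def write(self, block, now):
        self.mtime[block] = now

class ClientCache:
    LEASE = 30                      # seconds; an illustrative value

    def __init__(self, server):
        self.server = server
        self.cache = {}             # block -> (data, server_ts, fetched_at)

    def fetch(self, block, data, now):
        self.cache[block] = (data, self.server.mtime.get(block, 0), now)

    def read(self, block, now):
        data, ts, fetched = self.cache[block]
        if now - fetched > self.LEASE:               # lease expired
            if self.server.mtime.get(block, 0) > ts: # server copy is newer
                del self.cache[block]                # invalidate stale data
                return None                          # caller must re-fetch
            self.cache[block] = (data, ts, now)      # still valid: renew
        return data

srv = Server()
srv.write("b1", now=0)
c = ClientCache(srv)
c.fetch("b1", "hello", now=1)
assert c.read("b1", now=10) == "hello"   # within lease: served from cache
srv.write("b1", now=20)                  # another client modifies the block
assert c.read("b1", now=40) is None      # lease expired, stale: invalidated
```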
49. Coda
A Prototype Distributed File System
• Developed at CMU – M. Satyanarayanan
• Started in 1987 as an improvement on the Andrew file system (a classic research FS)
– Andrew strongly influenced NFSv4 and some versions of Linux
• The most recent version of Coda (6.9.4) was released 1/05/2009 (http://www.coda.cs.cmu.edu/news.html)
50. Objectives of Coda
• Support disconnected operation (server goes down, laptop is disconnected from the network, etc.)
• Client-side caching is extensive
– Uses a client disk cache
• Replication contributes to availability, fault tolerance, scalability
51. Caching in Coda
• Critical, because of Coda's objectives
• Caching achieves scalability and provides more fault tolerance for the client in case it is disconnected from the server.
• When a client opens a file, the entire file is downloaded. This is true for both reads and writes.
52. Concurrent Access
• In Coda, many clients may have a file open for reading, but only one for writing.
– Multiple readers and a single writer may exist concurrently
– In NFS and most other file systems, multiple readers and multiple writers can exist concurrently unless locks are used to prohibit sharing.
53. Callbacks/Server-Initiated Cache Consistency
• A Coda callback is an agreement between the server and a client. The server agrees to notify the client when a file has been modified by another client, closed, and written back to the server.
• At this time, the client may purge the file from its cache, but it may also continue reading the outdated copy.
• This is a blend of session and transaction semantics.
54. Coda Callbacks
• Callback promise: the server's commitment to notify the client when the file changes
• Callback break: notice from the server that the client's file is stale; called a "break" because it terminates the agreement. There will be no further callbacks unless the client renews it.
55. Figure 11-23, page 523
• Local copies of files can be used as long as the client still has an outstanding callback promise
– No other client has closed a modified file.
56. [Diagram: clients 1 and 2 each hold a cached copy of the same file from the server]
Suppose clients 1 & 2 have cached the same file, and client 1 modifies the file.
• How/when does client 2 know?
• What role, if any, does the server have?
• Are Coda and NFS different in this respect?
57. 11.6.2: Server-Side Replication
• Caching: replication at the client side.
– Initiated implicitly by a client request
– Cached data is temporary
– Unit of caching = a file, or less (usually)
– Purpose: improved performance
• Server replication
– Mainly for fault tolerance & availability
– May actually degrade performance (overhead)
– Replicated data is permanent
58. Caching & Replication in Coda
• Unit of replication = volume (a group of related files)
• Each volume is stored on several servers, its Volume Storage Group (VSG)
• The Available Volume Storage Group (AVSG) is the set of servers a client can actually reach
• Contact one server to get permission to R/W; contact all when closing an updated file.
59. [Figure 11-24: Two clients with a different AVSG for the same file. Servers S1, S2, and S3 replicate the file; a broken network partitions S3 from S1 and S2, so client A's Open(f) reaches S1 and S2 while client B's Open(f) reaches only S3.]
60. Writing in Disconnected Systems
• Each file has a Coda version vector (CVV), analogous to a vector timestamp, with one component per server. It starts at (1, 1, 1).
• Each server updates its own component after the file is updated.
• As long as all servers get all updates, all version vectors will be equal.
61. Detecting Inconsistencies
• In the previous example, both A and B will be allowed to open the file for writing.
• When A closes, it will update S1 and S2, but not S3; B will update S3, but not S1 or S2.
• The version vector at S1 and S2 will be [2, 2, 1].
• The version vector at S3 will be [1, 1, 2].
• It is easy to detect the inconsistency, but knowing how to resolve it is application dependent.
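The detection step above is just a component-wise comparison of version vectors. The sketch below shows the standard rule (my own function names, not Coda's API): one vector supersedes another only if it is at least as large in every component; otherwise the writes were concurrent and there is a conflict.

```python
# A sketch of conflict detection with version vectors: v1 dominates v2 if
# every component of v1 is >= the corresponding component of v2; if neither
# vector dominates, the replicas diverged (a write/write conflict).

def compare(v1, v2):
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "v1 newer"
    if le:
        return "v2 newer"
    return "conflict"     # concurrent updates: resolution is app dependent

# This slide's scenario: A updated S1 and S2; B updated only S3.
assert compare([2, 2, 1], [1, 1, 1]) == "v1 newer"   # a normal update
assert compare([2, 2, 1], [1, 1, 2]) == "conflict"   # partitioned writes
```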
62. Replication in P2P Systems
• In P2P systems replication is more important because
– P2P members are less reliable: they may leave the system or remove files
– Load balance is important since there are no designated servers
• File usage in P2P is different: most files are read-only and updates consist of adding new files, so consistency is less of an issue.
63. Unstructured P2P Systems (each node knows n neighbors)
• Look-up = search (in structured systems, lookup is directed by some algorithm)
• Replication speeds up the process
• How to allocate files to nodes (it may not be possible to force a node to store files):
– Uniformly distribute n copies across the network
– Allocate more replicas for popular files
– Users who download files are responsible for sharing them with others (as in BitTorrent)
64. Structured P2P Systems
• Replication is used primarily for load balance
• Possible approaches:
– Store a replica at each node in the search path (concentrates replicas near the prime copy, but may unbalance some nodes)
– Store replicas at nodes that request a file, and store pointers to it at nodes along the way.
65. 11.7: Fault Tolerance in DFS*
• Review of Fault Tolerance
• Handling Byzantine Failures
• High Availability in P2P Systems
66. Basic Concepts – Review
• Distributed systems may experience partial failure
• Build systems to automatically recover from crashes.
• Continue to operate normally while failures are being repaired; i.e., be fault tolerant.
• Fault-tolerant systems exhibit dependability.
– Availability: the system is immediately ready to use
– Reliability: the system can run continuously without failing.
• (remember the availability/reliability example)
– Safety: system failure doesn't have disastrous consequences
– Maintainability: easy to repair
67. Failure Models
• Failure may be due to an error at any place in the system:
– The server crashes
– The network goes down
– A disk crashes
– Security violations occur
• Crash failure, omission failure, Byzantine failure:
– Incorrect, but undetectable;
– malicious servers produce deliberately wrong results;
– ...
68. Handling Byzantine Failures in Distributed File Systems
• Replication handles many errors in a DFS, but Byzantine errors are harder to solve.
• The text presents an algorithm by Castro and Liskov that works as long as at most k of the 3k+1 nodes (fewer than one third) are faulty at any moment.
• Clients must get the same answer from k+1 servers (in a system with 3k+1) to be sure the answer is correct.
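The client-side acceptance rule is simple to sketch: with at most k faulty servers, any answer vouched for by k+1 servers includes at least one correct one. This is only the voting step of the Castro-Liskov scheme, with an invented function name; the full protocol involves much more.

```python
# A sketch of the k+1 matching-replies rule: accept an answer only if at
# least k+1 replicas agree on it, since at most k replicas can be Byzantine.

from collections import Counter

def accept(replies, k):
    # replies: answers received from replica servers (some may be wrong)
    answer, votes = Counter(replies).most_common(1)[0]
    return answer if votes >= k + 1 else None

# k = 1, so the system has 3k+1 = 4 servers; one of them lies:
assert accept(["42", "42", "42", "99"], k=1) == "42"
assert accept(["42", "99"], k=1) is None     # not enough matching replies yet
```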
69. Availability in P2P Systems
• Possible approaches:
– Replication (although it must be at very high levels due to the unreliability of nodes)
– Erasure coding: divide a file into m fragments, recode them into n > m fragments such that any set of m fragments can be used to reconstruct the entire file. Distribute fragments, rather than entire file replicas.
• Requires less redundancy than full replication.
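The simplest possible erasure code illustrates the m-of-n idea: split the file into m = 2 halves and add an XOR parity fragment, giving n = 3; any 2 of the 3 fragments reconstruct the file. Real P2P systems use stronger codes (e.g., Reed-Solomon) for larger n and m; this sketch and its function names are purely illustrative.

```python
# A sketch of erasure coding with m = 2 data fragments and one XOR parity
# fragment (n = 3): losing any single fragment is recoverable.

def encode(data):
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")   # pad to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))          # a XOR b
    return a, b, parity, len(data)

def decode(a, b, parity, length):
    # any one fragment may be missing (None); XOR recovers it from the others
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:length]                              # drop the padding

a, b, p, n = encode(b"hello world")
assert decode(None, b, p, n) == b"hello world"   # fragment a lost
assert decode(a, None, p, n) == b"hello world"   # fragment b lost
assert decode(a, b, p, n) == b"hello world"      # nothing lost
```

The redundancy here is 1.5x the file size, versus 2x for even a single full replica, which is the saving the slide refers to.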