Notes about concurrent and distributed systems & x86 virtualization

Concurrent and Distributed Systems (Bechini)
MUTUAL EXCLUSION

Volatile: any write to a volatile variable establishes a happensbefore relationship with subsequent reads of
that same variable. This means that changes to a volatile variable are always visible to other threads. What's
more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the
volatile, but also the side effects of the code that led up the change.
Reads and writes are atomic for all variables declared volatile (including long and double variables).

Mutual Exclusion desired properties:
1. ME guaranteed in any case
2. A process out of the Critical Section MUST NOT prevent any other to access
3. No deadlock
4. No busy waiting
5. No starvation

Deadlock: a situation in which two or more competing actions are each waiting for the other to finish, and thus
neither ever does.
A deadlockers situation can arise if all of the following conditions hold simultaneously in a system:[1]
1. Mutual Exclusion: At least one resource must be held in a nonshareable mode.[1]
Only one process
can use the resource at any given instant of time.
2. Hold and Wait or Resource Holding: A process is currently holding at least one resource and
requesting additional resources which are being held by other processes.
3. No Preemption: a resource can be released only voluntarily by the process holding it.
4. Circular Wait: A process must be waiting for a resource which is being held by another process, which
in turn is waiting for the first process to release the resource. In general, there is a set of waiting
processes, P = {P1, P2, ..., PN}, such that P1 is waiting for a resource held by P2, P2 is waiting for a
resource held by P3 and so on until PN is waiting for a resource held by P1.[1][7]
These four conditions are known as the Coffman conditions from their first description in a 1971 article by
Edward G. Coffman, Jr.[7]
Unfulfillment of any of these conditions is enough to preclude a deadlock from
occurring.

Busy waiting: In software engineering, busywaiting or spinning is a technique in which a process
repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. In
lowlevel programming, busywaits may actually be desirable. It may not be desirable or practical to implement
interruptdriven processing for every hardware device, particularly those that are seldom accessed.

Sleeping lock: as opposite to a spinning lock, this technique puts a thread waiting for accessing a resource in
sleeping/ready mode. So, a thread is paused and its execution stops. The CPU can perform a context switch
and can keep working on another process/thread. This saves some CPU time (cycles) that would be wasted by

a spinning lock implementation. Anyway, also sleeping locks have a time overhead that should be taken into
account when evaluating which solution to adopt.

Starvation: a process is perpetually denied necessary resources to proceeds its work.[1]
Starvation may be
caused by errors in a scheduling or mutual exclusion algorithm, but can also be caused by resource leaks, and
can be intentionally caused via a denialofservice attack such as a fork bomb.

Race Condition: (or race hazard) is the behavior of an electronic, software or other system where the output
is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not
happen in the order the programmer intended. The term originates with the idea of two signals racing each
other to influence the output first. Race conditions arise in software when an application depends on the
sequence or timing of processes or threads for it to operate properly.

Dekker’s alg.: the first known correct solution to the mutual exclusion problem in concurrent programming.

Dekker's algorithm guarantees mutual exclusion, freedom from deadlock, and freedom from starvation.
One advantage of this algorithm is that it doesn't require special Testandset (atomic read/modify/write)
instructions and is therefore highly portable between languages and machine architectures. One disadvantage
is that it is limited to two processes and makes use of busy waiting instead of process suspension. (The use of
busy waiting suggests that processes should spend a minimum of time inside the critical section.)
This algorithm won't work on SMP machines equipped with these CPUs without the use of memory barriers.

Peterson’s alg.: a concurrent programming algorithm for mutual exclusion that allows two processes to share
a singleuse resource without conflict, using only shared memory for communication.

The algorithm uses two variables, flag and turn. A flag[n] value of true indicates that the process n wants to
enter the critical section. Entrance to the critical section is granted for process P0 if P1 does not want to enter
its critical section or if P1 has given priority to P0 by setting turn to 0.

The algorithm does satisfy the three essential criteria to solve the critical section problem, provided that
changes to the variables turn, flag[0], and flag[1] propagate immediately and atomically. The while condition
works even with preemption.[1]

Filter alg.: generalization to N>2 of Peterson’s alg

HW support to ME: an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible
if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from
concurrent processes. Additionally, atomic operations commonly have a succeedorfail definition — they
either successfully change the state of the system, or have no apparent effect.
● Test_and_Set: reads a variable’s value, copies it to store the old value, modifies it with the new value
and writes the old value returning it. Maurice Herlihy (1991) proved that testandset has a finite
consensus number, in contrast to the compareandswap operation. The testandset operation can
solve the waitfree consensus problem for no more than two concurrent processes.[1]
However, more

than two decades before Herlihy's proof, IBM had already replaced Testandset by
Compareandswap, which is a more general solution to this problem. It may suffer of starvation. (from
notes: in a multiprocessor system it is not enough alone to guarantee ME, it is such that interrupts must
be disabled).

● Compare_and_Swap (CAS): an atomic instruction used in multithreading to achieve synchronization.
It compares the contents of a memory location to a given value and, only if they are the same, modifies
the contents of that memory location to a given new value. This is done as a single atomic operation. It
might suffer starvation. On servergrade multiprocessor architectures of the 2010s, compareandswap
is relatively cheap relative to a simple load that is not served from cache. A 2013 paper points out that a
CAS is only 1.15 times more expensive that a noncached load on Intel Xeon (WestmereEX) and 1.35
times on AMD Opteron (MagnyCours).[6]
As of 2013, most multiprocessor architectures support CAS in
hardware. As of 2013, the compareandswap operation is the most popular synchronization primitive
for implementing both lockbased and nonblocking concurrent data structures.[4]

● Fetch_and_Add: a special instruction that atomically modifies the contents of a memory location. It
fetches (copies) the old value of the parameter, adds to the parameter the value given by the second
parameter and returns the old value. Maurice Herlihy (1991) proved that fetchandadd has a finite
consensus number, in contrast to the compareandswap operation. The fetchandadd operation can
solve the waitfree consensus problem for no more than two concurrent processes.[1]

Guidelines:
● do not use synchronized blocks if not strictly necessary
● prefer atomic variables already implemented
● when there is more than one variable to share, wrap critical sections with lock/unlock
● synchronized only when there is a write operation

In Java exist three main mechanisms for ME: atomic variables, implicit locks and explicit locks (reference).

Condition Variables: a threads’ synchronization mechanism. A thread can suspend its execution while waiting
for a specific condition to happen. CVs do not guarantee ME on shared resources, so they have to be used in
conjunction with mutexes.
A thread can perform on a CV two operations:
● wait: ConditionVariable c; c.wait(); = a thread is suspended while waiting for a signal to resume it
● signal: c. signal(); = a thread exiting the CS signals to a waiting thread in the waiting queue to resume
its execution. The awaken thread must check again that the condition it was waiting for has actually
happened
● signalAll: c.signalAll(); = wakes up all the waiting threads.
These three methods must be invoked inside a critical section.

Java Monitor: in java, every object (Object) has three methods, namely wait, notify and notifyAll, performing
the same as the three previous ones. So, it is possible to implement a mechanism completely similar to
condition variables by using these three methods. By dealing with Objects, in order to guarantee ME it is
necessary to use the synchronized keyword.
Java ReentrantLock (oracle reference): A reentrant mutual exclusion Lock with the same basic behavior and
semantics as the implicit monitor lock accessed using synchronized methods and statements, but with
extended capabilities. A ReentrantLock is owned by the thread last successfully locking, but not yet unlocking
it. A thread invoking lock will return, successfully acquiring the lock, when the lock is not owned by another
thread. The method will return immediately if the current thread already owns the lock. The constructor for this
class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to
the longestwaiting thread. Otherwise this lock does not guarantee any particular access order. Programs
using fair locks accessed by many threads may display lower overall throughput (i.e., are slower; often much
slower) than those using the default setting, but have smaller variances in times to obtain locks and guarantee
lack of starvation.
It is recommended practice to always immediately follow a call to lock with a try block:

Since a CV must always be used together with a mutex, the ReentrantLock gives the method newCondition()
to obtain a CV, so that when performing a wait on this CV the lock used to create the same condition will be
released and the thread suspended.
The followings are the most relevant methods of ReentrantLock:
● lock()
● tryLock(): Acquires the lock only if it is not held by another thread at the time of invocation.
● unlock()
● newCondition()
There is also a method that permit to specify a timeout to observe while trying to lock: tryLock (long timeout,
TimeUnit unit)

Barging: usually locks are starvation free, because they use queues to put suspended threads that are FIFO,
so every single thread entering the queue will acquire the lock with a bounded time. There are such situation
when the JVM performs some performance optimization: since the suspend/resume mechanism requires

overhead and waste of time, when there is contention for the same lock the JVM might wake up a thread
already executing instead of a sleeping one. This process is called barging.

Monitor (oracle reference): mutex + condition variable; a synchronization construct that allows threads to have
both mutual exclusion and the ability to wait (block) for a certain condition to become true. Only one process at
a time is able to enter the monitor. In other words, a monitor is a mechanism that associate to a given data
structure a set of procedures/operations that are the only ones allowed. Each procedure/operation is mutually
exclusive: only one process/thread can access the monitor at a time.
Two variants of monitor exist:
● Hoare’s m.: blocking condition variable; signalandwait: There are 3 kind of queues: enter (threads
aiming to enter the monitor), urgent (threads who left the monitor lock to those ones that were already
waiting on the condition variable) and condition (a queue for each CV). When a thread executes a
signal, it autosuspends, moves to the urgent queue and wakes up the first thread waiting in the waiting
queue of that variable. Threads waiting in the urgent queue will be awaken before the ones in the enter
queue.
● MESA: nonblocking CV; signalandcontinue; a signal does not suspend the invoking thread, but
makes a thread waiting on the waiting queue of that CV move to the entering queue. The signalling
thread leaves the monitor continuing with its execution.
Semaphore: another mechanism for ME; in practice, it is a shared integer variable p>=0 on which increments
and decrements are performed by means of two atomic operations:
● V (verhoog = increment, means signal): used when a thread is exiting a portion of code.
● P (prolaag = try, means wait): used when a thread wants to enter a CS.
In practice, a semaphore allow n threads to “enter” a specific portion of code, just by setting up the initial value
of p to n.

Readers and Writers problem: n threads want to read and/or write from/to a shared variable, namely a buffer.
ME is required since writes to the same memory must be synchronized.
In Java exist the ReentrantReadWriteLock class: maintains a pair of associated locks, one for readonly
operations and one for writing; also keeps the capabilities of a ReentrantLock; in particular, a lock downgrading
is possible: a thread holding the write lock might obtain a reentrant lock on the read lock.
Another solution to ME for the readers and writers problem is the synchronization of all the methods in a class
containing a shared collection, so that the class actually wraps the shared variable and has ME guaranteed.
Concurrent collections (oracle reference): classes that contains threadsafe data structures. They can be:
● blocking: read and write operations (take and put) wait for the data structure to be nonempty or
nonfull.
● nonblocking
CopyOnWriteArrayList (oracle reference): A threadsafe variant of ArrayList in which all mutative operations
(add, set, and so on) are implemented by making a fresh copy of the underlying array. Ordinarily too costly, but
may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful
when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent
threads.
Dequeue: “double end queue”, a queue where an element can be put either from the head or the tail of the
queue.

Work stealing: suppose there are m producers and n consumers; each producer writes to a shared buffer and
each consumer has its own queue where the work flows from the buffer (according to a specific policy for
example). It might happen that a consumer is observing its queue growing because it is not able to consume
all the work. In this situation, any other consumer that might observe an empty queue, might “steal” work from
the queue of that other consumer.
JVM Memory: is organized in three main areas:
● method: for each class, methods’ code, attributes and field values are stored
● heap: all the instances are stored here
● thread stack: for each thread, a data structure is placed here, containing methods’ stack, PC register,
etc...

Tasks and Executors: the way to manage threads execution, in terms of starting, stopping and resuming a
thread

Executor: An object that executes submitted Runnable tasks. This interface provides a way of decoupling task
submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc.
An Executor is normally used instead of explicitly creating threads.
void execute(Runnable command)

Runnable: implemented by any class whose instances are intended to be executed by a thread. The class
must define a method of no arguments called run.

Callable: A task that returns a result and may throw an exception. Implementors define a single method with
no arguments called call. The Callable interface is similar to Runnable, in that both are designed for classes

whose instances are potentially executed by another thread. A Runnable, however, does not return a result
and cannot throw a checked exception.
Future: represents the result of an asynchronous computation. Methods are provided to check if the
computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can
only be retrieved using method get when the computation has completed, blocking if necessary until it is ready.
Cancellation is performed by the cancel method. Additional methods are provided to determine if the task
completed normally or was cancelled. Once a computation has completed, the computation cannot be
cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you
can declare types of the form Future<?> and return null as a result of the underlying task.
boolean cancel(boolean mayInterruptIfRunning)
boolean isCancelled()
boolean isDone()
V get() throws InterruptedException, ExecutionException
V get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException

ExecutorService: An Executor that provides methods to manage termination and methods that can produce a
Future for tracking progress of one or more asynchronous tasks. An ExecutorService can be shut down, which
will cause it to reject new tasks. Two different methods are provided for shutting down an ExecutorService. The
shutdown() method will allow previously submitted tasks to execute before terminating, while the
shutdownNow() method prevents waiting tasks from starting and attempts to stop currently executing tasks.
Upon termination, an executor has no tasks actively executing, no tasks awaiting execution, and no new tasks
can be submitted.
void shutdown()
List<Runnable> shutdownNow()
<T> Future<T> submit(Callable<T> task)
Future<?> submit(Runnable task)

ThreadPoolExecutor: An ExecutorService that executes each submitted task using one of possibly several
pooled threads, normally configured using Executors factory methods.
Thread pools address two different problems: they usually provide improved performance when executing
large numbers of asynchronous tasks, due to reduced pertask invocation overhead, and they provide a means
of bounding and managing the resources, including threads, consumed when executing a collection of tasks.
Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.

ScheduledThreadPoolExecutor: A ThreadPoolExecutor that can additionally schedule commands to run
after a given delay, or to execute periodically.
public ScheduledThreadPoolExecutor(int corePoolSize)
public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit
   unit)
public <V> ScheduledFuture<V> schedule(Callable<V> callable, long delay,
  TimeUnit unit)
public ScheduledFuture<?> scheduleAtFixedRate(Runnable command,

long initialDelay, long period, TimeUnit unit)
public ScheduledFuture<?> scheduleWithFixedDelay(Runnable command,
long initialDelay, long delay, TimeUnit unit)
public void execute(Runnable command)
public Future<?> submit(Runnable task)
public <T> Future<T> submit(Callable<T> task)
public void shutdown()
public List<Runnable> shutdownNow()

Task lifecycle:

Executor lifecycle:

Number of threads in the pool:
N )Npool = CPU ∙ U ∙ ( T COMPUTE
T + T WAIT COMPUTE
+ 1

where U is the CPUs and +1 is for safety reasons: it might happen that all the necessary threads block and no
more threads then could enter the pool.

Deadlock with nested monitors: suppose two nested monitors, where a thread first waits on a CV and then
enters its monitor and waits on a nested CV. Now, if another thread, a second one, wants to enter the first
monitor, it might happen that: the first thread signalled on the outer CV and signal got lost, or the inner thread
still has to signal on the outer CV. In any case, the thread waiting outside will never be able to enter the
monitor because the inner thread will always be blocked on the internal CV that will never receive any signal.
So, this situation leads to a deadlock.

Deadlock with executor and monitor (Thread Starving Deadlock): suppose to have an Executor with a
thread pool whose size is N. Suppose that all the threads encounter the same CV and they all wait on it. If
there is no other thread available in the pool, then no one will be able to wake up the waiting threads, thus
resulting in a deadlock.

Waitforgraph: a directed graph used for deadlock detection in operating systems and relational database
systems.

See Coffman.

Memory barrier: a type of barrier instruction that causes a central processing unit (CPU) or compiler to
enforce an ordering constraint on memory operations issued before and after the barrier instruction; necessary
because most modern CPUs employ performance optimizations that can result in outoforder execution.

Java Memory Model (See Java Memory Model paper):
● Program order rule: actions in a thread are performed in their coding order.
● Monitor lock rule: an unlock to a monitor happensbefore any subsequent lock on the same.
● Volatile variable rule: a write to a v. variable happensbefore any subsequent read to the same var.;
the same holds for atomic var.s.
● Thread start rule: a call to Thread.start happensbefore any action in the started thread.
● Thread termination rule: any action in a thread happensbefore any other thread detects that it has
terminated, either by a successfully return from Thread.join or Thread.isAlive returning false.
● Interruption rule: a call to interrupt for a thread happensbefore the interrupted thread detects the
interrupt.
● Finalizer rule: the end of a constructor for an object happensbefore the start of its finalizer.
● Transitivity: if A happensbefore B, B happensbefore C, then A happensbefore C.
Performance
With Java, in order to perform accurate performance measurements it is important to exactly know how the
code is run by the JVM. An important component is the Just In Time (JIT) compilation engine: this module
performs dynamic compilation during the execution of a program – at run time – rather than prior to
execution.[1]
Most often this consists of translation to machine code, which is then executed directly, but can
also refer to translation to another format. It allows adaptive optimization such as dynamic recompilation – thus
in theory JIT compilation can yield faster execution than static compilation. Interpretation and JIT compilation

are particularly suited for dynamic programming languages, as the runtime system can handle latebound data
types and enforce security guarantees.
Evaluating a java source code performance requires collecting statistics about its execution times. In particular,
when interested in evaluating a multithread software, it is important to have such a way to synchronize the
threads starting execution time. In other words, all the threads have to start at the same time so that the
measurements are faithful to reality. Java supplies a specific class for this task.
CountDownLatch (oracle reference): a synchronization aid that allows one or more threads to wait until a set
of operations being performed in other threads completes.
A CountDownLatch is initialized with a given count. The await methods block until the current count reaches
zero due to invocations of the countDown() method, after which all waiting threads are released and any
subsequent invocations of await return immediately. This is a oneshot phenomenon the count cannot be
reset. If you need a version that resets the count, consider using a CyclicBarrier.
A CountDownLatch is a versatile synchronization tool and can be used for a number of purposes. A
CountDownLatch initialized with a count of one serves as a simple on/off latch, or gate: all threads invoking
await wait at the gate until it is opened by a thread invoking countDown(). A CountDownLatch initialized to N
can be used to make one thread wait until N threads have completed some action, or some action has been
completed N times.
public CountDownLatch(int count)
public void await() throws InterruptedException
public void countDown()

CyclicBarrier (oracle reference): a synchronization aid that allows a set of threads to all wait for each other to
reach a common barrier point. CyclicBarriers are useful in programs involving a fixed sized party of threads
that must occasionally wait for each other. The barrier is called cyclic because it can be reused after the
waiting threads are released.
public CyclicBarrier(int parties, Runnable barrierAction)
public int await() throws InterruptedException, BrokenBarrierException

Performance in synchronization
Synchronization impacts performance because of:
● context switches
● memory synchronization
● thread synchronization
Considering our code, one of its performance indexes is throughput, in terms of number threads entering and
leaving it in a given time. So, here comes to us Little’s Law: where, in our case, L represents the λ w L =
number of threads in our system, i.e. the number of threads waiting to execute our portion of code (waiting
because of synchronization of course), λ is the arrival rate and w the delay of the system. What we want to do
is minimize L.
Possible solutions:
● CS shrinking: reduce the size of the CS so that a thread do not spend too much time in it.
● CS splitting: split a CS in smaller ones, so perform lock splitting.
● JVM optimizations:

○ Lock coarsening: whenever the JVM observes that a thread moves from a waiting queue to
another one (because of a locks chain) always in the same sequential order, then it collapse all
the CSs into a single one, thus merging the locks into a single one. This leads to performance
improvements in terms of waiting time, so improving the throughput.
○ Lock elision: if a CS is always executed by the same thread, then it is not necessary to protect
it with a lock: so the JVM removes it.
● Lock granularity: when the shared data structure is wide, it is important to put locks only where
needed, and not on the entire data structure. For example, an hashMap could be so wide that threads
concurrently would access it on different parts, thus not violating any ME constraint. So, locking the
entire table would be penalizing the performance of our software. A solution is to have more locks to
distribute on the overall data structure. Suppose: , then number of Locks and N table size NL = =
. NL should be dimensioned so that the conflict probability is minimized.ocks distribution N % N L = L
● Nonblocking algorithms: if failure or suspension of any thread cannot cause failure or suspension of
another thread[1]
(use of volatile and atomic variables).
○ Waitfree: the strongest nonblocking guarantee of progress, combining guaranteed
systemwide throughput with starvationfreedom. An algorithm is waitfree if every operation has
a bound on the number of steps the algorithm will take before the operation completes.[11]

○ Lockfree: allows individual threads to starve but guarantees systemwide throughput. An
algorithm is lockfree if it satisfies that when the program threads are run sufficiently long at
least one of the threads makes progress (for some sensible definition of progress). All waitfree
algorithms are lockfree. An algorithm is lock free if every operation has a bound on the number
of steps before one of the threads operating on a data structure completes its operation.[11]

○ Obstructionfree: the weakest natural nonblocking progress guarantee. An algorithm is
obstructionfree if at any point, a single thread executed in isolation (i.e., with all obstructing
threads suspended) for a bounded number of steps will complete its operation.[11]
All lockfree
algorithms are obstructionfree.
See Treiber's stack and Michael and Scott's queue. These two algorithms are a nonblocking implementation
of a stack and a queue respectively. Both are based on the use of AtomicReference variables, that make it
possible to deal with threads synchronization. They result with better performance when used in such
programs that suffer from high contention rates. Basically, they implement a spinning solution to deal with
concurrent modification on the stack pointer and the head and tail references in the queue. Thus, whenever a
thread tries to modify one of them has to be sure (see the if statements) that it is actually modifying what it is
expecting to modify (note the use of compareAndSet). If such a check fails, then go back (spin) and try again
from the beginning.

JAVA: Nested Classes (oracle reference)
A nested class is a member of its enclosing class. Nonstatic nested classes (inner classes) have access to
other members of the enclosing class, even if they are declared private. Static nested classes do not have
access to other members of the enclosing class. As a member of the OuterClass, a nested class can be
declared private, public, protected, or package private. (Recall that outer classes can only be declared public
or package private.)

Static nested class: A static nested class interacts with the instance members of its outer class (and other
classes) just like any other toplevel class. In effect, a static nested class is behaviorally a toplevel class that
has been nested in another toplevel class for packaging convenience.

Inner class: is associated with an instance of its enclosing class and has direct access to that object's
methods and fields. Also, because an inner class is associated with an instance, it cannot define any static
members itself. Objects that are instances of an inner class exist within an instance of the outer class.

To instantiate an inner class, you must first instantiate the outer class. Then, create the inner object within the
outer object with this syntax:
OuterClass.InnerClass innerObject = outerObject.new InnerClass();

Local classes (oracle reference): are classes that are defined in a block, which is a group of zero or more
statements between balanced braces. You typically find local classes defined in the body of a method. A local
class has access to the members of its enclosing class and to local variables. However, a local class can only
access local variables that are declared final. Starting in Java SE 8, a local class can access local variables
and parameters of the enclosing block that are final or effectively final (i.e. never changed in the enclosing
block).
Shadowing: If a declaration of a type (such as a member variable or a parameter name) in a particular scope
(such as an inner class or a method definition) has the same name as another declaration in the enclosing
scope, then the declaration shadows the declaration of the enclosing scope.
Anonymous classes: enable you to declare and instantiate a class at the same time. They are like local
classes except that they do not have a name. Use them if you need to use a local class only once. Ex.: when
making a new of an interface and then opening { to write “on the fly” the class.
AtomicInteger (oracle reference): An int value that may be updated atomically. Methods: getAndAdd(int
delta), getAndIncrement(), addAndGet(int delta), incrementAndGet().
ThreadLocal<T> (oracle reference): This class provides threadlocal variables. These variables differ from
their normal counterparts in that each thread that accesses one (via its get or set method) has its own,
independently initialized copy of the variable. ThreadLocal instances are typically private static fields in classes
that wish to associate state with a thread (e.g., a user ID or Transaction ID).

MESSAGE PASSING MODEL
Message passing between a pair of processes can be supported by two message communication operations,
send and receive, defined in terms of destinations and messages. A queue is associated with each message
destination. Sending processes cause messages to be added to remote queues and receiving processes
remove messages from local queues. Sending and receiving processes may be either synchronous or
asynchronous. In the synchronous form of communication, the sending and receiving processes synchronize
at every message. In this case, both send and receive are blocking operations.
In the asynchronous form of communication, the use of the send operation is nonblocking in that the sending
process is allowed to proceed as soon as the message has been copied to a local buffer, and the transmission
of the message proceeds in parallel with the sending process. The receive operation can have blocking and

nonblocking variants. In the nonblocking variant, the receiving process proceeds with its program after
issuing a receive operation, which provides a buffer to be filled in the background, but it must separately
receive notification that its buffer has been filled, by polling or interrupt.
Nonblocking communication appears to be more efficient, but it involves extra complexity in the receiving
process associated with the need to acquire the incoming message out of its flow of control. For these
reasons, today’s systems do not generally provide the nonblocking form of receive.
Communication channel hypothesis:
● Reliability: in terms of validity and integrity. As far as the validity property is concerned, a pointtopoint
message service can be described as reliable if messages are guaranteed to be delivered despite a
‘reasonable’ number of packets being dropped or lost. In contrast, a pointtopoint message service can
be described as unreliable if messages are not guaranteed to be delivered in the face of even a single
packet dropped or lost. For integrity, messages must arrive uncorrupted and without duplication.
● Ordering: Some applications require that messages be delivered in sender order – that is, the order in
which they were transmitted by the sender. The delivery of messages out of sender order is regarded
as a failure by such applications.
● QoS
● Queue policy

Processes addressing: in the Internet protocols, messages are sent to (Internet address, local port) pairs. A
local port is a message destination within a computer, specified as an integer. A port has exactly one receiver
(multicast ports are an exception) but can have many senders. An alternative to address processes is to use
the values (Process ID, local port).

Distributed systems models:
● Physical: focus on hardware organization.
● Architectural (behavioural): how the different components interact (clientserver, peertopeer).
● Fundamental (abstract): an high level abstraction that makes possible to represent mathematically the
real model, so to perform hypothesis validation

Boundedbuffer with asynchronous message passing and Dijkstra’s guarded commands: the most
important element of the guarded command language. In a guarded command, just as the name says, the
command is "guarded". The guard is a proposition, which must be true before the statement is executed. At the
start of that statement's execution, one may assume the guard to be true. Also, if the guard is false, the
statement will not be executed. The use of guarded commands makes it easier to prove the program meets the
specification. The statement is often another guarded command.
A guard can be in one of these three states:
● failed: condition is false
● valid: condition true, result received
● delayed: condition true, but no results available yet
See notes from Professor about the algorithm for the boundedbuffer (readerswriters) problem solved with
guarded commands.

Message Passing Interface (MPI, a reference): the first standardized, vendor independent, message passing
library. The advantages of developing message passing software using MPI closely match the design goals of
portability, efficiency, and flexibility. MPI is not an IEEE or ISO standard, but has in fact, become the "industry
standard" for writing message passing programs on HPC platforms.
MPI primarily addresses the messagepassing parallel programming model: data is moved from the address
space of one process to that of another process through cooperative operations on each process.
MPI main components and features:
● Communicators and Groups: MPI uses objects called communicators and groups to define which
collection of processes may communicate with each other. MPI_COMM_WORLD is the predefined
communicator that includes all of your MPI processes.
○ A group is an ordered set of processes. Each process in a group is associated with a unique
integer rank. Rank values start at zero and go to N1, where N is the number of processes in the
group. In MPI, a group is represented within system memory as an object. It is accessible to the
programmer only by a "handle". A group is always associated with a communicator object.
○ A communicator encompasses a group of processes that may communicate with each other.
All MPI messages must specify a communicator. In the simplest sense, the communicator is an
extra "tag" that must be included with MPI calls.
○ From the programmer's perspective, a group and a communicator are one.
● Rank: Within a communicator, every process has its own unique, integer identifier assigned by the
system when the process initializes. A rank is sometimes also called a "task ID". Ranks are contiguous
and begin at zero. It is used by the programmer to specify the source and destination of messages.
Often used conditionally by the application to control program execution (if rank=0 do this / if rank=1 do
that).
● MPI buffer: since send and receive are rarely perfectly synchronized, the MPI architecture presents a
library buffer that is used to store transiting messages while the receiver cannot receive them.
● Blocking and Nonblocking op.s: Blocking is interpreted as ‘blocked until it is safe to return’, in the
sense that application data has been copied into the MPI system and hence is in transit or delivered
and therefore the application buffer can be reused. Safe means that modifications will not affect the
data intended for the receive task. Safe does not imply that the data was actually received it may very
well be sitting in a system buffer. Nonblocking send and receive routines behave similarly they will
return almost immediately. They do not wait for any communication events to complete, such as
message copying from user memory to system buffer space or the actual arrival of message.
● PointtoPoint Communication: message passing between two, and only two, different MPI tasks.
One task is performing a send operation and the other task is performing a matching receive operation.
● Collective Communication: involve all processes within the scope of a communicator. It is the
programmer's responsibility to ensure that all processes within a communicator participate in any
collective operations. Unexpected behavior, including program failure, can occur if even one task in the
communicator doesn't participate.
● Types of Collective Operations:

● Synchronization processes wait until all members of the group have reached the
synchronization point.
● Data Movement broadcast, scatter/gather, all to all.
● Collective Computation (reductions) one member of the group collects data from the other
members and performs an operation (min, max, add, multiply, etc.) on that data.

● Order:
○ MPI guarantees that messages will not overtake each other.
○ Order rules do not apply if there are multiple threads participating in the communication
operations.
● Fairness: MPI does not guarantee fairness it's up to the programmer to prevent "operation
starvation".
● Envelope: source+destination+tag+communicator (see later)

The underlying architectural model for MPI is relatively simple and captured in Figure 4.17; note the added
dimension of explicitly having MPI library buffers in both the sender and the receiver, managed by the MPI
library and used to hold data in transit.

MPI pointtopoint routines:

Buffer = Program (application) address space that references the data that is to be sent or received.
Count = Indicates the number of data elements of a particular type to be sent.
Type = For reasons of portability, MPI predefines its elementary data types.
Dest = The rank of the receiving process.
Source = The rank of the sending process. This may be set to the wild card MPI_ANY_SOURCE to receive a
message from any task.
Tag = Arbitrary nonnegative integer assigned by the programmer to uniquely identify a message. Send and
receive operations should match message tags. For a receive operation, the wild card MPI_ANY_TAG can be
used to receive any message regardless of its tag.
Communicator = communication context, or set of processes for which the source or destination fields are
valid.
Status = For a receive operation, indicates the source of the message and the tag of the message.
Request = Used by nonblocking send and receive operations. The system issues a unique “request number”.

TIME AND GLOBAL STATES
Clocks, events and process states
We define an event to be the occurrence of a single action that a process carries out as it executes – a
communication action or a statetransforming action. The sequence of events within a single process pi can be
placed in a single, total ordering, which we denote by the relation ➝i between the events. That is, e ➝i e' if and
only if the event e occurs before e’ at pi. This ordering is well defined, whether or not the process is
multithreaded, since we have assumed that the process executes on a single processor. Now we can define
the history of process pi to be the series of events that take place within it, ordered as we have described by
the relation ➝i:
history(pi) hi = < ei
0
ei1
e2
… >

The operating system reads the node’s hardware clock value, Hi(t) , scales it and adds an offset so as to
produce a software clock Ci(t) = α Hi(t) + β that approximately measures real, physical time t for process pi . In
other words, when the real time in an absolute frame of reference is t, Ci(t) is the reading on the software clock.
Successive events will correspond to different timestamps only if the clock resolution – the period between
updates of the clock value – is smaller than the time interval between successive events.

SKEW between computer clocks in a distributed system Network: The instantaneous difference between the
readings of any two clocks is called their skew.

Clock DRIFT: which means that they count time at different rates, and so diverge.

Clock's DRIFT RATE: change in the offset between the clock and a nominal perfect reference clock per unit of
time measured by the reference clock.

UTC (Coordinated Universal Time): is based on atomic time, but a socalled ‘leap second’ is inserted – or,
more rarely, deleted – occasionally to keep it in step with astronomical time. UTC signals are synchronized and
broadcast regularly from landbased radio stations and satellites covering many parts of the world.

Synchronizing Physical Clocks
Synchronization in a synchronous system: In general, for a synchronous system, the optimum bound that
can be achieved on clock skew when synchronizing N clocks is u * (1 – 1 / N) [Lundelius and Lynch 1984], u =
max min, the max and min time that a transmission of a message can observe in a synchronous system.

Cristian’s method for synchronizing clocks: use of a time server, connected to a device that receives
signals from a source of UTC, to synchronize computers externally. There is no upper bound on message
transmission delays in an asynchronous system, the roundtrip times for messages exchanged between pairs
of processes are often reasonably short – a small fraction of a second. He describes the algorithm as
probabilistic: the method achieves synchronization only if the observed roundtrip times between client and
server are sufficiently short compared with the required accuracy. A simple estimate of the time to which p

should set its clock is t + Tround / 2 , which assumes that the elapsed time is split equally before and after S
placed t in mt(=message for timestamp sinc.). This is normally a reasonably accurate assumption, unless the
two messages are transmitted over different networks.

Discussion: Cristian’s method suffers from the problem associated with all services implemented by a single
server: that the single time server might fail and thus render synchronization temporarily impossible. Cristian
suggested, for this reason, that time should be provided by a group of synchronized time servers, each with a
receiver for UTC time signals. Dolev et al. [1986] showed that if f is the number of faulty clocks out of a total of
N, then we must have N = 3f if the other, correct, clocks are still to be able to achieve agreement.

Berkeley's Algorithm: an algorithm for internal synchronization developed for collections of computers
running Berkeley UNIX. A coordinator computer is chosen to act as the master. Unlike in Cristian’s protocol,
this computer periodically polls the other computers whose clocks are to be synchronized, called slaves. The
slaves send back their clock values to it. The master estimates their local clock times by observing the
roundtrip times (similarly to Cristian’s technique), and it averages the values obtained (including its own
clock’s reading). The balance of probabilities is that this average cancels out the individual clocks’ tendencies
to run fast or slow. The accuracy of the protocol depends upon a nominal maximum roundtrip time between
the master and the slaves.
The master takes a FAULTTOLERANT AVERAGE. That is, a subset is chosen of clocks that do not differ
from one another by more than a specified amount, and the average is taken of readings from only these
clocks.

NTP: Cristian’s method and the Berkeley algorithm are intended primarily for use within
intranets. NTP’s chief design aims and features are as follows:
● To provide a service enabling clients across the Internet to be synchronized accurately to UTC:
Although large and variable message delays are encountered in Internet communication, NTP employs
statistical techniques for the filtering of timing data and it discriminates between the quality of timing
data from different servers.
● To provide a reliable service that can survive lengthy losses of connectivity: There are redundant
servers and redundant paths between the servers. The servers can reconfigure so as to continue to
provide the service if one of them becomes unreachable.
● To enable clients to resynchronize sufficiently frequently to offset the rates of drift found in most
computers: The service is designed to scale to large numbers of clients and servers.
● To provide protection against interference with the time service, whether malicious or accidental: The
time service uses authentication techniques to check that timing data originate from the claimed trusted
sources. It also validates the return addresses of messages sent to it.

Logical Time and Logical Clocks
As Lamport [1978] pointed out, since we cannot synchronize clocks perfectly across a distributed system, we
cannot in general use physical time to find out the order of any arbitrary pair of events occurring within it. Two
simple and intuitively obvious points:

● If two events occurred at the same process pi (i = 1, 2,...N), then they occurred in the order in which pi
observes them – this is the order ➝i that we defined above.
● Whenever a message is sent between processes, the event of sending the message occurred before
the event of receiving the message.

Lamport called the partial ordering obtained by generalizing these two relationships the happenedbefore
relation. It is also sometimes known as the relation of causal ordering or potential causal ordering.
The sequence of events need not be unique.

For example, a ↛e and e ↛a, since they occur at different processes, and there is no chain of messages
intervening between them. We say that events such as a and e that are not ordered by ➝ are concurrent and
write this a || e .

Logical clocks: Lamport [1978] invented a simple mechanism by which the happenedbefore ordering can be
captured numerically, called a logical clock. A Lamport logical clock is a monotonically increasing software
counter, whose value need bear no particular relationship to any physical clock. Each process pi keeps its own
logical clock, Li , which it uses to apply socalled Lamport timestamps to events. We denote the timestamp of
event e at pi by Li(e) , and by L(e) we denote the timestamp of event e at whatever process it occurred at.
To capture the happenedbefore relation ➝, processes update their logical clocks and transmit the values of
their logical clocks in messages as follows:
● LC1: Li is incremented before each event is issued at process pi : Li := Li + 1.
● LC2:
(a) When a process pi sends a message m, it piggybacks on m the value t = Li.
(b) On receiving (m, t), a process pj computes Lj := max(Lj ,t) and then applies LC1 before timestamping
the event receive(m).

Note: e➝ e’ ⇒ L(e) < L(e’) .
The converse is not true. If L(e) < L(e’) , then we cannot infer that e ➝ e’.

Totally ordered logical clocks: Some pairs of distinct events, generated by different processes, have
numerically identical Lamport timestamps. We can create a total order on the set of events – that is, one for
which all pairs of distinct events are ordered – by taking into account the identifiers of the processes at which
events occur.
We define the global logical timestamps for these events to be (Ti, i) and (Tj, j) , respectively. (Ti, i) < (Tj, j) if
and only if either Ti < Tj , or Ti = Tj and i < j .

Vector clocks: Mattern [1989] and Fidge [1991] developed vector clocks to overcome the shortcoming of
Lamport’s clocks: the fact that from L(e) < L(e’) we cannot conclude that e ➝ e’.
A vector clock for a system of N processes is an array of N integers. Each process keeps its own vector clock,
Vi , which it uses to timestamp local events. There are simple rules for updating the clocks:

For a vector clock Vi, Vi(i) is the number of events that pi has timestamped, and Vi(j) (j ≠ i) is the number of
events that have occurred at pj that have potentially affected pi. (Process pj may have timestamped more
events by this point, but no information has flowed to pi about them in messages as yet.)

Figure 14.7 shows the vector timestamps of the events of Figure 14.5. It can be seen, for example, that V(a) <
V(f) , which reflects the fact that a➝ f. Similarly, we can tell when two events are concurrent by comparing their
timestamps. For example, that c || e can be seen from the facts that neither V(c) ≤ V(e) nor V(e) ≤ V(c).
Vector timestamps have the disadvantage, compared with Lamport timestamps, of taking up an amount of
storage and message payload that is proportional to N, the number of processes.

Global states: the problem of finding out whether a particular property is true of a distributed system as it
executes.
● Distributed garbage collection: An object is considered to be garbage if there are no longer any
references to it anywhere in the distributed system. To check that an object is garbage, we must verify
that there are no references to it anywhere in the system. When we consider properties of a system, we
must include the state of communication channels as well as the state of the processes.
● Distributed deadlock detection: A distributed deadlock occurs when each of a collection of processes
waits for another process to send it a message, and where there is a cycle in the graph of this
‘waitsfor’ relationship.
● Distributed termination detection: The phenomena of termination and deadlock are similar in some
ways, but they are different problems. First, a deadlock may affect only a subset of the processes in a
system, whereas all processes must have terminated. Second, process passivity is not the same as
waiting in a deadlock cycle: a deadlocked process is attempting to perform a further action, for which
another process waits; a passive process is not engaged in any activity.
● Distributed debugging: Distributed systems are complex to debug.

Global states and consistent cuts: The essential problem is the absence of global time.

A global state corresponds to initial prefixes of the individual process histories. A cut of the system’s execution
is a subset of its global history that is a union of prefixes of process histories:

The leftmost cut is inconsistent. This is because at p2 it includes the receipt of the message m1, but at p1 it
does not include the sending of that message. This is showing an ‘effect’ without a ‘cause’. The actual
execution never was in a global state corresponding to the process states at that frontier, and we can in
principle tell this by examining the ➝ relation between events. By contrast, the rightmost cut is consistent.

INTERPROCESS COMMUNICATION
Direct comm. (direct naming): unique names are given to all processes comprising a program
● symmetrical direct naming: both the sender and receiver name the corresponding process.

● asymmetrical direct naming: the receiver can receive messages from any process.
Indirect comm. (indirect naming): uses intermediaries called channels or mailboxes
● symmetrical indirect naming: both the sender and receiver name the corresponding channel.
● asymmetrical indirect naming: the receiver can receive messages from any channel.

Request/Reply protocol: a requestor sends a request message to a replier system which receives and
processes the request, ultimately returning a message in response. This is a simple, but powerful messaging
pattern which allows two applications to have a twoway conversation with one another over a channel. This
pattern is especially common in clientserver architectures.[1]
For simplicity, this pattern is typically implemented in a purely synchronous fashion, as in web service calls
over HTTP, which holds a connection open and waits until the response is delivered or the timeout period
expires. However, request–response may also be implemented asynchronously, with a response being
returned at some unknown later time. This is often referred to as "sync over async", or "sync/async", and is
common in enterprise application integration (EAI) implementations where slow aggregations, timeintensive
functions, or human workflow must be performed before a response can be constructed and delivered.

Marshalling and Unmarshalling: The information stored in running programs is represented as data
structures, whereas the information in messages consists of sequences of bytes. Irrespective of the form of
communication used, the data structures must be flattened (converted to a sequence of bytes) before

transmission and rebuilt on arrival. There are differences in data representation from a computer to another
one. So, when communicating, the following problems must be addressed:
● primitive data representation (such as integers and floatingpoint numbers).
● set of codes used to represent characters (ASCII or Unicode).

There are two ways for enabling any two computers to exchange binary data values:
● The values are converted to an agreed external format before transmission and converted to the local
form on receipt.
● The values are transmitted in the sender’s format, together with an indication of the format used, and
the recipient converts the values if necessary.
An agreed standard for the representation of data structures and primitive values is called an external data
representation.
Marshalling is the process of taking a collection of data items and assembling them into a form suitable for
transmission in a message. Unmarshalling is the process of disassembling them on arrival to produce an
equivalent collection of data items at the destination. Thus marshalling consists of the translation of structured
data items and primitive values into an external data representation. Similarly, unmarshalling consists of the
generation of primitive values from their external data representation and the rebuilding of the data structures.
Three alternative approaches to external data representation and marshalling:
● CORBA’s common data representation, which is concerned with an external representation for the
structured and primitive types that can be passed as the arguments and results of remote method
invocations in CORBA.
● Java’s object serialization, which is concerned with the flattening and external data representation of
any single object or tree of objects that may need to be transmitted in a message or stored on a disk.
● XML (Extensible Markup Language), which defines a textual format for representing structured data.
In the first two approaches, the primitive data types are marshalled into a binary form. In the third approach
(XML), the primitive data types are represented textually. The textual representation of a data value will
generally be longer than the equivalent binary representation. The HTTP protocol is another example of the
textual approach.
Two main issues exist in marshalling:
● compactness: the resulting message should be as compact as possible.
● data type inclusion: CORBA’s representation includes just the values of the objects transmitted; Java
serialization and XML does include type information.
Two other techniques for external data representation are worthy of mention:
● Google uses an approach called protocol buffers to capture representations of both stored and
transmitted data.
● JSON (JavaScript Object Notation)
Both these last two methods represent a step towards more lightweight approaches to data representation
(when compared, for example, to XML).
Particular attention is to be paid to remote object references. A remote object reference is an identifier for a
remote object that is valid throughout a distributed system. A remote object reference is passed in the
invocation message to specify which object is to be invoked. Remote object references must be generated in a
manner that ensures uniqueness over space and time. Also, object references must be unique among all of the
processes in the various computers in a distributed system. One way is to construct a remote object reference

by concatenating the Internet address of its host computer and the port number of the process that created it
with the time of its creation and a local object number. The local object number is incremented each time an
object is created in that process.

The last field of the remote object reference shown in Figure 4.13 contains some information about the
interface of the remote object, for example, the interface name. This information is relevant to any process that
receives a remote object reference as an argument or as the result of a remote invocation, because it needs to
know about the methods offered by the remote object.

Idempotent op.s: an operation that will produce the same results if executed once or multiple times.[7]
In the
case of methods or subroutine calls with side effects, for instance, it means that the modified state remains the
same after the first call. In functional programming, though, an idempotent function is one that has the property
f(f(x)) = f(x) for any value x.[8]

This is a very useful property in many situations, as it means that an operation can be repeated or retried as
often as necessary without causing unintended effects. With nonidempotent operations, the algorithm may
have to keep track of whether the operation was already performed or not.
In the HyperText Transfer Protocol (HTTP), idempotence and safety are the major attributes that separate
HTTP verbs. Of the major HTTP verbs, GET, PUT, and DELETE are idempotent (if implemented according to
the standard), but POST is not.[9]

Remote Procedure Call (RPC)
Requestreply protocols provide relatively lowlevel support for requesting the execution of a remote operation,
and also provide direct support for RPC and RMI.
RPC allows client programs to call procedures transparently in server programs running in separate processes
and generally in different computers from the client.
RPC has the goal of making the programming of distributed systems look similar, if not identical, to
conventional programming – that is, achieving a high level of distribution transparency. This unification is
achieved in a very simple manner, by extending the abstraction of a procedure call to distributed environments.
In particular, in RPC, procedures on remote machines can be called as if they are procedures in the local
address space. The underlying RPC system then hides important aspects of distribution, including the
encoding and decoding of parameters and results, the passing of messages and the preserving of the required
semantics for the procedure call.
Three issues that are important in understanding this concept:
● programming with interfaces (the style of programming): in order to control the possible interactions
between modules, an explicit interface is defined for each module; as long as its interface remains the
same, the implementation may be changed without affecting the users of the module.

○ service interface: the specification of the procedures offered by a server, defining the types of
the arguments of each of the procedures. But why an interface?
■ It is not possible for a client module running in one process to access the variables in a
module in another process.
■ The parameterpassing mechanisms used in local procedure calls are not suitable when
the caller and procedure are in different processes. In particular, call by reference is not
supported. Rather, the specification of a procedure in the interface of a module in a
distributed program describes the parameters as input or output, or sometimes both.
Input parameters are passed to the remote server by sending the values of the
arguments in the request message and output parameters are returned in the reply
message and are used as the result of the call.
■ Addresses in one process are not valid in another remote one.
○ Interface Definition Language (IDL): designed to allow procedures implemented in different
languages to invoke one another. An IDL provides a notation for defining interfaces in which
each of the parameters of an operation may be described as for input or output in addition to
having its type specified.
● the call semantics associated with RPC:
○ Retry request message: Controls whether to retransmit the request message until either a
reply is received or the server is assumed to have failed.
○ Duplicate filtering: Controls when retransmissions are used and whether to filter out duplicate
requests at the server.
○ Retransmission of results: Controls whether to keep a history of result messages to enable
lost results to be retransmitted without reexecuting the operations at the server.
Combinations of these choices lead to a variety of possible semantics for the reliability of remote invocations
as seen by the invoker. Note that for local procedure calls, the semantics are exactly once, meaning that every
procedure is executed exactly once (except in the case of process failure). The choices of RPC invocation
semantics are defined as follows.
○ Maybe semantics: the remote procedure call may be executed once or not at all. Maybe
semantics arises when no faulttolerance measures are applied and can suffer from the
following types of failure:
■ omission failures if the request or result message is lost;
■ crash failures when the server containing the remote operation fails.
Useful only for applications in which occasional failed calls are acceptable.
○ Atleastonce semantics: the invoker receives either a result, in which case the invoker knows
that the procedure was executed at least once, or an exception informing it that no result was
received. It can be achieved by the retransmission of request messages, which masks the
omission failures of the request or result message. Atleastonce semantics can suffer from the
following types of failure:
■ crash failures:when the server containing the remote procedure fails;
■ arbitrary failures: in cases when the request message is retransmitted, the remote server
may receive it and execute the procedure more than once, possibly causing wrong
values to be stored or returned.

If the operations in a server can be designed so that all of the procedures in their service
interfaces are idempotent operations, then atleastonce call semantics may be acceptable.
○ Atmostonce semantics: the caller receives either a result, in which case the caller knows that
the procedure was executed exactly once, or an exception informing it that no result was
received, in which case the procedure will have been executed either once or not at all. It can
be achieved by using all of the faulttolerance measures outlined in Figure 5.9.

● Transparency: RPC strives to offer at least location and access transparency, hiding the physical
location of the (potentially remote) procedure and also accessing local and remote procedures in the
same way. Middleware can also offer additional levels of transparency to RPC. However, remote
procedure calls suffer the followings:
○ vulnerability to failure due to the network and/or of the remote server process and no ability to
distinguish among them.
○ latency of a remote procedure call is several orders of magnitude greater than that of a local
one.
The current consensus is that remote calls should be made transparent in the sense that the syntax of
a remote call is the same as that of a local invocation, but that the difference between local and remote
calls should be expressed in their interfaces.

RPC Implementation: The software components required to implement RPC are shown in Figure 5.10. The
client that accesses a service includes one stub procedure for each procedure in the service interface. The
stub procedure behaves like a local procedure to the client, but instead of executing the call, it marshals the
procedure identifier and the arguments into a request message, which it sends via its communication module
to the server. When the reply message arrives, it unmarshals the results. The server process contains a
dispatcher together with one server stub procedure and one service procedure for each procedure in the
service interface. The dispatcher selects one of the server stub procedures according to the procedure
identifier in the request message. The server stub procedure then unmarshals the arguments in the request
message, calls the corresponding service procedure and marshals the return values for the reply message.
The service procedures implement the procedures in the service interface. The client and server stub

procedures and the dispatcher can be generated automatically by an interface compiler from the interface
definition of the service. RPC is generally implemented over a requestreply protocol like the ones discussed
so far. The contents of request and reply messages are the same as those illustrated for requestreply
protocols in Figure 5.4. RPC may be implemented to have one of the choices of invocation semantics
discussed: atleast once or atmostonce is generally chosen. To achieve this, the communication module will
implement the desired design choices in terms of retransmission of requests, dealing with duplicates and
retransmission of results, as shown in Figure 5.9.

Remote Method Invocation (RMI)
RMI allows objects in different processes to communicate with one another; it is an extension of local method
invocation that allows an object living in one process to invoke the methods of an object living in another
process.
The commonalities between RMI and RPC are as follows:
● They both support programming with interfaces.
● both typically constructed on top of requestreply protocols and can offer a range of call semantics such
as atleastonce and atmostonce.
● both offer a similar level of transparency – that is, local and remote calls employ the same syntax but
remote interfaces typically expose the distributed nature of the underlying call, for example by
supporting remote exceptions.

The following differences lead to added expressiveness:
● The programmer is able to use the full expressive power of objectoriented programming in the
development of distributed systems software.
● Building on the concept of object identity in objectoriented systems, all objects in an RMIbased
system have unique object references (whether they are local or remote), such object references can
also be passed as parameters, thus offering significantly richer parameterpassing semantics than in
RPC.

Notes about concurrent and distributed systems & x86 virtualization

Notes about concurrent and distributed systems & x86 virtualization

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (8)

Semelhante a Notes about concurrent and distributed systems & x86 virtualization

Semelhante a Notes about concurrent and distributed systems & x86 virtualization (20)

Último

Último (20)

Notes about concurrent and distributed systems & x86 virtualization