Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS
1. Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS
Nenad Stojnić
Databases & Information Systems Group
2. Outline
Self-organizing properties in OSIRIS and current limitations
The Shepherd approach to fault-tolerance
Novel migration algorithm
Shepherd ring: herds, shepherd pools, routing
Binding ring: service lookup, late binding, load balancing
Summary
3. OSIRIS: Open Service Infrastructure for Reliable and Integrated process Support
Decentralized P2P execution of processes
Web Service Invocation
Fault-tolerant, Self-* properties
Late-binding & Load-balancing
Safe continuation-passing (2PC)
Pub/Sub Meta-data repositories
16. OSIRIS failure handling
Failure case           | Handling
Successor failure      | Late binding
Migration failure      | 2PC abort
Predecessor failure    | No handling necessary
Temporary node failure | Recovery from local stable storage
Current node failure   | Process execution stops/hangs; state is lost; no notification
17. Outline
Self-organizing properties in OSIRIS and current limitations
The Shepherd approach to fault-tolerance
Novel migration algorithm
Shepherd ring: herds, shepherd pools, routing
Binding ring: service lookup, late binding, load balancing
Summary
18. Our solution: Shepherd
[Architecture figure: an OSIRIS layer with worker nodes A to E executing process steps 1 to 7, a Shepherd layer that monitors the workers, and a Shared Memory layer accessed via read/write.]
26. Shepherd Migration Algorithm
[Sequence figure: workers A, B, C with their shepherds S1, S2, S3; step i of the process reads the whiteboard entry <K_i, W_i> from shared memory, writes <K_{i+1}, W_{i+1}>, and acknowledges (Wack) the new activation key and successor, e.g. (K1, B), (K2, C), to the shepherd pool.]
27. Shepherd failure cases
Failure of worker nodes
Failure of shepherds
Failures in the shared memory
28. Failure of worker nodes
Replacement node from the herd
Same service type
Fail-safe services
BUT undo side effects on Shared Memory
[Figure: failed worker A (✗) is replaced by A′, then A″, from the herd; shepherds S1, S2 coordinate via Wack over shared-memory entries <K0, W0>, <K1, W1>.]
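The replacement rule above can be sketched in a few lines. This is an illustrative sketch only: `replace_worker`, the node records, and the `sml` dict are hypothetical stand-ins, and only shared-memory side effects are rolled back (external effects of a non-fail-safe service are beyond the protocol's reach).

```python
def replace_worker(failed, herd, service_type, sml, undo_keys):
    """Pick a same-type replacement from the herd and undo the failed
    worker's writes on the shared memory (illustrative sketch)."""
    for k in undo_keys:            # undo side effects on shared memory
        sml.pop(k, None)
    for node in herd:              # same service type, from the same herd
        if node is not failed and node["type"] == service_type:
            return node
    return None                    # fall back to late binding outside the herd

herd = [{"name": "A", "type": "T"}, {"name": "A2", "type": "T"}]
sml = {"K1": "partial write by A"}
repl = replace_worker(herd[0], herd, "T", sml, ["K1"])
print(repl["name"], sml)  # → A2 {}
```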
29. Failure of shepherds
Shepherds organized in pools, state shared
WN speaks to the pool
Transactional writes → consistency guaranteed
New leader learns current state from the pool
[Figure: leader S1 fails (✗); S2 takes over the pool and receives the worker's Wack.]
30. Failures in shared memory
Chord-based
Replicated transactional storage
Successful writes are persistent
Failed reads/writes can always be retried
[Figure: worker A and shepherd S1 retry against replicated entries <K0, W0>, <K1, W1> after a replica failure (✗).]
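Because a failed transactional operation leaves no partial state behind, a blind retry loop is safe. A minimal sketch (the `with_retry` helper and the flaky operation are illustrative, not part of OSIRIS):

```python
import time

def with_retry(op, attempts=5, delay=0.1):
    """Retry a shared-memory read/write until it succeeds.

    The storage is replicated and transactional, so a failed attempt
    has no side effects and can simply be repeated."""
    for i in range(attempts):
        try:
            return op()
        except IOError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:           # first two attempts hit a dead replica
        raise IOError("replica unreachable")
    return "ok"

print(with_retry(flaky_write, delay=0.0))  # → ok
```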
31. Shepherd ring
Used for:
Worker node to shepherd assignment
Routing of messages from WN to shepherds
Pool construction
Based on the Chord structured overlay
Identifier circle of shepherd node IDs and worker node IDs (consistent hashing)
Efficient routing: O(log N_Sh)
35. Shepherd pools
Symmetric replication strategy:
Node ID congruence-modulo equivalence classes
Responsible for x “knows” entire class of x
Pool = all responsibles for a class
Transactional guarantees
Paxos consensus
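The congruence-modulo classes behind symmetric replication can be sketched as follows. This is a minimal illustration, assuming an identifier space of size `id_space` and `replicas` pool members; the function name and parameters are hypothetical, not OSIRIS APIs.

```python
def symmetric_class(node_id, id_space, replicas):
    """Equivalence class of node_id under congruence modulo id_space // replicas.

    Members of the class are evenly spaced around the identifier circle;
    the nodes responsible for them together form one shepherd pool."""
    step = id_space // replicas
    base = node_id % step
    return [base + i * step for i in range(replicas)]

# On a 16-position ring with 4 replicas, IDs 3, 7, 11 and 15 share a class,
# so their responsible shepherds form one pool.
print(symmetric_class(7, 16, 4))  # → [3, 7, 11, 15]
```

Any member of a class computes the same pool, which is what lets a new leader locate its peers without central coordination.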
39. Late binding
Locate a shepherd providing service type T
Shepherd provides type T if it monitors instances of type T
Binding ring
Physical nodes & service types (resources)
Distributed “multimap” data structure
Service type → List of shepherds
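The service-type-to-shepherds multimap can be illustrated with a toy in-memory stand-in. The real binding ring distributes this structure over a DHT; the class and method names here are illustrative only.

```python
from collections import defaultdict

class BindingRing:
    """Toy stand-in for the distributed multimap: service type -> shepherds."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, service_type, shepherd):
        # A shepherd "provides" a type if it monitors instances of that type.
        self._index[service_type].add(shepherd)

    def lookup(self, service_type):
        # Late binding: any shepherd in the returned set can serve type T.
        return self._index.get(service_type, set())

ring = BindingRing()
ring.register("Imaging", "S1")
ring.register("Imaging", "S2")
print(sorted(ring.lookup("Imaging")))  # → ['S1', 'S2']
```

Returning the whole set (rather than a single shepherd) is what leaves room for load-balancing strategies at binding time.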
We present on-going work aimed at improving OSIRIS' fault-tolerance capabilities.
Processes can be imagined as programs that coordinate the invocation of distributed web services.
Late binding of service instances, in conjunction with load-balancing strategies.
OSIRIS already offers self-* properties.
Transactional guarantees: the system is completely resilient to temporary node failures.
Also, thanks to late binding, permanent failures of nodes participating in the execution of a process instance, but not involved in a computation at the moment of failure, do not affect the execution.
The node migrates the control for process execution to one or more successor nodes by delivering an activation token containing flow-control information and the whiteboard.
Replacement node found (late binding)
Hardware, network or service failures
If the node becomes temporarily disconnected from the network, the system is still able to recover.
The node will keep retrying to pass on the results until it succeeds.
This works very well in controlled environments.
WN assigned to Shepherds (herds)
Shepherds organized in pools
Leader
Shepherds in the pool share state
Persistence of process state
Triggering of process activity
Leader of a pool communicates to a WN an activation key Ki
Using Ki, the WN gets the process state from the SML
WN writes the next process activity with a new key Ki+1 to SML
WN sends the new activation key Ki+1 to the assigned pool of Shepherds
Leader of the pool forwards the activation key to another pool of shepherds
Another step deletes entries from the shared memory
A unique activation key provides independence of process activities
Temporarily failed WNs that have been replaced are terminated
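The key-passing steps above can be sketched from the worker's side. This is an illustrative sketch only: `sml`, `pool_send`, and `compute` are assumed stand-ins (the real SML is a replicated transactional DHT, and old-entry deletion happens in a separate later step).

```python
def migration_step(sml, pool_send, key, next_key, compute):
    """One worker-side step of the Shepherd migration algorithm (sketch).

    sml       -- stand-in for the shared memory layer (key -> whiteboard)
    pool_send -- callback delivering the new activation key to the pool
    compute   -- the service invocation producing the next whiteboard"""
    whiteboard = sml[key]             # 1. fetch process state under K_i
    next_board = compute(whiteboard)  # 2. execute the process activity
    sml[next_key] = next_board        # 3. write next state under K_{i+1}
    pool_send(next_key)               # 4. hand K_{i+1} to the shepherd pool
    del sml[key]                      # 5. cleanup (deferred in the real protocol)

sml = {"K0": {"step": 0}}
acks = []
migration_step(sml, acks.append, "K0", "K1",
               lambda wb: {**wb, "step": wb["step"] + 1})
print(sml, acks)  # → {'K1': {'step': 1}} ['K1']
```

Because each activity is reachable only through its own activation key, a crashed worker can be replaced by re-issuing K_i without disturbing other activities.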
The side-effects created by B that are not stored on the shared memory cannot be undone.
DHT-like structured overlay
Paxos commit protocol
The pool keeps consistent information about the state of the activity it is supervising (distributed transactions).
DHT fault-detection mechanism to elect an appropriate shepherd replacement replica
Beernet DHT implementation
with respect to the migration algorithm, only a passive role
Routing mechanism
Failure-detection mechanism
their state relative to the execution of the migration algorithm
We use it to assign nodes to the herd of a shepherd.
several shepherds coordinate to form a pool
how leader election within a pool proceeds
Communication between a worker node and a pool of shepherds
Shepherds are physical nodes and WNs the resources to be stored.
Worker node IDs on the circle lying in between two shepherd IDs become the herd of the adjacent shepherd.
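The successor rule can be sketched as follows (illustrative Python; `herd_of` is a hypothetical helper, not an OSIRIS API, and identifiers are assumed to come from consistent hashing onto the circle):

```python
import bisect

def herd_of(worker_id, shepherd_ids):
    """Worker IDs lying between two shepherd IDs join the herd of the
    clockwise-adjacent shepherd (the Chord successor rule)."""
    ids = sorted(shepherd_ids)
    i = bisect.bisect_left(ids, worker_id)
    return ids[i % len(ids)]     # wrap around the identifier circle

shepherds = [100, 4000, 30000]
print(herd_of(150, shepherds))    # → 4000
print(herd_of(60000, shepherds))  # → 100 (wraps past the top of the ring)
```

The wrap-around in the last line is what makes the identifier space a circle rather than a line, so every worker always has exactly one responsible shepherd.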
Take into consideration other factors to improve process execution.
As explained above, this is sufficient to guarantee the correctness of the routing and enable late binding.
Aggregate load