SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Transactions with
Replicated Data
Distributed Software
Systems
One copy serializability
z Replicated transactional service
y Each replica manager provides concurrency control
and recovery of its own data items in the same way
as it would for non-replicated data
z Effects of transactions performed by various
clients on replicated data items are the same as
if they had been performed one at a time on a
single data item
z Additional complications: failures, network
partitions
y Failures should be serialized wrt transactions, i.e. any
failure observed by a transaction must appear to
have happened before a transaction started
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F200
Figure 14.18 Replicated transactional service.
B
A
Client + front end
BB BA A
GetBalance(A)
Client + front end
Replica managers
Replica managers
Deposit(B,3);
UT
Replication Schemes
z Read one –Write All
yCannot handle network partitions
z Schemes that can handle network
partitions
yAvailable copies with validation
yQuorum consensus
yVirtual Partition
Read one/Write All
z One copy serializability
y Each write operation sets a write lock at each replica
manager
y Each read sets a read lock at one replica manager
z Two phase commit
y Two-level nested transaction
xCoordinator -> Workers
xIf either coordinator or worker is a replica manager, it has to
communicate with replica managers
z Primary copy replication
y ALL client requests are directed to a single primary
server
xDifferent from scheme discussed earlier
Available copies replication
z Can handle some replica managers are
unavailable because they have failed or
communication failure
z Reads can be performed by any available replica
manager but writes must be performed by all
available replica managers
z Normal case is like read one/write all
y As long as the set of available replica managers does
not change during a transaction
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F201
Figure 14.19 Available copies.
A
X
Client + front end
P
B
Client + front end
Replica managers
Deposit(A,3);
UT
Deposit(B,3);
GetBalance(B)
GetBalance(A)
Replica managers
Y
M
B
N
A
B
Available copies replication
z Failure case
y One copy serializabilty requires that failures and
recovery be serialized wrt transactions
y This is not achieved when different transactions
make conflicting failure observations
y Example shows local concurrency control not enough
y Additional concurrency control procedure (called local
validation) has to be performed to ensure correctness
z Available copies with local validation assumes no
network partition - i.e. functioning replica
managers can communicate with one another
Local validation - example
z Assume X fails just after T has performed
GetBalance and N fails just after U has
performed GetBalance
z Assume X and N fail before T & U have
performed their Deposit operations
y T’s Deposit will be performed at M & P while U’s
Deposit will be performed at Y
y Concurrency control on A at X does not prevent U
from updating A at Y; similarly concurrency control
on B at N does not prevent Y from updating B at M &
P
y Local concurrency control not enough!
Local validation cont’d
z T has read from an item at X, so X’s
failure must be after T.
z T observes the failure of N, so N’s failure
must be before T
yN fails -> T reads A at X; T writes B at M & P
-> T commits -> X fails
ySimilarly, we can argue:
X fails -> U reads B at N; U writes A at Y ->
U commits -> N fails
Local validation cont’d
z Local validation ensures such incompatible
sequences cannot both occur
z Before a transaction commits it checks for
failures (and recoveries) of replica managers of
data items it has accessed
z In example, if T validates before U, T would
check that N is still unavailable and X,M, P are
available. If so, it can commit
z U’s validation would fail because N has already
failed.
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F202
Figure 14.20 Network partition.
Client + front end
B
Withdraw(B, 4)
Client + front end
Replica managers
Deposit(B,3);
UT
Network
partition
B
B B
Handling Network Partitions
z Network partitions separate replica managers
into two or more subgroups, in such a way that
the members of a subgroup can communicate
with one another but members of different
subgroups cannot communicate
z Optimistic approaches
y Available copies with validation
z Pessimistic approaches
y Quorum consensus
Available Copies With
Validation
z Available copies algorithm applied within each
partition
y Maintains availability for Read operations
z When partition is repaired, possibly conflicting
transactions in separate partitions are validated
y The effects of a committed transaction that is now
aborted on validation will have to be undone
xOnly feasible for applications where such compensating
actions can be taken
Available copies with
validation cont’d
z Validation
y Version vectors (Write-Write conflicts)
y Precedence graphs (each partition maintains a log of
data items affected by the Read and Write operations
of transactions
y Log used to construct precedence graph whose
nodes are transactions and whose edges represent
conflicts between Read and Write operations
xNo cycles in graph corresponding to each partition
y If there are cycles in graph, validation fails
Quorum consensus
z A quorum is a subgroup of replica managers
whose size gives it the right to carry out
operations
z Majority voting one instance of a quorum
consensus scheme
y R + W > total number of votes in group
y W > half the total votes
y Ensures that each read quorum intersects a write
quorum, and two write quora will intersect
z Each replica has a version number that is used
to detect if the replica is up to date.
Virtual Partitions scheme
z Combines available copies and quorum
consensus
z Virtual partition = set of replica managers that
have a read and write quorum
z If a virtual partition can be formed, available
copies is used
y Improves performance of Reads
z If a failure occurs, and virtual partition changes
during a transaction, it is aborted
z Have to ensure virtual partitions do not overlap
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F203
Replica managers
Network partition
VX Y Z
TTransaction
Figure 14.21 Two network partitions.
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F204
Figure 14.22 Virtual partition.
X V Y Z
Replica managers
Virtual partition Network partition
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F205
Figure 14.23 Two overlapping virtual partitions.
Virtual partition V1 Virtual partition V2
Y X V Z
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F206
Figure 14.24 Creating a virtual partition.
Phase 1:
• The initiator sends a Join request to each potential member. The argument of Join
is a proposed logical timestamp for the new virtual partition;
• When a replica manager receives a Join request it compares the proposed logical
timestamp with that of its current virtual partition;
– If the proposed logical timestamp is greater it agrees to join and replies Yes;
– If it is less, it refuses to join and replies No;
Phase 2:
• If the initiator has received sufficient Yes replies to have Read and Write quora, it
may complete the creation of the new virtual partition by sending a Confirmation
message to the sites that agreed to join. The creation timestamp and list of actual
members are sent as arguments;
• Replica managers receiving the Confirmation message join the new virtual
partition and record its creation timestamp and list of actual members.

Mais conteúdo relacionado

Semelhante a Replicated data

Practical Byzantine Fault Tolernace
Practical Byzantine Fault TolernacePractical Byzantine Fault Tolernace
Practical Byzantine Fault TolernaceYongraeJo
 
19. Distributed Databases in DBMS
19. Distributed Databases in DBMS19. Distributed Databases in DBMS
19. Distributed Databases in DBMSkoolkampus
 
Distributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsDistributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsSachin Chauhan
 
M03 2 Behavioral Diagrams
M03 2 Behavioral DiagramsM03 2 Behavioral Diagrams
M03 2 Behavioral DiagramsDang Tuan
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replicationAbDul ThaYyal
 
3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-queryM Rezaur Rahman
 
Distributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlDistributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlbalamurugan.k Kalibalamurugan
 
DIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterDIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterUlf Wendel
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency ControlDilum Bandara
 
20101116讨论会ppt
20101116讨论会ppt20101116讨论会ppt
20101116讨论会pptanyshpm chen
 
Distributed systems principles and paradigms.pdf
Distributed systems principles and paradigms.pdfDistributed systems principles and paradigms.pdf
Distributed systems principles and paradigms.pdfBizuayehuDesalegn
 
Two phase commit protocol in dbms
Two phase commit protocol in dbmsTwo phase commit protocol in dbms
Two phase commit protocol in dbmsDilouar Hossain
 
Introduction to transaction processing
Introduction to transaction processingIntroduction to transaction processing
Introduction to transaction processingJafar Nesargi
 
Chapter 9 introduction to transaction processing
Chapter 9 introduction to transaction processingChapter 9 introduction to transaction processing
Chapter 9 introduction to transaction processingJafar Nesargi
 

Semelhante a Replicated data (20)

Practical Byzantine Fault Tolernace
Practical Byzantine Fault TolernacePractical Byzantine Fault Tolernace
Practical Byzantine Fault Tolernace
 
Chapter 13
Chapter 13Chapter 13
Chapter 13
 
19. Distributed Databases in DBMS
19. Distributed Databases in DBMS19. Distributed Databases in DBMS
19. Distributed Databases in DBMS
 
Distributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsDistributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit Protocols
 
M03 2 Behavioral Diagrams
M03 2 Behavioral DiagramsM03 2 Behavioral Diagrams
M03 2 Behavioral Diagrams
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
 
3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query
 
Distributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlDistributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency control
 
DBMS UNIT V.pptx
DBMS UNIT V.pptxDBMS UNIT V.pptx
DBMS UNIT V.pptx
 
09 workflow
09 workflow09 workflow
09 workflow
 
DIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterDIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL Cluster
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency Control
 
20101116讨论会
20101116讨论会20101116讨论会
20101116讨论会
 
20101116讨论会ppt
20101116讨论会ppt20101116讨论会ppt
20101116讨论会ppt
 
Distributed systems principles and paradigms.pdf
Distributed systems principles and paradigms.pdfDistributed systems principles and paradigms.pdf
Distributed systems principles and paradigms.pdf
 
Two phase commit protocol in dbms
Two phase commit protocol in dbmsTwo phase commit protocol in dbms
Two phase commit protocol in dbms
 
slides.07.pptx
slides.07.pptxslides.07.pptx
slides.07.pptx
 
F33 book-depend-pres-pt6
F33 book-depend-pres-pt6F33 book-depend-pres-pt6
F33 book-depend-pres-pt6
 
Introduction to transaction processing
Introduction to transaction processingIntroduction to transaction processing
Introduction to transaction processing
 
Chapter 9 introduction to transaction processing
Chapter 9 introduction to transaction processingChapter 9 introduction to transaction processing
Chapter 9 introduction to transaction processing
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Replicated data

  • 2. One copy serializability z Replicated transactional service y Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data z Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item z Additional complications: failures, network partitions y Failures should be serialized wrt transactions, i.e. any failure observed by a transaction must appear to have happened before a transaction started
  • 3. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F200 Figure 14.18 Replicated transactional service. B A Client + front end BB BA A GetBalance(A) Client + front end Replica managers Replica managers Deposit(B,3); UT
  • 4. Replication Schemes z Read one –Write All yCannot handle network partitions z Schemes that can handle network partitions yAvailable copies with validation yQuorum consensus yVirtual Partition
  • 5. Read one/Write All z One copy serializability y Each write operation sets a write lock at each replica manager y Each read sets a read lock at one replica manager z Two phase commit y Two-level nested transaction xCoordinator -> Workers xIf either coordinator or worker is a replica manager, it has to communicate with replica managers z Primary copy replication y ALL client requests are directed to a single primary server xDifferent from scheme discussed earlier
  • 6. Available copies replication z Can handle some replica managers are unavailable because they have failed or communication failure z Reads can be performed by any available replica manager but writes must be performed by all available replica managers z Normal case is like read one/write all y As long as the set of available replica managers does not change during a transaction
  • 7. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F201 Figure 14.19 Available copies. A X Client + front end P B Client + front end Replica managers Deposit(A,3); UT Deposit(B,3); GetBalance(B) GetBalance(A) Replica managers Y M B N A B
  • 8. Available copies replication z Failure case y One copy serializabilty requires that failures and recovery be serialized wrt transactions y This is not achieved when different transactions make conflicting failure observations y Example shows local concurrency control not enough y Additional concurrency control procedure (called local validation) has to be performed to ensure correctness z Available copies with local validation assumes no network partition - i.e. functioning replica managers can communicate with one another
  • 9. Local validation - example z Assume X fails just after T has performed GetBalance and N fails just after U has performed GetBalance z Assume X and N fail before T & U have performed their Deposit operations y T’s Deposit will be performed at M & P while U’s Deposit will be performed at Y y Concurrency control on A at X does not prevent U from updating A at Y; similarly concurrency control on B at N does not prevent Y from updating B at M & P y Local concurrency control not enough!
  • 10. Local validation cont’d z T has read from an item at X, so X’s failure must be after T. z T observes the failure of N, so N’s failure must be before T yN fails -> T reads A at X; T writes B at M & P -> T commits -> X fails ySimilarly, we can argue: X fails -> U reads B at N; U writes A at Y -> U commits -> N fails
  • 11. Local validation cont’d z Local validation ensures such incompatible sequences cannot both occur z Before a transaction commits it checks for failures (and recoveries) of replica managers of data items it has accessed z In example, if T validates before U, T would check that N is still unavailable and X,M, P are available. If so, it can commit z U’s validation would fail because N has already failed.
  • 12. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F202 Figure 14.20 Network partition. Client + front end B Withdraw(B, 4) Client + front end Replica managers Deposit(B,3); UT Network partition B B B
  • 13. Handling Network Partitions z Network partitions separate replica managers into two or more subgroups, in such a way that the members of a subgroup can communicate with one another but members of different subgroups cannot communicate z Optimistic approaches y Available copies with validation z Pessimistic approaches y Quorum consensus
  • 14. Available Copies With Validation z Available copies algorithm applied within each partition y Maintains availability for Read operations z When partition is repaired, possibly conflicting transactions in separate partitions are validated y The effects of a committed transaction that is now aborted on validation will have to be undone xOnly feasible for applications where such compensating actions can be taken
  • 15. Available copies with validation cont’d z Validation y Version vectors (Write-Write conflicts) y Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions y Log used to construct precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations xNo cycles in graph corresponding to each partition y If there are cycles in graph, validation fails
  • 16. Quorum consensus z A quorum is a subgroup of replica managers whose size gives it the right to carry out operations z Majority voting one instance of a quorum consensus scheme y R + W > total number of votes in group y W > half the total votes y Ensures that each read quorum intersects a write quorum, and two write quora will intersect z Each replica has a version number that is used to detect if the replica is up to date.
  • 17. Virtual Partitions scheme z Combines available copies and quorum consensus z Virtual partition = set of replica managers that have a read and write quorum z If a virtual partition can be formed, available copies is used y Improves performance of Reads z If a failure occurs, and virtual partition changes during a transaction, it is aborted z Have to ensure virtual partitions do not overlap
  • 18. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F203 Replica managers Network partition VX Y Z TTransaction Figure 14.21 Two network partitions.
  • 19. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F204 Figure 14.22 Virtual partition. X V Y Z Replica managers Virtual partition Network partition
  • 20. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F205 Figure 14.23 Two overlapping virtual partitions. Virtual partition V1 Virtual partition V2 Y X V Z
  • 21. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F206 Figure 14.24 Creating a virtual partition. Phase 1: • The initiator sends a Join request to each potential member. The argument of Join is a proposed logical timestamp for the new virtual partition; • When a replica manager receives a Join request it compares the proposed logical timestamp with that of its current virtual partition; – If the proposed logical timestamp is greater it agrees to join and replies Yes; – If it is less, it refuses to join and replies No; Phase 2: • If the initiator has received sufficient Yes replies to have Read and Write quora, it may complete the creation of the new virtual partition by sending a Confirmation message to the sites that agreed to join. The creation timestamp and list of actual members are sent as arguments; • Replica managers receiving the Confirmation message join the new virtual partition and record its creation timestamp and list of actual members.