Advanced Transaction Management
       Aug. 2               Aug. 3              Aug. 4                   Aug. 5               Aug. 6
 9:00  Intro & terminology  TP mons & ORBs      Logging & res. Mgr.      Files & Buffer Mgr.  Structured files
11:00  Reliability          Locking theory      Res. Mgr. & Trans. Mgr.  COM+                 Access paths
13:30  Fault tolerance      Locking techniques  CICS & TP & Internet     CORBA/EJB + TP       Groupware
15:30  Transaction models   Queueing            Advanced Trans. Mgr.     Replication          Performance & TPC
18:00  Reception            Workflow            Cyberbricks              Party                FREE
Chapter 13

Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication

Mixing Transaction Managers
Four standards:
LU 6.2 ~ APPC ~ CPIC ~ CICS: the de facto TP standard.
X/Open + OSI/TP: the de jure TP standard.
OTS: the CORBA standard.
TIP: the de facto interoperability standard.
Almost everyone interoperates with LU 6.2.
LU 6.2 has evolved to include presumed abort, no reuse of
aborted trids, and other fixes.
LU 6.2 is an "open" two-phase commit: the interface is documented,
and so is reconnection / resolve.
Internally, everyone uses private protocols with many
tricks.

Mixing "OLD" Transaction Managers
Many old TP monitors are not open:
They do not expose 2PC (prepare() and commit())
=> they insist on being the root commit coordinator.
All will become X/Open-compliant eventually and thus
be open TP monitors.
If stuck with a "closed" TM:
Can still get atomicity if:
1. Only one closed TM is involved.
2. The TM is direct, not queued.

Mixing with a Closed Transaction Manager
Once all "open" TMs and RMs are prepared, the closed TM does the "RUMP":
deferred_update(int id, complex_type list_of_updates) /* rump logic */
{ Begin_Work(); /* start a new transaction */
  select count(*) into :n from done where id = :id; /* test if work was done */
  if (n == 0) { /* if not done */
    do list_of_updates; /* then do the list of updates */
    insert into done values (:id); /* and flag the transaction done */
  }
  Commit_Work(); /* commit updates and flag */
  acknowledge; /* reply success to caller in both cases */
}
Status_Transaction(TRID trid) /* has the rump for trid committed? */
{ select count(*) into :ans from done where id = :trid; return ans; }
[Figure: Transaction Gateway to Closed Transaction Mgr.
Gateway side (Do Transaction): while not acknowledged, send trid + data, then wait.
Closed TM side: if not a duplicate, do the transaction and insert the trid in the Done Table; commit; acknowledge.]
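
The done-table logic above can be sketched in runnable form. This is a minimal illustration, not the slide author's code: it assumes an in-memory SQLite database stands in for the closed TM, and the `account` table and the function names are hypothetical. The point it shows is that a resent request with the same trid is acknowledged without being applied twice.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the closed TM's database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE done (trid INTEGER PRIMARY KEY)")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO account VALUES (1, 100)")
    db.commit()
    return db


def deferred_update(db, trid, updates):
    """Apply `updates` at most once for `trid`; acknowledge in both cases."""
    with db:  # one local transaction: commit on success, roll back on error
        (n,) = db.execute(
            "SELECT count(*) FROM done WHERE trid = ?", (trid,)
        ).fetchone()
        if n == 0:  # work not done yet
            for sql, params in updates:
                db.execute(sql, params)
            db.execute("INSERT INTO done VALUES (?)", (trid,))  # flag as done
    return True  # acknowledge


def status_transaction(db, trid):
    """Report whether the rump for `trid` has committed."""
    (n,) = db.execute(
        "SELECT count(*) FROM done WHERE trid = ?", (trid,)
    ).fetchone()
    return n == 1


if __name__ == "__main__":
    db = make_db()
    debit = [("UPDATE account SET balance = balance - ? WHERE id = ?", (30, 1))]
    deferred_update(db, 7, debit)
    deferred_update(db, 7, debit)  # gateway resend: acknowledged, not reapplied
    print(db.execute("SELECT balance FROM account").fetchone()[0])
```

Because the updates and the done-table insert commit in the same local transaction, the gateway can keep resending until it sees an acknowledgement.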

Mixing Open Transaction Managers
Gateway translates between external and internal TRID.
Gateway translates between external and internal protocols
Participates in transaction resolution (is a TM in both worlds)
[Figure: the Transaction Gateway sits between "Our" Transaction Manager (Local Protocol) and the "Foreign" Transaction Managers (OSI Protocol Stack). Its Trid Map Table pairs each foreign trid ("his trid") with a local one ("our trid").]
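
A trid map table can be as small as two dictionaries. The sketch below is an assumption-laden illustration for a single gateway: the class name, method names, and the use of small integers as local trids are all hypothetical.

```python
import itertools


class TridMap:
    """Two-way map between foreign (TM name, trid) pairs and local trids."""

    def __init__(self):
        self._next_local = itertools.count(1)  # hypothetical local trid source
        self._to_local = {}
        self._to_foreign = {}

    def import_trid(self, tm_name, foreign_trid):
        """Return the local trid for a foreign one, allocating it on first use."""
        key = (tm_name, foreign_trid)
        if key not in self._to_local:
            local = next(self._next_local)
            self._to_local[key] = local
            self._to_foreign[local] = key
        return self._to_local[key]

    def export_trid(self, local_trid):
        """Return the (TM name, foreign trid) pair behind a local trid."""
        return self._to_foreign[local_trid]


if __name__ == "__main__":
    trids = TridMap()
    local = trids.import_trid("cics", "X1")
    print(local, trids.export_trid(local))
```

Keying on the pair (TM name, trid) keeps identifiers from different foreign TMs from colliding inside this one gateway.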

Mixing Open Transaction Managers
Multiple entry problem:
A TRID enters the system twice, by two different paths.
It "works" but looks like two separate transactions;
the commit dependency is external to the system.
Fancy option problem:
The external or internal TM has an option the other does not.
The gateway fakes (or turns off) optimizations/options not supported
by one side or the other.

Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication

Non-Blocking Commit
The problem: what if the coordinator fails?
Solutions: 1. wait
2. appoint a new coordinator
Appointment can be thought of as a process pair (n-plex)
Works great in a cluster (no communications failures).
[Figure: non-blocking commit with a process pair. Lanes: Primary, Backup, Participants; Primary and Backup form the Process Pair.
Messages shown: Prepare (+ list of participants and sessions), ack, Prepare, Prepared, Commit, ack, Commit, Committed, Complete, ack.
Log steps shown: Write Commit Log Record, Write "Complete" Log Record.]

Non-Blocking Commit in a WAN:
3ϕ or Heuristic or Operator Command
Wide area net can partition
Process pairs cannot reliably decide to take over.
Solution(s):
1. Three phase protocol
Broadcast the participant list and the decision as part of phase
1.5; let a majority of participants decide if the coordinator
fails.
2. Heuristic decisions
Default to commit/abort.
Announce Heuristic Mismatch at reconnect if the guess
was wrong.
3. Human decision
Announce Operator Mismatch at reconnect if the guess
was wrong.

Transfer of Commit
What if a participant
is more secure than the coordinator?
is more reliable than the coordinator?
is faster than the coordinator?
Transfer commit authority to that participant?
[Figure: two diagrams, each with the nodes Gas Pump, LA Bank, Visa, and SF Bank.]

Transfer of Commit
Is also an optimization:
saves messages if done as part of commit.
called nested commit protocol
or last resource manager optimization
2 messages vs 5 messages (plus one lazy msg)
[Figure: two message-flow diagrams, "No Transfer of Commit" and "Transfer of Commit". Labels: Begin, Dequeue, work request, doit, Enqueue, Commit_Work(), Prepare, Phase 2 Commit, Commit, complete. In one diagram the work request carries "+ You are Root!".]

Transfer of Commit: More Complex Case
More complex if the root has more than one branch:
Need to set up new sessions among "trusted" nodes
The root sends the new root's name to all participants at phase 1.
[Figure: three nodes labeled Libya, US, and Deutschland.]

Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication

Optimizing Commit
Can optimize:
Delay: milliseconds/commit
Message cost: number, size, urgency of messages
IO cost: number, size, or urgency of IO
CPU cost: cycles used
Throughput: maximum commit rate.

Commit: the General Case
Prepare(): 1 RPC or message pair per RM
and one per non-root TM
1 forced IO per RM (prepare record)
1 forced IO per TM (commit record)
Commit(): The same.
Summary of 2PC cost:
IO: 2(RM+TM)
RPCs: 2(RM+(TM-1))
Messages: 4(RM+(TM-1)) (equivalent to RPCs)
Delay: 2 IO ~ 50ms ~ 10K instructions
4 msg ~ 20ms ~ 50K instructions
Total: 50ms*(RM+TM) + 20ms*(RM+TM-1)
These are the error-free counts (i.e. the minimum values).
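
As a worked check of the cost summary, the small function below plugs RM and TM counts into the slide's formulas. The function name and the dictionary layout are illustrative choices; the 50 ms and 20 ms constants are the slide's own rough figures.

```python
def two_phase_commit_cost(rm, tm, io_ms=50, msg_ms=20):
    """Error-free 2PC cost for `rm` resource managers and `tm` transaction managers.

    io_ms:  delay of the two forced IOs per RM or TM (slide: 2 IO ~ 50 ms)
    msg_ms: delay of the four messages per remote party (slide: 4 msg ~ 20 ms)
    """
    remote = rm + (tm - 1)  # every RM plus every non-root TM
    return {
        "ios": 2 * (rm + tm),
        "rpcs": 2 * remote,
        "messages": 4 * remote,
        "delay_ms": io_ms * (rm + tm) + msg_ms * remote,
    }


if __name__ == "__main__":
    # One TM coordinating two RMs.
    print(two_phase_commit_cost(rm=2, tm=1))
```

For one TM and two RMs this gives 6 forced IOs, 4 RPCs, 8 messages, and a serial delay of 190 ms, which is the baseline the later optimizations work against.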

Commit: Simple Optimizations
Presumed abort saves a TM IO (implicit in protocol above)
Do phase 1 and phase 2 in parallel (saves delay)
Common log (saves RM log forces)
IO: 2(TM)
Messages: 4(RM+TM-1) (equivalent to RPCs)
Delay: 2*IO*TM + 4*M*(RM+TM-1)
~ 50ms*TM + 40ms*(RM+TM-1)
Use local RPC (10x faster)
~ 50ms*TM + 1ms*RM + 40ms*(TM-1)
Use WADS for low IO latency (3ms vs 25ms)
~ 6ms*TM + 1ms*RM + 40ms*(TM-1)
Simple case of 1 TM, 2 RMs:
~ 8ms delay for a commit.
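
The three delay estimates on this slide can be checked by direct substitution. The sketch below assumes the bare RM term in the slide's formulas means about 1 ms per local RM, which is the reading that reproduces the slide's 8 ms result; the function name is hypothetical.

```python
def commit_delay_stages_ms(rm, tm):
    """Delay estimates (ms) after each optimization, using the slide's constants."""
    common_log = 50 * tm + 40 * (rm + tm - 1)  # shared log, remote-style messages
    local_rpc = 50 * tm + 1 * rm + 40 * (tm - 1)  # RMs reached by local RPC
    wads = 6 * tm + 1 * rm + 40 * (tm - 1)  # 3 ms log writes instead of 25 ms
    return [common_log, local_rpc, wads]


if __name__ == "__main__":
    # Simple case: 1 TM and 2 RMs.
    print(commit_delay_stages_ms(rm=2, tm=1))
```

For 1 TM and 2 RMs the stages come out to 130 ms, 52 ms, and 8 ms, so the final value matches the slide's "~ 8ms delay for a commit".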

Group Commit Optimization
Amortizes IO and messages across several transactions
Adds delay
If N transactions in a group:
IO, Message cost per transaction is ~ 1/N
Small extra delay if one slow step in original path.
As the system heats up (commit rate rises to 25 tps),
start to install group commit with a 30ms threshold
(at 100 tps: 3.3 trans/group).
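
The amortization can be seen in a toy log that forces one IO per group of commit records. This is a simplified sketch: it closes a group by size rather than by the 30 ms timer the slide describes (an explicit `flush()` plays the role of the timer firing), and the class and attribute names are hypothetical.

```python
class GroupCommitLog:
    """Toy log that forces one IO per group of commit records."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []  # commit records waiting for the next force
        self.durable = []  # commit records already forced to the log
        self.forced_ios = 0

    def commit(self, trid):
        """Queue a commit record; force the log once the group is full."""
        self.pending.append(trid)
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        """Force all pending records with a single IO (timer stand-in)."""
        if self.pending:
            self.forced_ios += 1
            self.durable.extend(self.pending)
            self.pending.clear()


if __name__ == "__main__":
    log = GroupCommitLog(group_size=4)
    for trid in range(10):
        log.commit(trid)
    log.flush()  # the timer expires for the last, partial group
    print(log.forced_ios, "forced IOs for", len(log.durable), "commits")
```

Ten commits cost three forced IOs here instead of ten; the price is that the early arrivals in each group wait for the group to close.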

Simple Commit Optimizations
Read-only: the RM just gets the phase 1 call and releases its locks.
Note: may violate ACID; should release read locks
at phase 2 if any locks are acquired during phase 1.
Saves messages (phase 2) and IO (no RM IO).
A true read-only transaction must prepare at phase 1 and
unlock at phase 2.
Unjoin: RM does no work at commit/abort.
Lazy: user-requested group commit. Piggybacks on others:
no extra IO or messages.

Transaction Commit Trees
Tree shape:    one node         deep             bush               general case
Optimization:  share log, LRPC  transfer commit  parallel transfer  parallel transfer
[Figure: a commit tree of TM/RM nodes for each of the four shapes.]

Transfer of COMMIT: Linear COMMIT
Parent and other sub-trees prepare
then transfer commit authority to remaining child.
Last in chain becomes commit coordinator.
More delay, fewer messages
For N=2, Same delay, 3 vs 4 messages.
Always use it.
[Figure: commit trees of TM/RM nodes illustrating linear commit.]

Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication

Disaster Recovery at a Remote Site
Replicate data, applications, and network connections
at 2 (or more) sites.
Symmetric design:
Either site can process transactions.
Asymmetric design:
One site is master of each data item.
Allows: caching,
batching of updates at backup.
So far, asymmetric design is most popular.
To get symmetry, have each node master 1/2 of the db/net.

Sample Physical LOG RECORD
Basic idea of asymmetric design:
send log from primary to backup
backup applies log to its copy
backup is in constant media recovery
backup processes/sessions/data ready to take over
[Figure: four system-pair configurations, each with clients, primaries, and backups.
System Pair (basic idea): a client session to the Primary; the log flows from Primary to Backup.
Symmetric: two system pairs.
Hub: a central site backs up several primaries.
Vault: the backup stores the log and archive dumps.]

Sample Physical LOG RECORD
Need some way to decide that a failure has occurred.
Easy in a cluster
Hard in a WAN (partition possible)
Solutions: Extra wires
Wires on demand (dialup)
Human (operator)
Quorum device
Kind of log?
Logical log is best:
loose coupling (allows backup to be a different TM/RM)
failure independence (different from physiological log)

Takeover Logic
/* initialization */
Tell primary I'm here
Set up all RMs and application processes
Open all initial sessions to clients
/* the main backup loop */
While (not primary) { redo log }
/* takeover */
Redo rest of log
Resend most recent message on each session
Abort any incomplete transactions
/* become primary */
Tell application processes to start accepting requests
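
The redo and abort steps of the takeover can be sketched over a toy log. This is an illustration under simplifying assumptions, not the slide's algorithm in full: log records are hypothetical `(trid, kind, key, value)` tuples, and updates are buffered until their commit record arrives instead of being applied and later undone.

```python
def backup_takeover(log_records):
    """Replay a shipped log and report which transactions must be aborted.

    Returns (data, aborted): the committed state and the sorted trids of
    transactions with no commit record in the log.
    """
    data = {}
    pending = {}  # trid -> updates seen so far, not yet committed
    for trid, kind, key, value in log_records:  # redo the log in order
        if kind == "update":
            pending.setdefault(trid, []).append((key, value))
        elif kind == "commit":
            for k, v in pending.pop(trid, []):
                data[k] = v
    aborted = sorted(pending)  # incomplete at takeover: abort them
    return data, aborted


if __name__ == "__main__":
    log = [
        (1, "update", "a", 10),
        (2, "update", "b", 20),
        (1, "commit", None, None),
        (2, "update", "a", 99),  # transaction 2 never commits
    ]
    print(backup_takeover(log))
```

Only transaction 1 reaches the new primary's state; transaction 2 is in the aborted list, and its clients learn the outcome when sessions are resumed.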

Session Takeover
Just like process pairs
Session sequence numbers eliminate duplicates
So, get at-least-once delivery: resend msg at takeover
[Figure: two session-takeover configurations. Primary and Backup reach the Clients either through Network Switches or through Front Ends and a Switch. Protocols: OSI, SNA, TCP/IP, X.25, etc.]
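
Duplicate elimination by session sequence number fits in a few lines. The sketch below is a minimal illustration; the class and method names are hypothetical, and it assumes messages on a session are numbered 1, 2, 3, ... by the sender.

```python
class Session:
    """Receiver side of a session that drops resent messages."""

    def __init__(self):
        self.last_seen = 0  # highest sequence number accepted so far
        self.delivered = []

    def receive(self, seq, message):
        """Deliver a message once; return False for a duplicate."""
        if seq <= self.last_seen:
            return False  # already delivered before the takeover
        self.last_seen = seq
        self.delivered.append(message)
        return True


if __name__ == "__main__":
    session = Session()
    session.receive(1, "debit 30")
    session.receive(2, "balance 70")
    # The new primary resends its most recent message at takeover.
    print(session.receive(2, "balance 70"), session.delivered)
```

The resend at takeover guarantees the message arrives at least once, and the sequence check keeps the receiver from acting on it twice.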

Catch-up After Failure
Failed node at restart executes normal restart
Then enters backup logic.
If both fail, an outside observer must say which one is best;
the backup then has to match its log to the new primary.
Design issue: are nodes bit-for-bit identical?
If so, backup must “trim” log to match primary.

How Safe?
1-SAFE: no extra delay, risks lost transactions
2-SAFE: extra delay (if the backup is up),
single fault tolerant, high availability
VERY-SAFE: extra delay, no lost transactions
low availability
[Figure: commit / ok message flows among client, primary, and backup for 1-Safe, 2-Safe, and Very Safe, in two cases: Both Up, and Primary Up with Backup Down. In one panel the client gets "out of service".]
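
The three safety levels differ only in what the primary does about the backup before answering the client. The function below is a sketch of that decision; the mode strings, the function name, and the returned tuple are illustrative assumptions rather than any real system's interface.

```python
def commit_reply(mode, backup_up):
    """Return (reply to client, whether the primary waited for the backup)."""
    if mode == "1-safe":
        return ("ok", False)  # reply at once; the log reaches the backup later
    if mode == "2-safe":
        return ("ok", backup_up)  # wait for the backup only if it is up
    if mode == "very-safe":
        if backup_up:
            return ("ok", True)
        return ("out of service", False)  # refuse to commit alone
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("1-safe", "2-safe", "very-safe"):
        print(mode, commit_reply(mode, True), commit_reply(mode, False))
```

This makes the trade-off explicit: 1-safe never waits, so a failure can lose recent commits; very-safe never commits alone, so one failed site stops service; 2-safe waits when it can and keeps going when it cannot.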

System Pairs vs Replicated Data
System pairs replicate the application
DB
application processes
sessions
Data replicators only replicate data.
Other aspects left as an exercise for the
application designer.

System Pair Benefits
Tolerates faults
Hardware
Environment
Operations
Heisenbugs
Can replace software/hardware online
Can move backup to new building or...
Allows design diversity: backup can be completely different
Step 1: Both systems are running version V1.
Step 2: Backup is cold-loaded as version V2.
Step 3: SWITCH to Backup.
Step 4: Backup is cold-loaded as version V2.
[Figure: versions at each step. Step 1: Primary V1, Backup V1. Step 2: Primary V1, Backup V2. Step 3: Backup V1, Primary V2. Step 4: Backup V2, Primary V2.]

Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication
