Logging Last Resource Optimization for Distributed Transactions in Oracle WebLogic Server
T. Barnes, A. Messinger, P. Parkinson, A. Ganesh, G. Shegalov, S. Narayan, S. Kareenhalli
2. OLTP: Online Transaction Processing
A transaction is an ACID contract:
● Atomic – all or nothing
● Consistent – correct from the application's perspective
● Isolated – concurrency is masked through locking or snapshots
● Durable – once committed, changes survive subsequent failures
Example: transfer 1000 from checking (c) to savings (s)
Before: Checking = 2000, Savings = 8000
begin; c -= 1000; s += 1000; commit
After: Checking = 1000, Savings = 9000
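The transfer above can be sketched with Python's built-in sqlite3 module: both updates run in one transaction, so a failure between them leaves neither applied. The `accounts` table, its `name` column, and the overdraft check are illustrative assumptions, not part of the original example.

```python
import sqlite3

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst` atomically (all or nothing)."""
    # `with conn:` commits on normal exit and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (left,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if left < 0:
            raise ValueError("insufficient funds")  # rollback also undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("checking", 2000), ("savings", 8000)])

transfer(conn, "checking", "savings", 1000)
print(balances(conn))  # {'checking': 1000, 'savings': 9000}

try:
    transfer(conn, "checking", "savings", 5000)  # would overdraw checking
except ValueError:
    pass
print(balances(conn))  # unchanged: {'checking': 1000, 'savings': 9000}
```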
3. OLTP: Single Resource
● Atomicity and Durability (A and D) are typically implemented using Write-Ahead Logging (WAL)
● Transaction recovery is “simple”: a REDO phase followed by an UNDO phase
BEGIN TRANSACTION
/* LSN = 1: log undo and redo images in the main-memory (MM) log buffer */
UPDATE Accounts SET balance = balance - 1000 WHERE Number = 1
/* LSN = 2: log undo and redo images in the main-memory (MM) log buffer */
UPDATE Accounts SET balance = balance + 1000 WHERE Number = 2
/* LSN = 3: log the commit record and force the log to disk (5-6 orders of magnitude slower) */
COMMIT TRANSACTION
Accounts (balance after each LSN)
Number   LSN=0   LSN=1   LSN=2
1        2000    1000    1000
2        8000    8000    9000
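A toy version of the REDO/UNDO restart for the log above, in Python. It is a simplified illustration, not ARIES: there are no page LSNs, checkpoints, or compensation log records, and the `Update`/`Commit` record classes are hypothetical. Replaying the log with and without the forced commit record (LSN 3) shows why forcing that one record is what makes the transaction durable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    lsn: int
    txid: str
    key: int      # account number
    before: int   # undo image
    after: int    # redo image

@dataclass(frozen=True)
class Commit:
    lsn: int
    txid: str

def recover(disk: dict, log: list) -> dict:
    """Restart from the on-disk state plus whatever log records survived the crash."""
    db = dict(disk)
    committed = {r.txid for r in log if isinstance(r, Commit)}
    updates = sorted((r for r in log if isinstance(r, Update)), key=lambda r: r.lsn)
    # REDO phase: repeat history in LSN order by installing after-images.
    for r in updates:
        db[r.key] = r.after
    # UNDO phase: newest first, restore before-images of transactions with no commit record.
    for r in reversed(updates):
        if r.txid not in committed:
            db[r.key] = r.before
    return db

disk = {1: 2000, 2: 8000}  # buffer pages were never flushed: state at LSN=0
log = [Update(1, "T1", 1, 2000, 1000), Update(2, "T1", 2, 8000, 9000), Commit(3, "T1")]
print(recover(disk, log))      # {1: 1000, 2: 9000}  commit record was forced
print(recover(disk, log[:2]))  # {1: 2000, 2: 8000}  crash before the force: T1 undone
```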
4. OLTP: Distributed / Two-Phase Commit
Like a wedding ceremony
Coordinator: Will you ...? (prepare)
Resource: I will (OK)
Coordinator: I pronounce you … (commit)
Transaction Coordinator      Resource 1           Resource 2
prepare -->                  force-log prepare    force-log prepare
                             <-- OK               <-- OK
commit -->                   force-log commit     force-log commit
                             <-- ACK              <-- ACK
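The two rounds above can be sketched as a coordinator loop in Python. `Resource` is a hypothetical in-memory participant that counts its forced log writes; real participants would be XAResources, and a real coordinator also force-logs its own commit decision between the phases, which this sketch only marks with a comment.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    vote_ok: bool = True
    state: str = "active"
    forced_writes: int = 0

    def prepare(self) -> bool:
        if not self.vote_ok:
            return False
        self.forced_writes += 1   # force-log "prepared" before answering OK
        self.state = "prepared"
        return True

    def commit(self) -> str:
        self.forced_writes += 1   # force-log "commit"
        self.state = "committed"
        return "ACK"

    def rollback(self) -> None:
        self.state = "aborted"

def two_phase_commit(resources: list) -> str:
    # Phase 1: ask everyone to prepare; all() stops at the first "no" vote.
    if all(r.prepare() for r in resources):
        # Decision point: a real coordinator force-logs "commit" here.
        for r in resources:       # Phase 2: each participant answers ACK
            r.commit()
        return "committed"
    for r in resources:
        r.rollback()
    return "aborted"

r1, r2 = Resource("Resource 1"), Resource("Resource 2")
print(two_phase_commit([r1, r2]), r1.state, r2.forced_writes)  # committed committed 2
print(two_phase_commit([Resource("a"), Resource("b", vote_ok=False)]))  # aborted
```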
5. 2PC: Globally A and D, Locally C and I
● 2PC is not about Concurrency Control.
● A 2PC transaction is therefore
○ Globally Atomic
○ Locally Isolated
○ Locally Consistent
○ Globally Durable
6. OLTP: Queued Transactions
Participants: client, app server, database
Client transaction 1:
  begin transaction
  req_q.enqueue(req1)
  commit transaction
App server transaction:
  begin transaction
  creq = req_q.dequeue()
  resp = creq.execute()
  res_q.enqueue(resp)
  commit transaction
Client transaction 2:
  begin transaction
  resp = res_q.dequeue()
  process(resp)
  commit transaction
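The three transactions above can be tried out with SQLite tables standing in for the persistent queues `req_q` and `res_q`; using SQLite, the table layout, and the upper-casing `execute` handler are assumptions for illustration. Because the app server's dequeue, execute, and enqueue commit together, a crash inside that transaction puts the request back in `req_q` instead of losing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for q in ("req_q", "res_q"):
    conn.execute(f"CREATE TABLE {q} (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")

# Queue names are fixed internal constants, so formatting them into SQL is safe here.
def enqueue(q: str, body: str) -> None:
    conn.execute(f"INSERT INTO {q} (body) VALUES (?)", (body,))

def dequeue(q: str) -> str:
    row_id, body = conn.execute(f"SELECT id, body FROM {q} ORDER BY id LIMIT 1").fetchone()
    conn.execute(f"DELETE FROM {q} WHERE id = ?", (row_id,))
    return body

def execute(req: str) -> str:   # hypothetical request handler
    return req.upper()

def serve_one(crash: bool = False) -> None:
    with conn:                   # app server transaction
        creq = dequeue("req_q")
        resp = execute(creq)
        if crash:
            raise RuntimeError("app server fails before commit")
        enqueue("res_q", resp)

with conn:                       # client transaction 1
    enqueue("req_q", "req1")

try:
    serve_one(crash=True)        # rolled back: req1 is still in req_q
except RuntimeError:
    pass
serve_one()

with conn:                       # client transaction 2
    resp = dequeue("res_q")
print(resp)  # REQ1
```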
12. “Real Life” XA 2PC
For n resources: 2n+1 forced writes, 8n messages
TM --> each of the n resources (time runs downward)
xa_start -->      <-- ack_started
xa_end -->        <-- ack_ended
xa_prepare -->    resource force-logs prepared, <-- ack_prepared
all prepared: TM force-logs commit
xa_commit -->     resource force-logs commit, <-- ack_committed
all committed: TM logs end
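Tallying the timeline per resource reproduces the totals on this slide: four request/ack pairs per resource give 8n messages, and n prepared records, n commit records, plus the TM's commit decision give 2n+1 forced writes. The sketch assumes, as that total implies, that the TM's final end record is not forced; the function name is illustrative.

```python
def xa_2pc_cost(n: int) -> dict:
    """Messages and forced log writes for XA 2PC over n resources."""
    messages = forced = 0
    for _ in range(n):
        messages += 4   # xa_start/ack_started and xa_end/ack_ended
    for _ in range(n):
        messages += 2   # xa_prepare/ack_prepared
        forced += 1     # resource force-logs "prepared"
    forced += 1         # all prepared: TM force-logs the commit decision
    for _ in range(n):
        messages += 2   # xa_commit/ack_committed
        forced += 1     # resource force-logs "commit"
    # All committed: the TM end record is assumed not forced (consistent with 2n+1).
    return {"messages": messages, "forced_writes": forced}

for n in (1, 2, 3):
    print(n, xa_2pc_cost(n))
# 1 {'messages': 8, 'forced_writes': 3}
# 2 {'messages': 16, 'forced_writes': 5}
# 3 {'messages': 24, 'forced_writes': 7}
```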
13. Standard 2PC Optimizations
● 1PC: if only one resource is enlisted, prepare is skipped
● Read-only: if a resource votes read-only, its commit is skipped
● The XA ceremony of xa_start/xa_end is always present
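The effect of these optimizations on the calls a TM issues can be sketched as follows. The function and the vote strings are hypothetical, and the votes are given up front for simplicity (in JTA, `XAResource.prepare` returns `XA_OK` or `XA_RDONLY`, and `XAResource.commit` takes a `onePhase` flag); aborts are not modeled.

```python
def plan_calls(votes: dict) -> list:
    """TM calls for resources whose prepare vote is 'ok' or 'read-only' (no aborts)."""
    calls = []
    for r in votes:
        # Never optimized away; the application's work happens between these two.
        calls += [f"xa_start({r})", f"xa_end({r})"]
    if len(votes) == 1:                      # 1PC: a single enlisted resource
        (r,) = votes
        return calls + [f"xa_commit({r}, onePhase)"]
    calls += [f"xa_prepare({r})" for r in votes]
    calls += [f"xa_commit({r})" for r, vote in votes.items() if vote != "read-only"]
    return calls

print(plan_calls({"db": "ok"}))
# ['xa_start(db)', 'xa_end(db)', 'xa_commit(db, onePhase)']
print(plan_calls({"db": "ok", "jms": "read-only"}))
# ['xa_start(db)', 'xa_end(db)', 'xa_start(jms)', 'xa_end(jms)',
#  'xa_prepare(db)', 'xa_prepare(jms)', 'xa_commit(db)']
```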
14. Nested 2PC: Coordinator Role Transfer
[Gray’78]
TC --prepare--> Res2 (p) --commit--> Res3 (c)
TC <--commit-- Res2 (c) <--commit-- Res3
(p = prepared, c = committed)
● Last Resource is committed in one phase
● 2n messages / 2n-1 forced writes
● Requires a known topology, e.g., linked databases
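Counting hops along a chain TC -> R1 -> ... -> Rn reproduces the slide's totals, under two assumptions of this sketch: the TC's own log writes are not counted, and each inner resource forces both a prepared and a commit record while the last resource forces only its one-phase commit. Every forward hop except the last carries prepare; the last hop hands over the decision as commit, as in the Res2/Res3 picture. With one message per hop in each direction, that gives 2n messages and (n-1) + 1 + (n-1) = 2n-1 forced writes.

```python
def linear_2pc(chain: list) -> dict:
    """Nested 2PC with coordinator role transfer; `chain` lists resources after the TC."""
    *inner, last = chain
    events, messages, forced = [], 0, 0
    # Forward pass: each inner resource receives prepare, force-logs "prepared",
    # and passes the request on.
    for r in inner:
        messages += 1
        forced += 1
        events.append(f"{r}: prepared")
    # The last hop asks the last resource to commit: it decides, in one phase.
    messages += 1
    forced += 1
    events.append(f"{last}: committed (one phase)")
    # Backward pass: commit flows back; each inner resource force-logs "commit".
    for r in reversed(inner):
        messages += 1
        forced += 1
        events.append(f"{r}: committed")
    messages += 1            # final commit message reaches the TC (not force-logged here)
    events.append("TC: commit received")
    return {"events": events, "messages": messages, "forced_writes": forced}

result = linear_2pc(["Res2", "Res3"])
print(result["events"])
# ['Res2: prepared', 'Res3: committed (one phase)', 'Res2: committed', 'TC: commit received']
print(result["messages"], result["forced_writes"])  # 4 3
```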
15. WebLogic Design Constraints and Goals
● No control over foreign XAResources, TMs, or topology
● Broadband networks: minimize blocking RPCs, not messages
● Avoid unneeded XA on the last resource (Res3): save xa_start and xa_end
16. Typical WLS Deployment
● JMS and TM share the same FileStore
● Collocated JMS connection cost is negligible
● The JDBC datasource is remote: each call is a blocking RPC
● DB internal resources (locks, latches, etc.) are more expensive, and the JEE server is not the database's only client
● Outbound JMS messages notify about JDBC updates
● Ideally, JDBC updates become visible before the corresponding JMS messages
17. JDBC as Logging Last Resource
● The user enables a non-XA JDBC datasource as the LLR
○ LLR table WL_LLR_<server> in the datasource schema
○ No XA overhead for the LLR
● TM log = local log UNION LLR table log
○ WLS does not boot if any LLR table is unavailable
● Restriction: one LLR datasource per transaction
● No coordinator role transfer as in nested 2PC
18. XA 2PC Commit with LL Resource
1. Concurrently prepare all non-LLR XAResources
2. Insert the XID into the LLR table (part of the LLR's local transaction)
3. Commit the LLR resource
4. If step 3 succeeds, commit the non-LLR XAResources
5. Lazy garbage-collection of 2PC records of completed
transactions is piggybacked on future LLR transactions
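The five steps can be sketched in Python with SQLite standing in for the non-XA LLR datasource. The single-column `llr_log` table is a simplified, hypothetical stand-in for WL_LLR_<server>, `FakeXAResource` stands in for the non-LLR XAResources, and step 1 runs sequentially here rather than concurrently. What the sketch shows is that the application's JDBC work, the XID insert, and the lazy cleanup all commit in one local transaction, and that this local commit is the global commit point.

```python
import sqlite3

class FakeXAResource:
    """Hypothetical stand-in for a non-LLR XAResource such as a JMS destination."""
    def __init__(self, name: str):
        self.name, self.state = name, "active"
    def prepare(self, xid: str) -> None:
        self.state = "prepared"          # a real resource force-logs here
    def commit(self, xid: str) -> None:
        self.state = "committed"
    def rollback(self, xid: str) -> None:
        self.state = "aborted"

class LLRCoordinator:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn                 # the LLR datasource connection (non-XA)
        self.gc_pending = []             # XIDs whose XA branches have all committed
        conn.execute("CREATE TABLE IF NOT EXISTS llr_log (xid TEXT PRIMARY KEY)")

    def commit(self, xid, xa_resources, fail_before_llr_commit=False) -> str:
        for r in xa_resources:           # 1. prepare non-LLR resources
            r.prepare(xid)
        try:
            # 5. piggyback cleanup of earlier completed transactions
            self.conn.executemany("DELETE FROM llr_log WHERE xid = ?",
                                  [(x,) for x in self.gc_pending])
            # 2. insert the XID in the same local transaction as the app's JDBC work
            self.conn.execute("INSERT INTO llr_log (xid) VALUES (?)", (xid,))
            if fail_before_llr_commit:
                raise sqlite3.OperationalError("simulated failure before LLR commit")
        except sqlite3.Error:
            self.conn.rollback()         # failure before LLR commit => global abort
            for r in xa_resources:
                r.rollback(xid)
            return "aborted"
        # 3. commit the LLR: the global commit point. A failure during this call is
        #    deliberately not caught: its outcome is settled by recovery, not by abort.
        self.conn.commit()
        self.gc_pending.clear()
        for r in xa_resources:           # 4. only now commit the prepared XA branches
            r.commit(xid)
        self.gc_pending.append(xid)
        return "committed"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 2000)")
conn.commit()
tm, jms = LLRCoordinator(conn), FakeXAResource("jms")

conn.execute("UPDATE accounts SET balance = balance - 1000 WHERE id = 1")  # app work
print(tm.commit("xid-1", [jms]), jms.state)         # committed committed
conn.execute("UPDATE accounts SET balance = balance - 1000 WHERE id = 1")
print(tm.commit("xid-2", [FakeXAResource("jms2")], fail_before_llr_commit=True))  # aborted
print(conn.execute("SELECT balance FROM accounts").fetchone()[0])  # 1000
```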
19. LLR Failure Recovery
● Failure before LLR.commit() => global abort
● Failure during LLR.commit() => handled like a media failure
○ Wait until the LLR datasource / table is available for reading
○ Presence of the XID's LLR commit record decides the global outcome
○ If it stays unavailable for AbandonTimeoutSeconds, the transaction is abandoned
● JVM/OS crash: the TM scans local log UNION LLR tables
○ Usual transaction outcome resolution
● 2PC recovery guarantees are not compromised
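The outcome rule for in-doubt XA branches can be sketched as a pure function. The `in_doubt` list stands for the XIDs that resources report as prepared during recovery (in JTA, via `XAResource.recover`); the function name, the set-based logs, and modeling an unreadable LLR table as `None` are assumptions of this sketch.

```python
from typing import Optional

def resolve(in_doubt: list, local_log: set, llr_table: Optional[set]) -> dict:
    """Decide commit/rollback for prepared XIDs after a crash."""
    if llr_table is None:
        # Like a media failure: keep retrying until the LLR table can be read
        # (or AbandonTimeoutSeconds expires); never guess the outcome.
        raise RuntimeError("LLR table unavailable; outcome undecided")
    committed = local_log | llr_table    # TM log = local log UNION LLR table log
    # An XID absent from both logs never reached its commit point, so it aborts.
    return {xid: "commit" if xid in committed else "rollback" for xid in in_doubt}

print(resolve(["x1", "x2"], local_log=set(), llr_table={"x1"}))
# {'x1': 'commit', 'x2': 'rollback'}
```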
20. LLR Savings
Back-of-the-envelope for the single-threaded case
with Jeff Dean’s numbers [Google keynotes]:
● xa_start (RPC)
● xa_end (RPC)
● xa_prepare (RPC + force-log)
● Insert into LLR table + commit done via single RPC
------------------------------------------------
4 × RTT + 1 × DiskSeek
= 4 × 500,000 ns + 10,000,000 ns = 12 ms
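The same estimate as a few lines of Python, with the two latencies taken from the slide (500,000 ns per round trip, 10,000,000 ns per disk seek); the function name is illustrative. Varying the counts makes it easy to see that the single forced write accounts for 10 of the 12 ms.

```python
RTT_NS = 500_000            # datacenter round trip, per the slide
DISK_SEEK_NS = 10_000_000   # disk seek, per the slide

def saved_ms(rtts: int = 4, seeks: int = 1) -> float:
    """Latency avoided per transaction, in milliseconds."""
    return (rtts * RTT_NS + seeks * DISK_SEEK_NS) / 1_000_000

print(saved_ms())          # 12.0
print(saved_ms(seeks=0))   # 2.0  (round trips alone)
```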
29. WebLogic FileStore
● XA-capable KV store on local file system
● Mime design: allocate blocks under the disk write head
○ fast writes
○ slow recovery
○ works well up to a couple of GiB
● Transactional use: for JMS messages and JTA logs
● Non-transactional use: Diagnostics and Config
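The write/recovery trade-off can be illustrated with a toy append-only key-value store in Python. This is a log-structured analogue of "allocate under the write head", not the FileStore's actual format: every put lands wherever the head currently is (here, the end of a byte buffer), so writes never seek or update in place, but the only index lives in memory, so after a crash recovery must scan the whole store to rebuild it. The class name and record layout (CRC-32, key length, value length) are assumptions of the sketch.

```python
import struct
import zlib
from typing import Optional

HEADER = struct.Struct("<IHI")  # CRC-32 of key+value, key length, value length

class AppendStore:
    """Toy KV store: every put is appended at the current write head."""

    def __init__(self, data: Optional[bytearray] = None):
        self.data = bytearray() if data is None else data
        self.index = {}          # key -> offset of its newest record (memory only)
        self._recover()

    def put(self, key: bytes, value: bytes) -> None:
        body = key + value
        self.index[key] = len(self.data)
        self.data += HEADER.pack(zlib.crc32(body), len(key), len(value)) + body

    def get(self, key: bytes) -> bytes:
        off = self.index[key]
        _, klen, vlen = HEADER.unpack_from(self.data, off)
        start = off + HEADER.size + klen
        return bytes(self.data[start:start + vlen])

    def _recover(self) -> None:
        """Slow path: scan every record; a later record for a key wins."""
        off = 0
        while off + HEADER.size <= len(self.data):
            crc, klen, vlen = HEADER.unpack_from(self.data, off)
            end = off + HEADER.size + klen + vlen
            body = bytes(self.data[off + HEADER.size:end])
            if end > len(self.data) or zlib.crc32(body) != crc:
                break                        # torn or corrupt record
            self.index[body[:klen]] = off
            off = end
        del self.data[off:]                  # drop a torn tail left by a crash

s = AppendStore()
s.put(b"msg1", b"hello")
s.put(b"msg1", b"world")                     # an update is just another append
crashed = bytearray(s.data) + b"\x07\x00"    # simulate a half-written record
r = AppendStore(crashed)                     # recovery rebuilds the index by scanning
print(r.get(b"msg1"), r.data == s.data)      # b'world' True
```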