Logging Last Resource Optimization
  for Distributed Transactions in
      Oracle WebLogic Server

     T. Barnes, A. Messinger, P. Parkinson, A. Ganesh,
          G. Shegalov, S. Narayan, S. Kareenhalli
OLTP: Online Transaction Processing
Transaction is an ACID contract
●   Atomic – all or nothing

●   Consistent – from the application perspective

●   Isolated – masked concurrency through locking or snapshots

●   Durable – once committed changes survive subsequent failures


                              begin
                              c -= 1000
       Checking = 2000        s += 1000         Checking = 1000
        Savings = 8000        commit             Savings = 9000


                                                                 time
OLTP: Single Resource
 ● A and D are typically implemented using Write-Ahead Logging
 ● Transaction recovery is “simple”: REDO phase, UNDO phase.
 BEGIN TRANSACTION
 /* LSN = 1: log for undo and redo in MM buffer */
 UPDATE Accounts SET balance = balance - 1000 WHERE Number = 1
 /* LSN = 2: log for undo and redo in MM buffer */
 UPDATE Accounts SET balance = balance + 1000 WHERE Number = 2
 /* LSN = 3: log commit and force (5-6 orders of magnitude slower) */
 COMMIT TRANSACTION



 Accounts   LSN=0          Accounts   LSN=1         Accounts LSN=2
 1          2000           1          1000          1            1000
 2          8000           2          8000          2            9000
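
A minimal sketch of the REDO/UNDO idea, assuming a toy in-memory log rather than any real database format. The class and method names (WalRecoverySketch, LogRecord, recover) are hypothetical; the account numbers and balances are the ones from the slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy write-ahead log with a REDO pass followed by an UNDO pass (illustrative only). */
final class WalRecoverySketch {

    /** One log record; a commit record carries no account (account == null). */
    static final class LogRecord {
        final int lsn;
        final String txId;
        final Integer account;
        final long before;
        final long after;

        private LogRecord(int lsn, String txId, Integer account, long before, long after) {
            this.lsn = lsn;
            this.txId = txId;
            this.account = account;
            this.before = before;
            this.after = after;
        }

        static LogRecord update(int lsn, String txId, int account, long before, long after) {
            return new LogRecord(lsn, txId, account, before, after);
        }

        static LogRecord commit(int lsn, String txId) {
            return new LogRecord(lsn, txId, null, 0, 0);
        }

        boolean isCommit() {
            return account == null;
        }
    }

    /** Rebuilds the table from the on-disk state and the durable log (given in LSN order). */
    static Map<Integer, Long> recover(Map<Integer, Long> diskState, List<LogRecord> log) {
        Map<Integer, Long> table = new TreeMap<>(diskState);
        Set<String> committed = new HashSet<>();
        // REDO phase: repeat history in LSN order and note which transactions committed.
        for (LogRecord r : log) {
            if (r.isCommit()) {
                committed.add(r.txId);
            } else {
                table.put(r.account, r.after);
            }
        }
        // UNDO phase: roll back uncommitted updates in reverse order using before-images.
        for (int i = log.size() - 1; i >= 0; i--) {
            LogRecord r = log.get(i);
            if (!r.isCommit() && !committed.contains(r.txId)) {
                table.put(r.account, r.before);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, Long> disk = Map.of(1, 2000L, 2, 8000L);
        List<LogRecord> log = new ArrayList<>();
        log.add(LogRecord.update(1, "t1", 1, 2000, 1000));
        log.add(LogRecord.update(2, "t1", 2, 8000, 9000));
        System.out.println("crash before commit: " + recover(disk, log));
        log.add(LogRecord.commit(3, "t1"));
        System.out.println("crash after commit:  " + recover(disk, log));
    }
}
```

Without the commit record at LSN 3 both updates are undone; with it, both survive. This is the A and D of the single-resource case.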
OLTP: Distributed / Two-Phase Commit
Like a wedding ceremony
Coordinator: Will you ...? (prepare)
Resource: I will (OK)
Coordinator: I pronounce you … (commit)

       Transaction               Resource 1          Resource 2
       Coordinator


       prepare -->          force-log prepare    force-log prepare
                                        <-- OK             <-- OK
       commit -->            force-log commit    force-log commit
                                       <-- ACK            <-- ACK
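
The exchange above can be sketched as a toy in-memory coordinator. This is only an illustration under simplifying assumptions: no network, no crashes, and an in-memory list standing in for each forced log. All names (TwoPhaseCommitSketch, Participant, run) are hypothetical and not part of any XA or JTA API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory two-phase commit (illustrative only). */
final class TwoPhaseCommitSketch {

    /** Hypothetical participant: votes in phase one, then applies the decision. */
    static final class Participant {
        final String name;
        final boolean voteOk;
        final List<String> log = new ArrayList<>(); // stands in for a forced log

        Participant(String name, boolean voteOk) {
            this.name = name;
            this.voteOk = voteOk;
        }

        boolean prepare() {
            if (voteOk) {
                log.add("prepared"); // force-log prepare before answering OK
            }
            return voteOk;
        }

        void commit() {
            log.add("committed");
        }

        void rollback() {
            log.add("aborted");
        }
    }

    /** Returns true if the global transaction committed. */
    static boolean run(List<Participant> participants) {
        boolean allOk = true;
        for (Participant p : participants) { // phase 1: "Will you ...?"
            if (!p.prepare()) {
                allOk = false;
                break;
            }
        }
        for (Participant p : participants) { // phase 2: announce the decision
            if (allOk) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allOk;
    }

    public static void main(String[] args) {
        List<Participant> ok = List.of(new Participant("r1", true), new Participant("r2", true));
        System.out.println("all vote OK  -> committed = " + run(ok));
        List<Participant> mixed = List.of(new Participant("r1", true), new Participant("r2", false));
        System.out.println("one votes no -> committed = " + run(mixed));
    }
}
```

A single "no" vote in phase one is enough to abort everywhere, which is the global atomicity that 2PC provides.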
2PC is A CI D



 ● 2PC is not about Concurrency Control.
 ● 2PC transaction is therefore
    ○ Globally Atomic
    ○ Locally Isolated
    ○ Locally Consistent
    ○ Globally Durable
OLTP: Queued Transactions
client                           app server       database
begin transaction
req_q.enqueue(req1)
commit transaction
                         begin transaction
                         creq = req_q.dequeue()
                         resp = creq.execute()
                         res_q.enqueue(resp)
                         commit transaction
begin transaction
resp = res_q.dequeue()
process(resp)
commit transaction
WebLogic Server: Java EE++

● App containers: Web (Servlets, WS), EJB, app clients
● Services: Messaging (JMS),
  Transactions (JTA), Database (JDBC), …
Example: Queued Transactions (JEE)
  @MessageDriven(
      mappedName = "jms/inboundQueue",
      activationConfig = {@ActivationConfigProperty(
         propertyName = "connectionFactoryJndiName",
         propertyValue = "jms/inboundConnectionFactory"
      )})
  @TransactionAttribute(TransactionAttributeType.REQUIRED)
  public class OrderMDB implements javax.jms.MessageListener {
    @Resource javax.jms.Queue outboundQueue;
    @Resource javax.jms.ConnectionFactory outboundCf;
    @Resource javax.sql.DataSource orderDataSource;
    public void onMessage(javax.jms.Message message) {
      try {
        java.sql.Connection jdbc = orderDataSource.getConnection();
        javax.jms.Connection jms = outboundCf.createConnection();
        // update SQL via JDBC, notify via JMS connections …
      } catch (java.sql.SQLException | javax.jms.JMSException e) {
        // a system exception rolls back the container-managed transaction
        throw new javax.ejb.EJBException(e);
      }
    }
  }
“School” Presumed Abort 2PC
                                      2n+1 writes, 4n messages
           TM                                                           Resources


                                                   prepare

                                                             force-log prepared
Timeline




                                             yes
            all-prepared: force-log commit

                                                   commit


                                                              force-log commit
                                         ack
            all-commit: log end
PA2PC (1): TM (Coordinator)
PA2PC (2): Resource (Participant)
“Real Life” XA 2PC
                                          2n+1 writes, 8n messages

           TM                                                        Resources
                                                     xa_start
                                    ack_started


                                                     xa_end
                                    ack_ended
Timeline




                                                  xa_prepare
                                                                force-log prepared
                                  ack_prepared

            all-prepared: force-log commit
                                                  xa_commit
                                                                 force-log commit
                                  ack_committed
            all-commit: log end
Standard 2PC Optimizations




● 1PC: if only one resource enlisted, prepare skipped
● Read-Only: if voted read-only, commit skipped
● XA ceremony of xa_(start|end) is always present
Nested 2PC: Coordinator Role Transfer
                                               [Gray’78]


          prepare      p       commit

    TC                 Res2             Res3       c

           commit              commit
                       c



 ● Last Resource is committed in one phase
 ● 2n messages/ 2n-1 forced writes
 ● Known topology: linked Databases
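
For a quick side-by-side view, the cost formulas quoted on the protocol slides can be tabulated. The sketch below takes those formulas at face value, with n as the slides use it; the class and method names are hypothetical.

```java
/** Tabulates the message and log-write counts quoted on the slides (illustrative only). */
final class CommitCostSketch {

    /** "School" presumed-abort 2PC: 4n messages. */
    static int presumedAbortMessages(int n) {
        return 4 * n;
    }

    /** "Real life" XA 2PC: 8n messages (xa_start and xa_end add a round trip each). */
    static int xaMessages(int n) {
        return 8 * n;
    }

    /** Nested 2PC with coordinator role transfer: 2n messages. */
    static int nestedMessages(int n) {
        return 2 * n;
    }

    /** Log writes quoted for both flat variants: 2n + 1. */
    static int flatLogWrites(int n) {
        return 2 * n + 1;
    }

    /** Forced writes quoted for nested 2PC: 2n - 1. */
    static int nestedForcedWrites(int n) {
        return 2 * n - 1;
    }

    public static void main(String[] args) {
        System.out.println("n  PA-msgs  XA-msgs  nested-msgs  flat-writes  nested-writes");
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%d  %7d  %7d  %11d  %11d  %13d%n",
                    n, presumedAbortMessages(n), xaMessages(n), nestedMessages(n),
                    flatLogWrites(n), nestedForcedWrites(n));
        }
    }
}
```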
WebLogic Design Constraints and Goals



 ● No control over foreign XAResource, TM and topology
 ● Broadband: minimize blocking RPC, not messages
 ● Unneeded XA on Res3: save xa_start, xa_end
Typical WLS Deployment

● JMS and TM share the same FileStore
● Collocated JMS connection cost is negligible
● JDBC Datasource is remote: blocking RPC
● DB internal resources (locks, latches, etc.) are more
   expensive and JEE is not a single client
● Outbound JMS notifies about a JDBC update
● Ideally: JDBC updates visible before JMS updates
JDBC as Logging Last Resource

● User enables a non-XA JDBC Datasource as LLR
   ○ LLR table WL_LLR_<server> in the DS schema
   ○ No XA overhead for the LLR
● TM log is local log UNION LLR table log
   ○ WLS does not boot if any LLR table is unavailable
● Restriction: 1 LLR datasource / Transaction
● No coordinator transfers as in Nested 2PC
XA 2PC Commit with LL Resource


1. Prepare concurrently all non-LLR XAResources
2. Insert XID into the LLR table
3. Commit the LLR-Resource
4. If 3 is successful, commit non-LLR XAResources
5. Lazy garbage-collection of 2PC records of completed
   transactions is piggybacked on future LLR transactions
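
Steps 1 to 4 can be sketched as follows. This is a simplified in-memory model, not WebLogic code: XaParticipant and LlrDatabase are hypothetical stand-ins, prepare runs sequentially instead of concurrently, a failed LLR commit is treated as a definite failure (the in-doubt case belongs to recovery), and the lazy garbage collection of step 5 is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the LLR commit ordering (illustrative only). */
final class LlrCommitSketch {

    /** Hypothetical stand-in for a non-LLR XA resource, e.g. a JMS store. */
    static final class XaParticipant {
        final String name;
        final boolean voteOk;
        String state = "active";

        XaParticipant(String name, boolean voteOk) {
            this.name = name;
            this.voteOk = voteOk;
        }

        boolean prepare() {
            if (voteOk) {
                state = "prepared";
            }
            return voteOk;
        }

        void commit() {
            state = "committed";
        }

        void rollback() {
            state = "rolled back";
        }
    }

    /** Hypothetical stand-in for the non-XA LLR database and its LLR table. */
    static final class LlrDatabase {
        final Set<String> llrTable = new HashSet<>(); // XIDs of committed transactions
        final boolean failOnCommit;

        LlrDatabase(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        /** Inserts the XID and commits it together with the application's local updates. */
        boolean insertXidAndCommit(String xid) {
            if (failOnCommit) {
                return false;
            }
            llrTable.add(xid);
            return true;
        }
    }

    /** Returns true if the global transaction committed. */
    static boolean commit(String xid, List<XaParticipant> xaResources, LlrDatabase llr) {
        // Step 1: prepare every non-LLR XA resource.
        boolean allPrepared = true;
        for (XaParticipant r : xaResources) {
            if (!r.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Steps 2-3: the LLR table row acts as the commit record of the transaction.
        boolean committed = allPrepared && llr.insertXidAndCommit(xid);
        // Step 4: the LLR outcome decides the fate of the other resources.
        for (XaParticipant r : xaResources) {
            if (committed) {
                r.commit();
            } else {
                r.rollback();
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        XaParticipant jms = new XaParticipant("jms", true);
        LlrDatabase db = new LlrDatabase(false);
        System.out.println("committed = " + commit("xid-1", List.of(jms), db)
                + ", jms state = " + jms.state + ", LLR table = " + db.llrTable);
    }
}
```

The point of the ordering is that the LLR database never sees xa_start, xa_end or xa_prepare: its ordinary local commit doubles as the coordinator's commit record.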
LLR Failure Recovery

 ● Failure before LLR.commit() => global abort
 ● Failure during LLR.commit() => similar to media failure
    ○ Wait until LLR Datasource / table is available for read
    ○ Presence of the LLR commit log decides the global outcome
    ○ If unavailable for AbandonTimeoutSeconds, the log is abandoned
 ● JVM/OS crash: TM scans local log UNION LLR
    ○ Usual transaction outcome resolution
 ● 2PC recovery guarantees are not compromised
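
The outcome resolution after a crash can be sketched as a lookup in the combined commit log. This assumes the presumed-abort rule from the earlier slides (no commit record means abort) and uses plain sets of XID strings; the names LlrRecoverySketch, Outcome and resolve are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy outcome resolution over "local log UNION LLR table" (illustrative only). */
final class LlrRecoverySketch {

    enum Outcome { COMMIT, ABORT }

    /**
     * Decides every in-doubt (prepared) transaction: commit if a commit record exists
     * in either the local TM log or the LLR table, otherwise abort.
     */
    static Map<String, Outcome> resolve(Set<String> inDoubtXids,
                                        Set<String> localCommitLog,
                                        Set<String> llrTable) {
        Set<String> commitLog = new HashSet<>(localCommitLog);
        commitLog.addAll(llrTable); // TM log = local log UNION LLR table log
        Map<String, Outcome> result = new TreeMap<>();
        for (String xid : inDoubtXids) {
            result.put(xid, commitLog.contains(xid) ? Outcome.COMMIT : Outcome.ABORT);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> inDoubt = Set.of("xid-1", "xid-2", "xid-3");
        Set<String> local = Set.of("xid-3");
        Set<String> llr = Set.of("xid-1");
        System.out.println(resolve(inDoubt, local, llr));
    }
}
```

In this sketch the LLR table must be readable before any decision is made, which mirrors the slide's "wait until the LLR Datasource / table is available for read".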
LLR Savings

 Back-of-the-envelope for the single-threaded case
 with Jeff Dean’s numbers [Google keynotes]:
 ● xa_start (RPC),
 ● xa_end (RPC),
 ● xa_prepare (RPC + force-log)
 ● Insert into LLR table + commit done via single RPC
 ------------------------------------------------
 4xRTT + 1xDiskSeek
   = 4x500,000ns + 10,000,000ns = 12 milliseconds
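
The arithmetic above, spelled out as a small calculation. The two latency constants are the ones on the slide (500,000 ns per round trip, 10,000,000 ns per disk seek); the class and method names are hypothetical.

```java
/** Back-of-the-envelope LLR savings per transaction (illustrative only). */
final class LlrSavingsSketch {

    static final long RTT_NS = 500_000L;          // one round trip, as on the slide
    static final long DISK_SEEK_NS = 10_000_000L; // one disk seek, as on the slide

    /** Total latency in nanoseconds for the given number of round trips and disk seeks. */
    static long savedNanos(int roundTrips, int diskSeeks) {
        return roundTrips * RTT_NS + diskSeeks * DISK_SEEK_NS;
    }

    public static void main(String[] args) {
        long ns = savedNanos(4, 1);
        System.out.println(ns + " ns = " + ns / 1_000_000.0 + " ms");
    }
}
```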
LLR in DS Wizard: Non-XA Driver
LLR in DS Wizard: Safe unlike Emulate
Research Workload EAStress2004
                      [SPECjAppServer’04]
EAStress2004 Benchmark HW/Setup

Driver 1                Driver 2                 External Supplier
(3x Dealer,             (3x Dealer,              Emulator
 3x Manufacturing)       3x Manufacturing)
2x Quad Core 3.7GHz     2x Quad Core 2.7GHz      2x Quad Core 2.7GHz
x86_64, 2MB L2, 8GB     x86_64, 2MB L2, 16GB     x86_64, 2MB L2, 16GB
RAM                     RAM                      RAM




                           System Under Test

Oracle WebLogic Server 11gR1         Oracle RDBMS 11gR1 EE
(Middle Tier)                        (Database Tier)
2x Quad Core 2.83GHz x86_64,         4x Quad Core 2.7GHz x86_64,
6MB L2, 16 GB RAM                    2MB L2, 64GB RAM
Performance Evaluation (Utilization)
     EAStress2004 v1.08, IR = 700 (not reviewed by SPEC)

MDB scenario
MT = WLS on JRockit
DB = Oracle Database
Performance Evaluation (Response Time)
      EAStress2004 v1.08, IR = 700 (not reviewed by SPEC)




                  Purchase   Manage      Browse     Manufacturing
 XA               4.20       2.40        5.40       3.75
 LLR              1.50       1.20        1.90       3.00
 improvement      2.8x       2x          2.8x       1.25x
Future Improvements (probably in 12c)


● LLR does not detect Read-Only
● Transaction GUID instead of LLR table for Oracle
Thank You. Questions?




         oracle.com/weblogic
       oracle.com/benchmarks
WebLogic FileStore


● XA-capable KV store on local file system
● Mime design: allocate under write-head
   ○ fast writes
   ○ slow recovery
   ○ works well up to a couple of GiB
● Transactional use: for JMS messages and JTA logs
● Non-transactional use: Diagnostics and Config
