@CaryMillsap
Cary Millsap talks about
Performance
Cary Millsap
Cintra Software and Services · Method R Corporation
DOUG Oracle Database Forum, Richardson, Texas
5:30p–7:30p Tuesday 29 January 2019
© 2011, 2019 Cary Millsap
1
[Figure: timeline, 1985 to 2020: Optimal Flexible Architecture, Oracle APS, System Performance Group, hotsos, Method R; Method R Profiler, Method R Tools, Method R Trace, Method R Workbench, TRASIR, SimDiff; and the book The Guide to Mastering Oracle Trace Data, Second Edition, by Cary V. Millsap]
2
Plan
Two tools
An intermittency problem
A predicting problem
3
Two tools
4
There are only two possible root causes for
any response time problem.
What are they?
5
There are only two possible root causes for
any response time problem:
Call count is too big.
Latency is too big.*
* Probably because someone else’s call counts are too big.
6
Proof
Any duration can be expressed as a sum of its spanning, non-overlapping component durations, in a form called a profile.
The only way that $d$ can be too large is for some $c_i$ or $r_i$ to be too large.

Component  Call count  Mean latency  Duration
---------  ----------  ------------  -------------
1          c_1         r_1           d_1 = c_1 r_1
2          c_2         r_2           d_2 = c_2 r_2
…          …           …             …
n          c_n         r_n           d_n = c_n r_n
---------  ----------  ------------  -------------
Total                                d

$$d = \sum_{i=1}^{n} c_i r_i$$
7
Example: $d_i = c_i r_i$
CALL-NAME DURATION CALLS MEAN
----------------------------- ---------- ----- --------
PARSE 735.426197 698 1.053619
SQL*Net message from client 104.762229 1,378 0.076025
FETCH 91.800028 680 0.135000
db file sequential read 0.104670 14 0.007476
EXEC 0.083988 349 0.000241
gc cr block 2-way 0.073233 96 0.000763
gc current block 2-way 0.031298 47 0.000666
gc current grant busy 0.028037 47 0.000597
SQL*Net more data from client 0.025819 837 0.000031
CLOSE 0.018999 698 0.000027
12 others 0.061576 1,633 0.000038
----------------------------- ---------- ----- --------
TOTAL (22) 932.416074 6,477 0.143958
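In this profile, $d = \sum_{i=1}^{n} c_i r_i$ is the TOTAL duration, 932.416074. For the first component, PARSE, $d_1 = 735.426197$, $c_1 = 698$, and $r_1 = 1.053619$.
8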
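The profile arithmetic can be checked mechanically. This short Python sketch copies the rows of the profile above, recomputes each mean latency as $r_i = d_i / c_i$, and confirms that the component durations and call counts add up to the TOTAL row. The names `PROFILE` and `summarize` are illustrative only, not part of any Method R tool.

```python
# Rows copied from the profile above: (call name, duration, calls).
PROFILE = [
    ("PARSE", 735.426197, 698),
    ("SQL*Net message from client", 104.762229, 1378),
    ("FETCH", 91.800028, 680),
    ("db file sequential read", 0.104670, 14),
    ("EXEC", 0.083988, 349),
    ("gc cr block 2-way", 0.073233, 96),
    ("gc current block 2-way", 0.031298, 47),
    ("gc current grant busy", 0.028037, 47),
    ("SQL*Net more data from client", 0.025819, 837),
    ("CLOSE", 0.018999, 698),
    ("12 others", 0.061576, 1633),
]


def summarize(rows):
    """Return (total duration d, total calls, overall mean latency)."""
    d = sum(duration for _, duration, _ in rows)
    c = sum(calls for _, _, calls in rows)
    return d, c, d / c


if __name__ == "__main__":
    for name, duration, calls in PROFILE:
        # Mean latency r_i = d_i / c_i, so d_i = c_i * r_i by construction.
        print(f"{name:<30} {duration:>11.6f} {calls:>6,} {duration / calls:>9.6f}")
    d, c, r = summarize(PROFILE)
    print(f"{'TOTAL':<30} {d:>11.6f} {c:>6,} {r:>9.6f}")
```

The recomputed means match the MEAN column to the six decimal places shown, because the profile is nothing more than $d_i = c_i r_i$ summed over components.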
$d_i = c_i r_i$ is a line, right?
Meaning, for example, if the call count doubles, then duration doubles, too, right?
Wrong.
[Figure: duration ($d_i$) versus call count ($c_i$)]
9
$d_i = c_i r_i$ is a hyperbola
Each $r_i$ is a complicated function of call count ($c_i$) and other inputs.
[Figure: duration ($d_i$) versus call count ($c_i$)]
10
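To see where a hyperbola can come from, here is a minimal sketch under stated assumptions: a single server behaving as an M/M/1 queue with mean service time `S`, and calls arriving at an average rate $\lambda = c / T$ over a hypothetical window of `T` seconds (one possible choice for how arrival rate depends on call count). Then $r = S / (1 - cS/T)$, so $d = c\,r = cST / (T - cS)$: a hyperbola in $c$ with a vertical asymptote at $c = T/S$.

```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho), with rho = lambda * S."""
    rho = arrival_rate * service_time
    if rho >= 1:
        return float("inf")  # demand meets or exceeds capacity: the queue grows without bound
    return service_time / (1 - rho)


def duration(calls, service_time, window):
    """d = c * r(c), assuming (hypothetically) that lambda = c / window."""
    return calls * mm1_response_time(service_time, calls / window)


if __name__ == "__main__":
    S, T = 0.001, 10.0  # 1 ms per call over a 10 s window: saturation at c = T / S = 10,000
    for c in (1000, 2000, 4000, 8000, 9000):
        print(f"c = {c:>5}  d = {duration(c, S, T):8.3f} s")
```

With these example values, 4,000 calls take about 6.7 s in total, but 8,000 calls take about 40 s: doubling the call count multiplies the duration by roughly six, not two.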
“Complicated function of c…”
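$$\rho = \frac{\lambda}{m\mu}, \qquad \lambda = f(c)$$
$$E[r] = \frac{1}{\mu}\left(\frac{(m\rho)^m}{m(1-\rho)^2\, m!\left(\dfrac{(m\rho)^m}{(1-\rho)\, m!} + \dfrac{e^{m\rho}\,\Gamma(m, m\rho)}{\Gamma(m)}\right)} + 1\right)$$
11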
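This formula is the mean response time of an M/M/m queue (the Erlang C model). The Python sketch below evaluates it, assuming integer `m`; for integer `m`, $e^{m\rho}\,\Gamma(m, m\rho)/\Gamma(m) = \sum_{k=0}^{m-1} (m\rho)^k / k!$, so only the standard library is needed. How $\lambda$ depends on $c$ is left to the caller, since the slide writes it only as $f(c)$.

```python
import math


def mmm_response_time(m, service_rate, arrival_rate):
    """Mean response time E[r] of an M/M/m queue (Erlang C), for integer m >= 1."""
    rho = arrival_rate / (m * service_rate)
    if rho >= 1:
        return math.inf  # utilization at or above 100%: no steady state
    a = m * rho  # offered load, lambda / mu
    # e^{m rho} Gamma(m, m rho) / Gamma(m), written as the equivalent finite sum.
    head = sum(a**k / math.factorial(k) for k in range(m))
    tail = a**m / ((1 - rho) * math.factorial(m))
    queue_term = a**m / (m * (1 - rho) ** 2 * math.factorial(m) * (tail + head))
    return (queue_term + 1) / service_rate
```

As a sanity check, with `m = 1` the function reduces to $1/(\mu - \lambda)$, and with `m = 2`, $\mu = 1$, $\lambda = 1$ it returns $4/3$.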
What does this mean?
Adding load can cause your durations to skyrocket.
Subtracting load can cause your durations to plummet.
[Figure: duration ($d_i$) versus call count ($c_i$)]
12
So, your two tools…
➊ There are only two possible root causes for any response time problem:
Call count
Latency
➋ Call count is more important
13
Intermittency
14
The problem
We have this batch job. It processes pretty much the same amount of
data every time we run it. It usually runs in a little over an hour, but
sometimes—out of the blue—it’ll run nearly two and a half hours. We
have no idea when it’s going to happen. There must be a pattern to it;
we just can’t figure out what it is. It was slow last Tuesday, but it’s not
slow every Tuesday. It’s slow sometimes between three and four
o’clock, but not always, and sometimes it’s slow at other times. We
thought maybe it was interference with our daily batch jobs, but we’ve
proven that that’s not it, either. We just can’t correlate it to anything…
15
Reproducible test case
Shortens your feedback loop
Proves the value of each remedy
16
How to reproduce?
But how to reproduce a “patternless” problem?
One way: trace it every time it runs
17
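One way to trace every run is to wrap the job in code that switches tracing on before the work and off after it. This sketch assumes a DB-API cursor on an Oracle connection (for example, one from python-oracledb) and calls `DBMS_MONITOR.SESSION_TRACE_ENABLE` and `SESSION_TRACE_DISABLE` for the current session; `traced` and `job_name` are hypothetical names. Tagging each run's trace file with a timestamped `tracefile_identifier` makes it easy to find the file for any particular run afterward.

```python
import re
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def traced(cursor, job_name):
    """Trace the enclosed work in the current Oracle session.

    `cursor` is assumed to be a DB-API cursor on an Oracle connection;
    `job_name` is a hypothetical label used to tag the trace file name.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", job_name):
        # ALTER SESSION cannot take bind variables, so allow only safe characters.
        raise ValueError("job_name must contain only letters, digits, or underscores")
    tag = f"{job_name}_{datetime.now():%Y%m%d_%H%M%S}"
    cursor.execute(f"alter session set tracefile_identifier = '{tag}'")
    cursor.execute(
        "begin dbms_monitor.session_trace_enable(waits => true, binds => false); end;"
    )
    try:
        yield tag
    finally:
        # Runs even if the job raises, so tracing is never left on by accident.
        cursor.execute("begin dbms_monitor.session_trace_disable; end;")
```

Usage would look like `with traced(cursor, "nightly_batch"): run_job(cursor)`, where `run_job` stands in for your own batch code. Because every run is traced, the slow run you need is already on disk when it happens.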
You want two trace files
Caught-in-the-act
Baseline
18
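With both trace files in hand, comparing their profiles component by component shows which of the two root causes changed. This sketch uses the exact identity $c'r' - cr = (c' - c)\,r + c'(r' - r)$ to split each component's change in duration into a call-count part and a latency part. `decompose` and its dictionary layout are illustrative assumptions, not a Method R interface.

```python
def decompose(base, slow):
    """Split each component's change in duration into (call-count part, latency part).

    `base` and `slow` map component name -> (calls, mean latency in seconds).
    Uses the exact identity c1*r1 - c0*r0 = (c1 - c0)*r0 + c1*(r1 - r0).
    """
    result = {}
    for name in sorted(set(base) | set(slow)):
        # A component absent from one profile gets zero calls at the other
        # profile's latency, so its whole change counts as a call-count change.
        c0, r0 = base.get(name, (0, slow.get(name, (0, 0.0))[1]))
        c1, r1 = slow.get(name, (0, r0))
        result[name] = ((c1 - c0) * r0, c1 * (r1 - r0))
    return result
```

If the caught-in-the-act profile shows most of its extra time in the latency column, the next question is whose call counts are driving that latency.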
Why a baseline?
Learning how to trace it at all may not be trivial
Figure that out first
“Catch it in the act” is a separate problem
You may be able to find your problem, even with just the baseline
19
Baseline
20
The 1,023,971 read calls
21
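Counting calls like these from a raw trace file takes only a few lines. This sketch assumes Oracle extended SQL trace syntax, in which each `WAIT` line carries the event name in `nam='...'` and its elapsed duration in microseconds in `ela=`. It builds a per-event profile (call count and total seconds) and a latency histogram for one event. The bucket edges are arbitrary examples, and the function names are hypothetical.

```python
import bisect
import re
from collections import defaultdict

# Oracle extended SQL trace WAIT line, e.g.
# WAIT #140: nam='db file sequential read' ela= 4215 file#=4 block#=12 blocks=1 ...
# ela is in microseconds (Oracle 9i and later).
WAIT_LINE = re.compile(r"^WAIT #\d+: nam='([^']*)' ela= *(\d+)")


def wait_profile(lines):
    """Return {event name: (call count, total seconds)} for WAIT lines."""
    calls = defaultdict(int)
    micros = defaultdict(int)
    for line in lines:
        m = WAIT_LINE.match(line)
        if m:
            calls[m.group(1)] += 1
            micros[m.group(1)] += int(m.group(2))
    return {name: (calls[name], micros[name] / 1e6) for name in calls}


def latency_histogram(lines, event, edges_us=(100, 1000, 5000, 10000)):
    """Count one event's calls per latency bucket.

    Bucket 0 holds ela < edges_us[0]; bucket i holds edges_us[i-1] <= ela < edges_us[i];
    the last bucket holds ela >= edges_us[-1].
    """
    buckets = [0] * (len(edges_us) + 1)
    for line in lines:
        m = WAIT_LINE.match(line)
        if m and m.group(1) == event:
            buckets[bisect.bisect_right(edges_us, int(m.group(2)))] += 1
    return buckets
```

Both functions accept any iterable of lines, so an open trace file can be passed directly, for example `wait_profile(open("job.trc"))` with a hypothetical file name.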
What if?
22
That could have been it!
If read call latencies degrade to 5 ms, then the job will take twice as long.
Interesting discovery.
Does it prove what happened on Tuesday?
23
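A what-if like this one is arithmetic on one row of the profile: hold everything else fixed and swap in a new mean latency for the read calls. A sketch, with `what_if` as a hypothetical helper. Note that holding call counts and all other latencies constant is itself the kind of assumption the first part of this talk warns about.

```python
def what_if(total, calls, old_latency, new_latency):
    """Total duration if one component's mean latency changes.

    Assumes (hypothetically) that the component's call count and every other
    component's duration stay the same.
    """
    return total - calls * old_latency + calls * new_latency


READ_CALLS = 1_023_971  # from the slide titled "The 1,023,971 read calls"
read_time_at_5ms = READ_CALLS * 0.005  # seconds spent in read calls alone
```

At 5 ms per call, the read calls alone would account for 1,023,971 × 0.005 = 5,119.855 s, about 85 minutes.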
Call to action
What is your real goal?
Understand last Tuesday?
Job never again runs twice as long?
Do you need a caught-in-the-act trace file to do anything productive?
This job is sensitive to read latencies. That is important.
What’s your plan?
24
Discussion
1. How would you learn whether spikes in disk I/O demand are what actually cause the intermittent job slowdowns?
2. What data in the trace file would you study to determine whether the job itself is executing more read calls than it should?
3. Imagine that the trace of the job running quickly had been collected on a non-production system. How would this have limited your analysis?
4. Are all applications with high call counts as sensitive to latency as the job in this example? Why or why not?
5. What kinds of changes to your application would make it easier to catch the job in the act of being slow?
25
Predicting
26
The problem
A program inserts 5,000 rows into a table. The program makes a
mistake that I’ve seen lots of developers make. A script fired off
two simultaneous executions of this program on a single-core
Windows laptop. Each execution ran for about 12 s.
27
Profile by subroutine
28
Profile by SQL statement
insert into parse2 values (1, lpad('1',20))
insert into parse2 values (2, lpad('2',20))
insert into parse2 values (3, lpad('3',20))
…
insert into parse2 values (5000, lpad('5000',20))
29
The code
# Baseline: BAD
for each value in some set of 5,000 values {
    sql = sprintf("insert into parse2 values (%s, lpad('%s',20))", value, value);
    cursor = parse(sql);
    result = exec(cursor) or die;
}
commit;
30
The code
# Baseline: BAD
for each value in some set of 5,000 values {
    sql = sprintf("insert into parse2 values (%s, lpad('%s',20))", value, value);
    cursor = parse(sql);
    result = exec(cursor) or die;
}
commit;
# Improved: BETTER
cursor = parse("insert into parse2 values (?, lpad(?,20))");
for each value in some set of 5,000 values {
    result = exec(cursor, value, value) or die;
}
commit;
31
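The parse-once pattern can be tried without an Oracle database. In this sketch, Python's built-in `sqlite3` module stands in for Oracle (an assumption: SQLite has no shared pool, so this shows the coding pattern, not Oracle's parse costs), and a user-defined `lpad` function approximates Oracle's `LPAD` with default blank padding. Each version returns the SQL texts it submitted: the baseline produces 5,000 distinct statements, each of which must be parsed on its own, while the improved version submits a single statement text and supplies new bind values on every execution.

```python
import sqlite3


def lpad(text, width):
    """Approximate Oracle LPAD(text, width) with the default blank padding."""
    text = str(text)
    return text[:width] if len(text) >= width else text.rjust(width)


def connect():
    conn = sqlite3.connect(":memory:")
    conn.create_function("lpad", 2, lpad)  # SQLite has no built-in LPAD
    conn.execute("create table parse2 (id integer, val text)")
    return conn


def baseline(conn, values):
    """BAD: builds a new SQL text for every row, so every row needs a fresh parse."""
    submitted = []
    for value in values:
        sql = "insert into parse2 values (%s, lpad('%s',20))" % (value, value)
        conn.execute(sql)
        submitted.append(sql)
    conn.commit()
    return submitted


def improved(conn, values):
    """BETTER: one SQL text with placeholders, executed once per row with new binds."""
    sql = "insert into parse2 values (?, lpad(?,20))"
    cur = conn.cursor()
    submitted = []
    for value in values:
        cur.execute(sql, (value, value))
        submitted.append(sql)
    conn.commit()
    return submitted
```

Both versions leave identical rows in the table; only the number of distinct statement texts differs.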
Profile by subroutine for just the top statement
32
What will change?
33
The usual prediction
34
What really happened
35
36
Debrief
UAFBC and UAFWC call counts: almost dead-on
SNMFC (SQL*Net message from client) call count: we missed badly (a database connector bug)
PARSE call count: nailed it
Huge miss: duration per call changed
37
Why did duration per call change?
38
Remember these?
Duration per call is not a constant.
It varies with load, which varies with call count.
39
Discussion
1. Why does duration per call depend on load? Why does load depend on call counts?
2. Why did the duration per call drop so much?
3. What would it have taken to predict more accurately the changes in duration per call that resulted from our improvement to the code?
4. Why would anyone write code like the baseline code in this story?
5. What other kinds of performance problems can you envision the baseline code having under conditions of higher concurrency?
6. Are we finished optimizing this program with just the changes I’ve suggested in the slides? What else could we do to make this program run faster?
40
Wrap-up
41
For more information…
42
[Figure: covers of The Guide to Mastering Oracle Trace Data by Cary V. Millsap (Method R), Second Edition, shown as “Available at” a retailer whose logo did not survive extraction, and Third Edition; both subtitled “The definitive guide to accurate, high-precision measurement of user performance experiences, for Oracle application developers and database administrators.”]
Method R Corporation
http://method-r.com
info@method-r.com
@MethodR
The Guide to Mastering Oracle Trace Data, Third Edition
• Explains how to create Oracle trace files and understand the stories they tell you about your applications and the time they consume
• Prescribes a reliable method for optimizing Oracle-based applications throughout all the phases of your software development life cycle
• Demonstrates through worked examples how Oracle application developers and database administrators can use Oracle trace data and Method R software to solve and prevent performance problems
• Contains more than 50 pages of new material, including new worked examples, new information about production-safe tracing, state-of-the-art information about measuring connection pooling applications, and more
A rare and treasured skill is the ability to look a system’s user or its owner
square in the eye and talk bluntly about performance: How long are the users’
experiences with the system taking? Why? What would be the effect on each
user’s response times if you were to make this change or that change? What
else could you do? Are your programs running as fast as they can?
This book teaches you how to do that.
The first time I used Method R and their software, I reduced the run time
of one query from 6½ hours to less than 11 minutes. In the years since, I’ve
used it to save a system three years’ worth of work by reducing a query’s
runtime from 2.0 to 0.2 seconds, to demonstrate that anti-virus software
was causing unnecessary delays in an application whose vendor had blamed
the Oracle Database, and to save an international airline over a quarter
of a million dollars on an upgrade that was destined to disappoint them.
Method R is the simplest and most effective way to achieve such results.
—Guðmundur Jósepsson
CEO, Miracle Iceland
COMING SOON!
43
www.cintra.com
One-day training course
Tuesday 26 March 2019
Galleria Dallas
contact gfeinberg@cintra.com
44

“Performance” - Dallas Oracle Users Group 2019-01-29 presentation
