“Performance” - Dallas Oracle Users Group 2019-01-29 presentation

@CaryMillsap
Cary Millsap talks about
Performance
Cary Millsap
Cintra Software and Services · Method R Corporation
@CaryMillsap
DOUG Oracle Database Forum, Richardson, Texas
5:30p–7:30p Tuesday 29 January 2019
© 2011, 2019 Cary Millsap

@CaryMillsap
[Figure: timeline, 1985 to 2020, labeled with hotsos, Method R, Optimal Flexible Architecture, Oracle APS, System Performance Group, Method R Profiler, Method R Tools, Method R Trace, Method R Workbench, TRASIR, SimDiff, and the book The Guide to Mastering Oracle Trace Data, Second Edition, by Cary V. Millsap.]
Book cover text: The definitive guide to accurate, high-precision measurement of user performance experiences, for Oracle application developers and database administrators.

@CaryMillsap
Plan
Two tools
An intermittency problem
A predicting problem

@CaryMillsap
There are only two possible root causes for
any response time problem.
What are they?

@CaryMillsap
There are only two possible root causes for
any response time problem:
Call count is too big.
Latency is too big.*
* Probably because someone else’s call counts are too big.

@CaryMillsap
Proof
Any duration can be expressed as a sum of its spanning, non-overlapping component durations, in a form called a profile.
The only way that $d$ can be too large is for some $c_i$ or $r_i$ to be too large.

Component  Call count  Mean latency  Duration
---------  ----------  ------------  -------------
1          c_1         r_1           d_1 = c_1 r_1
2          c_2         r_2           d_2 = c_2 r_2
…          …           …             …
n          c_n         r_n           d_n = c_n r_n
---------  ----------  ------------  -------------
Total: $d = \sum_{i=1}^{n} c_i r_i$

@CaryMillsap
Example: $d_i = c_i r_i$
CALL-NAME DURATION CALLS MEAN
----------------------------- ---------- ----- --------
PARSE 735.426197 698 1.053619
SQL*Net message from client 104.762229 1,378 0.076025
FETCH 91.800028 680 0.135000
db file sequential read 0.104670 14 0.007476
EXEC 0.083988 349 0.000241
gc cr block 2-way 0.073233 96 0.000763
gc current block 2-way 0.031298 47 0.000666
gc current grant busy 0.028037 47 0.000597
SQL*Net more data from client 0.025819 837 0.000031
CLOSE 0.018999 698 0.000027
12 others 0.061576 1,633 0.000038
----------------------------- ---------- ----- --------
TOTAL (22) 932.416074 6,477 0.143958
$d = \sum_{i=1}^{n} c_i r_i = 932.416074$ (the TOTAL row)
$d_1 = 735.426197$, $c_1 = 698$, $r_1 = 1.053619$ (the PARSE row)
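To make the profile arithmetic concrete, here is a minimal Python sketch that re-derives the MEAN column and the totals from the DURATION and CALLS columns of the example above. The row values are copied from the slide; the extra percentage column and the variable names are illustrative additions, not part of the original report.

```python
# Rows copied from the example profile: (call name, duration in seconds, call count).
profile = [
    ("PARSE", 735.426197, 698),
    ("SQL*Net message from client", 104.762229, 1378),
    ("FETCH", 91.800028, 680),
    ("db file sequential read", 0.104670, 14),
    ("EXEC", 0.083988, 349),
    ("gc cr block 2-way", 0.073233, 96),
    ("gc current block 2-way", 0.031298, 47),
    ("gc current grant busy", 0.028037, 47),
    ("SQL*Net more data from client", 0.025819, 837),
    ("CLOSE", 0.018999, 698),
    ("12 others", 0.061576, 1633),
]

# d is the sum of the component durations; the call counts add up the same way.
total_duration = sum(duration for _, duration, _ in profile)
total_calls = sum(calls for _, _, calls in profile)

for name, duration, calls in profile:
    mean_latency = duration / calls      # r_i = d_i / c_i
    share = duration / total_duration    # fraction of total response time
    print(f"{name:<30} {duration:>11.6f} {calls:>6,} {mean_latency:>9.6f} {share:>6.1%}")

print(f"{'TOTAL':<30} {total_duration:>11.6f} {total_calls:>6,} "
      f"{total_duration / total_calls:>9.6f}")
```

Sorting the rows by duration, as the slide does, puts the components most worth attacking at the top.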

@CaryMillsap
$d_i = c_i r_i$ is a line, right?
Meaning, for example, if the call count doubles, then duration doubles, too, right?
Wrong.
[Figure: duration ($d_i$) plotted against call count ($c_i$).]

@CaryMillsap
$d_i = c_i r_i$ is a hyperbola
Each $r_i$ is a complicated function of call count ($c_i$) and other inputs.
[Figure: duration ($d_i$) plotted against call count ($c_i$).]

@CaryMillsap
“Complicated function of c…”
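$$\rho = \frac{\lambda}{m\mu}, \qquad \lambda = f(c)$$
$$E[r] = \frac{1}{\mu}\left(\frac{(m\rho)^m}{m(1-\rho)^2\, m!\left(\dfrac{(m\rho)^m}{(1-\rho)\, m!} + \dfrac{e^{m\rho}\,\Gamma(m, m\rho)}{\Gamma(m)}\right)} + 1\right)$$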
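The expression above has the form of the mean response time of an M/M/m queue (m servers, arrival rate λ, per-server service rate μ). The following Python sketch evaluates it so the nonlinearity can be seen numerically. It assumes m is a positive integer, in which case the term e^{mρ}Γ(m, mρ)/Γ(m) equals the finite sum of (mρ)^k/k! for k from 0 to m−1, which is what the code uses. The function name and the sample inputs are illustrative, and the slide does not specify the mapping λ = f(c), so λ is passed in directly.

```python
import math

def mean_response_time(lam, mu, m):
    """E[r] for an M/M/m queue: arrival rate lam, service rate mu per server, m servers."""
    rho = lam / (m * mu)                      # utilization per server
    if not 0 <= rho < 1:
        raise ValueError("requires 0 <= rho < 1 (a stable queue)")
    a = m * rho
    # For integer m: sum_{k=0}^{m-1} a^k / k!  ==  e^a * Gamma(m, a) / Gamma(m)
    partial_sum = sum(a**k / math.factorial(k) for k in range(m))
    tail = a**m / ((1 - rho) * math.factorial(m))
    queueing = a**m / (m * (1 - rho)**2 * math.factorial(m) * (tail + partial_sum))
    return (queueing + 1) / mu

# Illustrative inputs: 4 servers, each able to complete 1 call per time unit.
for lam in (1.0, 2.0, 3.0, 3.6, 3.9):
    print(f"lambda = {lam:3.1f}  E[r] = {mean_response_time(lam, mu=1.0, m=4):.3f}")
```

With one server the function reduces to the familiar 1/(μ − λ), which is a quick sanity check on the reconstruction.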

@CaryMillsap
What does this mean?
Adding load can cause your durations to skyrocket.
Subtracting load can cause your durations to plummet.
[Figure: duration ($d_i$) plotted against call count ($c_i$).]

@CaryMillsap
So, your two tools…
➊ There are only two possible root causes for any response time problem
Call count
Latency
➋ Call count is more important

@CaryMillsap
The problem
We have this batch job. It processes pretty much the same amount of
data every time we run it. It usually runs in a little over an hour, but
sometimes—out of the blue—it’ll run nearly two and a half hours. We
have no idea when it’s going to happen. There must be a pattern to it;
we just can’t figure out what it is. It was slow last Tuesday, but it’s not
slow every Tuesday. It’s slow sometimes between three and four
o’clock, but not always, and sometimes it’s slow at other times. We
thought maybe it was interference with our daily batch jobs, but we’ve
proven that that’s not it, either. We just can’t correlate it to anything…

@CaryMillsap
Reproducible test case
Shortens your feedback loop
Proves the value of each remedy

@CaryMillsap
How to reproduce?
But how to reproduce a “patternless” problem?
One way: trace it every time it runs

@CaryMillsap
You want two trace files
Caught-in-the-act
Baseline

@CaryMillsap
Why a baseline?
Learning how to trace it at all may not be trivial
Figure that out first
“Catch it in the act” is a separate problem
You may be able to find your problem, even with just the
baseline

@CaryMillsap
The 1,023,971 read calls

@CaryMillsap
That could have been it!
If read call latencies degrade to 5 ms, then the job will run 2× longer.
Interesting discovery.
Does it prove what happened on Tuesday?
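The sensitivity claim above is simple arithmetic on the read call count, sketched here in Python. The call count and the 5 ms figure come from the slides; the function name is illustrative, and the job's normal read latency is not given in the slides, so it is left as a parameter rather than guessed.

```python
READ_CALLS = 1_023_971   # read calls in the traced run (from the slide)

def read_time_seconds(calls, latency_seconds):
    """Total time spent in read calls: d = c * r."""
    return calls * latency_seconds

degraded = read_time_seconds(READ_CALLS, 0.005)   # 5 ms per read call
print(f"read time at 5 ms per call: {degraded:.0f} s ({degraded / 60:.0f} min)")

def added_seconds(calls, normal_latency, degraded_latency):
    """Extra elapsed time if every call slows from normal to degraded latency."""
    return calls * (degraded_latency - normal_latency)
```

At 5 ms per call, the reads alone account for roughly 85 minutes, which is the same order as the gap between a run of a little over an hour and one of nearly two and a half hours.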

@CaryMillsap
Call to action
What is your real goal?
Understand last Tuesday?
Job never again runs twice as long?
Do you need a caught-in-the-act trace file to do anything productive?
This job is sensitive to read latencies. That is important.
What’s your plan?

@CaryMillsap
Discussion
1. How would you learn whether spikes in disk I/O demand are actually
causing the intermittent job slowdowns?
2. What data in the trace file would you study to determine whether the job
itself is executing more read calls than it should?
3. Imagine that the trace of the job running quickly had been collected on a
non-production system. How would this have limited your analysis?
4. Are all applications with high call counts susceptible to latency sensitivity
like in this example? Why or why not?
5. What kind of changes to your application would make it easier to catch the
job in the act of being slow?

@CaryMillsap
The problem
A program inserts 5,000 rows into a table. The program makes a
mistake that I’ve seen lots of developers make. A script fired off
two simultaneous executions of this program on a single-core
Windows laptop. Each program ran for about 12 s.

@CaryMillsap
Profile by subroutine

@CaryMillsap
Profile by SQL statement
insert into parse2 values (1, lpad('1',20))
…

@CaryMillsap
The code
# Baseline: BAD
for each value in some set of 5,000 values {
    sql = sprintf("insert into parse2 values (%s, lpad('%s',20))", value, value);
    cursor = parse(sql);
    result = exec(cursor) or die;
}
commit;

@CaryMillsap
The code
# Baseline: BAD
for each value in some set of 5,000 values {
    sql = sprintf("insert into parse2 values (%s, lpad('%s',20))", value, value);
    cursor = parse(sql);
    result = exec(cursor) or die;
}
commit;
# Improved: BETTER
cursor = parse("insert into parse2 values (?, lpad(?,20))");
for each value in some set of 5,000 values {
    result = exec(cursor, value, value) or die;
}
commit;
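For readers who want to run the two variants, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for the Oracle connection, which is an assumption of this sketch and not what the talk measured. The two-column table definition is hypothetical, and because SQLite has no lpad function, the padding is done with str.rjust before the insert. The functions return the number of distinct SQL texts sent to the database, which is what drives the parse count.

```python
import sqlite3

def insert_with_literals(conn, values):
    """BAD: builds a new SQL text for every row, so every row needs its own parse."""
    distinct_texts = set()
    for v in values:
        sql = "insert into parse2 values (%d, '%s')" % (v, str(v).rjust(20))
        distinct_texts.add(sql)
        conn.execute(sql)
    conn.commit()
    return len(distinct_texts)

def insert_with_binds(conn, values):
    """BETTER: one SQL text with placeholders, executed once per row."""
    sql = "insert into parse2 values (?, ?)"
    for v in values:
        conn.execute(sql, (v, str(v).rjust(20)))
    conn.commit()
    return 1  # a single SQL text is reused for every row

conn = sqlite3.connect(":memory:")
conn.execute("create table parse2 (n integer, s text)")  # hypothetical columns
values = range(5000)

bad_texts = insert_with_literals(conn, values)
good_texts = insert_with_binds(conn, values)
rows = conn.execute("select count(*) from parse2").fetchone()[0]
print(f"distinct SQL texts: bad={bad_texts}, better={good_texts}; rows inserted={rows}")
```

Besides cutting the number of distinct statements from 5,000 to 1, binding values instead of splicing them into the SQL text also avoids quoting problems.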

@CaryMillsap
Profile by subroutine for just the top statement

@CaryMillsap
What will change?

@CaryMillsap
The usual prediction

@CaryMillsap
What really happened

@CaryMillsap
Debrief
UAFBC and UAFWC call counts, almost dead-on
SNMFC (SQL*Net message from client) call count, we missed badly (database connector bug)
PARSE call count, nailed it
Huge miss: duration per call changed

@CaryMillsap
Why did duration per call change?

@CaryMillsap
Remember these?
Duration per call is not a constant.
It varies with load, which varies with call count.
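The gap between the usual prediction and what really happened can be illustrated with a small Python sketch. It contrasts a constant-latency prediction with a latency that depends on load, using the single-server M/M/1 formula r = s / (1 − utilization). Both the choice of M/M/1 and every number below are assumptions made for illustration; none of them come from the traced program.

```python
def naive_duration(calls, latency):
    """The usual prediction: d = c * r, with r held constant."""
    return calls * latency

def mm1_latency(service_time, arrival_rate):
    """Mean response time per call for a single-server (M/M/1) queue."""
    utilization = arrival_rate * service_time
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# Hypothetical inputs: 1 ms of work per call, and a fix that halves the call rate.
SERVICE_TIME = 0.001                  # seconds
old_rate, new_rate = 900.0, 450.0     # calls per second

old_latency = mm1_latency(SERVICE_TIME, old_rate)
new_latency = mm1_latency(SERVICE_TIME, new_rate)

print(f"latency per call before: {old_latency * 1000:.2f} ms")
print(f"latency per call after:  {new_latency * 1000:.2f} ms")
print(f"latency improvement:     {old_latency / new_latency:.1f}x")
```

In this made-up case, halving the call count cuts the latency of each remaining call by a factor of about 5.5, so a prediction that holds latency constant would badly understate the improvement.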

@CaryMillsap
Discussion
1. Why does duration per call depend on load? Why does load depend on call
counts?
2. Why did the duration per call drop so much?
3. What would it have taken to more accurately predict the duration per call
changes resulting from our improvement to the code?
4. Why would anyone write code like the baseline code in this story?
5. What other kinds of performance problems can you envision the baseline
code having, under conditions of higher concurrency?
6. Are we finished optimizing this program with just the changes I’ve suggested
in the slides? What else could we do to make this program run faster?

@CaryMillsap
For more information…
The Guide to Mastering Oracle Trace Data, Second Edition (revised, updated, new pages), by Cary V. Millsap, Method R: the definitive guide to accurate, high-precision measurement of user performance experiences, for Oracle application developers and database administrators.
The Guide to Mastering Oracle Trace Data, Third Edition, by Cary V. Millsap, Method R.
Method R Corporation
http://method-r.com
info@method-r.com
@MethodR
The Guide to
MASTERING ORACLE TRACE DATA
Third Edition
• Explains how to create Oracle trace files and understand the stories
they tell you about your applications and the time they consume
• Prescribes a reliable method for optimizing Oracle-based applications
throughout all the phases of your software development life cycle
• Demonstrates through worked examples how Oracle application
developers and database administrators can use Oracle trace data and
Method R software to solve and prevent performance problems
• Contains more than 50 pages of new material, including new worked
examples, new information about production-safe tracing, state-of-the-art
information about measuring connection pooling applications, and more
A rare and treasured skill is the ability to look a system’s user or its owner
square in the eye and talk bluntly about performance: How long are the users’
experiences with the system taking? Why? What would be the effect on each
user’s response times if you were to make this change or that change? What
else could you do? Are your programs running as fast as they can?
This book teaches you how to do that.
The first time I used Method R and their software, I reduced the run time
of one query from 6½ hours to less than 11 minutes. In the years since, I’ve
used it to save a system three years’ worth of work by reducing a query’s
runtime from 2.0 to 0.2 seconds, to demonstrate that anti-virus software
was causing unnecessary delays in an application whose vendor had blamed
the Oracle Database, and to save an international airline over a quarter
of a million dollars on an upgrade that was destined to disappoint them.
Method R is the simplest and most effective way to achieve such results.
—Guðmundur Jósepsson
CEO, Miracle Iceland
Coming soon!

@CaryMillsap
www.cintra.com

@CaryMillsap
One-day training course
Tuesday 26 March 2019
Galleria Dallas
contact gfeinberg@cintra.com
