Analysis of Branch Misses in Quicksort
Sebastian Wild
wild@cs.uni-kl.de
based on joint work with Conrado Martínez and Markus E. Nebel
04 January 2015
Meeting on Analytic Algorithmics and Combinatorics
Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
Instruction Pipelines
Computers do not execute instructions fully sequentially.
Instead they use an "assembly line".
Example (instruction addresses 41-48):
41  ...
42  i := i + 1
43  a := A[i]
44  IF a < p GOTO 42
45  j := j - 1
46  a := A[j]
47  IF a > p GOTO 45
48  ...
each instruction is broken into 4 stages
simpler steps, hence shorter CPU cycles
one instruction finished per cycle ...
... except for branches! Then the CPU may have to
1 undo the wrong instructions
2 fill the pipeline anew
Pipeline stalls are costly ... can we avoid (some of) them?
Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
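The pseudocode above is the inner scanning loop of classic quicksort partitioning. As a point of reference, here is a minimal Java sketch of such a partitioning step (Hoare/Sedgewick-style, pivot taken from the right end); the class name, pivot choice, and bounds handling are illustrative assumptions, only the two data-dependent comparison loops correspond to the GOTOs above.

```java
import java.util.Arrays;

// Illustrative sketch (not the talk's reference implementation): classic
// partitioning around the pivot A[right]. The two inner loops contain the
// data-dependent comparison branches corresponding to "IF a < p" / "IF a > p".
public class ClassicPartition {
    static int partition(int[] A, int left, int right) {
        int p = A[right];                                   // pivot value
        int i = left - 1, j = right;
        while (true) {
            do { i++; } while (A[i] < p);                   // scan from the left
            do { j--; } while (A[j] > p && j > left);       // scan from the right
            if (i >= j) break;                              // pointers crossed
            int tmp = A[i]; A[i] = A[j]; A[j] = tmp;        // swap misplaced pair
        }
        int tmp = A[i]; A[i] = A[right]; A[right] = tmp;    // move pivot into place
        return i;                                           // final pivot position
    }

    public static void main(String[] args) {
        int[] A = {5, 3, 8, 1, 9, 2, 7};
        int m = partition(A, 0, A.length - 1);
        System.out.println("pivot at index " + m + ": " + Arrays.toString(A));
    }
}
```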
Branch Prediction
We could avoid stalls if we knew whether a branch will be taken or not
in general not possible, so: prediction with heuristics:
Predict the same outcome as last time (1-bit predictor).
[2-state diagram: "predict taken" <-> "predict not taken", switching state on every outcome that differs from the prediction]
Predict the most frequent outcome with finite memory (2-bit saturating counter).
[4-state diagram: states 1-2 predict taken, states 3-4 predict not taken; each observed outcome moves one state toward its end]
Flip the prediction only after two consecutive errors (2-bit flip-consecutive).
[4-state diagram: prediction flips only after two mispredictions in a row]
wilder heuristics exist out there ... not considered here
prediction can be wrong: branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
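As a concrete reading of the three heuristics, here is a minimal Java sketch of the corresponding state machines; class names, the boolean interface, and the initial states are assumptions made for illustration, not the talk's definitions.

```java
// Sketch of the three predictors from the slide, modeled as small state
// machines over boolean branch outcomes (true = taken).
class OneBitPredictor {
    private boolean last = true;              // initial prediction: taken (assumed)
    boolean observeAndReportMiss(boolean taken) {
        boolean miss = (last != taken);       // miss whenever the outcome changes
        last = taken;                         // remember the last outcome
        return miss;
    }
}

class TwoBitSaturatingCounter {
    private int state = 1;                    // 0,1 -> predict taken; 2,3 -> predict not taken
    boolean observeAndReportMiss(boolean taken) {
        boolean predictTaken = state <= 1;
        boolean miss = (predictTaken != taken);
        if (taken) state = Math.max(0, state - 1);   // move toward the "taken" end
        else       state = Math.min(3, state + 1);   // move toward the "not taken" end
        return miss;
    }
}

class TwoBitFlipConsecutive {
    private boolean prediction = true;        // current prediction (initially taken, assumed)
    private int consecutiveErrors = 0;
    boolean observeAndReportMiss(boolean taken) {
        boolean miss = (prediction != taken);
        if (miss) {
            if (++consecutiveErrors == 2) {   // flip only after two errors in a row
                prediction = !prediction;
                consecutiveErrors = 0;
            }
        } else {
            consecutiveErrors = 0;
        }
        return miss;
    }
}
```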
Why Should We Care?
misprediction rates of "typical" programs: < 10%
(Comparison-based) sorting is different!
Branches are based on comparison results.
Comparisons reduce entropy (uncertainty about the input).
The fewer comparisons we use, the less predictable they become.
for classic Quicksort: misprediction rate 25%
with median-of-3: 31.25%
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on a Pentium 4 Prescott, a very skewed pivot was faster than the median
branch misses dominated the running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY's dual-pivot Quicksort (YQS)
faster than the previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
                                          CQS    YQS    Relative
Running Time (from various experiments)    --     --    −10±2%
Comparisons                                2      1.9   −5%
Swaps                                      0.3    0.6   +80%
Bytecode Instructions                      18     21.7  +20.6%
MMIX oops υ                                11     13.1  +19.1%
MMIX mems µ                                2.6    2.8   +5%
scanned elements¹ (≈ cache misses)         2      1.6   −20%
(entries: ·n ln n + O(n), average-case results)
What about branch misses? Can they explain YQS's success? ... stay tuned.
¹ KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
Random Model
n i. i. d. elements chosen uniformly in [0, 1]
[number line 0..1 with sample points U1, ..., U8]
pairwise distinct almost surely
relative ranking is a random permutation, so equivalent to the classic model
Consider the pivot value P fixed:
Pr[U < P] = P = D1
Pr[U > P] = 1 − P = D2
[number line 0..1 split at P into segments of lengths D1, D2]
Similarly for dual-pivot Quicksort with pivots P < Q:
Pr[U < P] = D1
Pr[P < U < Q] = D2
Pr[U > Q] = D3
[number line 0..1 split at P and Q into segments of lengths D1, D2, D3]
These probabilities hold for every element U, independently of all other elements!
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
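A quick empirical reading of the dual-pivot version of this model; the fixed pivot values, class name, and sample size are illustrative assumptions.

```java
import java.util.Random;

// Sketch: with two fixed pivot values P < Q, each i.i.d. Uniform(0,1) element
// falls into the three classes with probabilities D1 = P, D2 = Q - P,
// D3 = 1 - Q, independently of all other elements.
public class RandomModelDemo {
    public static void main(String[] args) {
        Random rng = new Random(42);
        double P = 0.3, Q = 0.7;                 // fixed pivot values (assumed)
        int n = 1_000_000;
        int[] count = new int[3];                // small / middle / large class
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();
            if (u < P) count[0]++;
            else if (u < Q) count[1]++;
            else count[2]++;
        }
        System.out.printf("empirical (D1,D2,D3) = (%.3f, %.3f, %.3f); exact (%.3f, %.3f, %.3f)%n",
                count[0] / (double) n, count[1] / (double) n, count[2] / (double) n,
                P, Q - P, 1 - Q);
    }
}
```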
Branches in CQS
How many branches in the first partitioning step of CQS?
Consider the pivot value P fixed, i.e., D = (D1, D2) = (P, 1 − P) fixed.
one comparison branch per element U:
U < P: left partition
U > P: right partition
branch taken with probability P,
i. i. d. for all elements U!
a memoryless source
other branches (loop logic etc.):
easy to predict,
only a constant number of mispredictions,
can be ignored (for leading-term asymptotics)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
Misprediction Rate for Memoryless Sources
Branches taken i. i. d. with probability p.
Information-theoretic lower bound:
Miss rate: fOPT(p) = min{p, 1 − p}
Can approach the lower bound by estimating p:
p̂ ≥ 1/2: predict taken; p̂ < 1/2: predict not taken.
But: Actual predictors have very little memory!
1-bit Predictor
Wrong prediction whenever the outcome changes.
Miss rate: f1-bit(p) = 2p(1 − p)
[2-state diagram: "predict taken" <-> "predict not taken", transitions with probabilities p and 1 − p]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 8 / 15
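A minimal sketch checking f1-bit(p) = 2p(1 − p) empirically; the class name, seed, branch probability, and sample size are illustrative assumptions.

```java
import java.util.Random;

// Feed i.i.d. branch outcomes (taken with probability p) into a 1-bit
// predictor and compare the observed miss rate with 2p(1-p).
public class OneBitMissRate {
    public static void main(String[] args) {
        Random rng = new Random(1);
        double p = 0.7;                     // branch-taken probability (assumed)
        long n = 2_000_000, misses = 0;
        boolean last = true;                // 1-bit state = last observed outcome
        for (long i = 0; i < n; i++) {
            boolean taken = rng.nextDouble() < p;
            if (taken != last) misses++;    // predictor says "same as last time"
            last = taken;
        }
        System.out.printf("empirical miss rate %.4f vs. 2p(1-p) = %.4f%n",
                (double) misses / n, 2 * p * (1 - p));
    }
}
```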
Misprediction Rate for Memoryless Sources [2]
2-bit Saturating Counter
Miss rate? ... depends on the state!
[4-state diagram: states 1-2 predict taken, 3-4 predict not taken; transitions with probabilities p and 1 − p]
But: Very fast convergence to the steady state
(from different initial state distributions, within ≈ 20 iterations for p = 2/3)
use the steady-state miss rate:
expected miss rate over the states in the stationary distribution
here: f2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p).
similarly for the 2-bit Flip-Consecutive predictor:
f2-bit-fc(p) = q(1 + 2q) / (1 − q).
Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
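A small sketch of the steady-state computation for the 2-bit saturating counter: it iterates the 4-state Markov chain to its stationary distribution and compares the resulting miss rate with q/(1 − 2q). The state encoding, class name, and the chosen p are illustrative assumptions.

```java
// Steady-state miss rate of a 2-bit saturating counter under i.i.d. branch
// outcomes (taken with probability p). State encoding (assumed): 0,1 predict
// taken; 2,3 predict not taken; "taken" moves one state toward 0, "not taken"
// one state toward 3.
public class SaturatingCounterSteadyState {
    public static void main(String[] args) {
        double p = 2.0 / 3.0;                        // branch-taken probability (assumed)
        double[] pi = {0.25, 0.25, 0.25, 0.25};      // any initial distribution
        for (int it = 0; it < 1000; it++) {          // converges very quickly
            double[] next = new double[4];
            for (int s = 0; s < 4; s++) {
                next[Math.max(0, s - 1)] += pi[s] * p;        // outcome "taken"
                next[Math.min(3, s + 1)] += pi[s] * (1 - p);  // outcome "not taken"
            }
            pi = next;
        }
        double missRate = (pi[0] + pi[1]) * (1 - p)  // predicted taken, was not taken
                        + (pi[2] + pi[3]) * p;       // predicted not taken, was taken
        double q = p * (1 - p);
        System.out.printf("steady-state miss rate %.4f vs. q/(1-2q) = %.4f%n",
                missRate, q / (1 - 2 * q));
    }
}
```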
Distribution of Pivot Values
In (classic) Quicksort the branch probability is itself random
expected miss rate: E[f(P)] (expectation over pivot values P)
What is the distribution of P?
without sampling: P is Uniform(0, 1) distributed
Typical pivot choice: median of k (in practice: k = 3), or pseudomedian of 9 ("ninther")
Here: a more general scheme with parameter t = (t1, t2):
the pivot P is the (t1 + 1)-st smallest of k = t1 + t2 + 1 sample elements
[example: k = 6 sample elements with t = (3, 2), i.e., 3 below and 2 above P]
t = (0, 0): no sampling
t = (t, t): median-of-(2t + 1)
can also sample skewed pivots
Distribution of the pivot value: P is Beta(t1 + 1, t2 + 1) distributed
Sebastian Wild Branch Misses in Quicksort 2015-01-04 10 / 15
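A small sketch of this sampling scheme; the class name, seed, and the empirical check (comparing against the Beta(t1 + 1, t2 + 1) mean (t1 + 1)/(k + 1)) are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Pivot sampling with parameter t = (t1, t2): draw k = t1 + t2 + 1 uniform
// samples and take the (t1+1)-st smallest as pivot; its value is then
// Beta(t1+1, t2+1) distributed. We check the mean (t1+1)/(k+1) empirically.
public class PivotSampling {
    public static void main(String[] args) {
        Random rng = new Random(7);
        int t1 = 3, t2 = 2, k = t1 + t2 + 1;      // example from the slide: t = (3, 2)
        int trials = 200_000;
        double sum = 0;
        for (int i = 0; i < trials; i++) {
            double[] sample = new double[k];
            for (int j = 0; j < k; j++) sample[j] = rng.nextDouble();
            Arrays.sort(sample);
            sum += sample[t1];                    // (t1+1)-st smallest element
        }
        System.out.printf("empirical E[P] = %.4f vs. (t1+1)/(k+1) = %.4f%n",
                sum / trials, (t1 + 1.0) / (k + 1));
    }
}
```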
Miss Rates for Quicksort Branch
expected miss rate given by an integral:
E[f(P)] = ∫₀¹ f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp
e. g. for the 1-bit predictor:
E[f1-bit(P)] = ∫₀¹ 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1))
no concise representation for the other integrals ... (see paper)
but: exact values for fixed t
Sebastian Wild Branch Misses in Quicksort 2015-01-04 11 / 15
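A minimal numeric check of this closed form: Monte Carlo over P ~ Beta(t1 + 1, t2 + 1), sampled via order statistics as on the previous slide. The class name, seed, and the particular t = (1, 1) (median-of-3) are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Estimate E[f_1-bit(P)] = E[2P(1-P)] for P ~ Beta(t1+1, t2+1) by sampling the
// pivot as the (t1+1)-st smallest of k uniforms, and compare with
// 2 (t1+1)(t2+1) / ((k+2)(k+1)).
public class ExpectedMissRate {
    public static void main(String[] args) {
        Random rng = new Random(3);
        int t1 = 1, t2 = 1, k = t1 + t2 + 1;    // median-of-3 as an example
        int trials = 500_000;
        double sum = 0;
        for (int i = 0; i < trials; i++) {
            double[] s = new double[k];
            for (int j = 0; j < k; j++) s[j] = rng.nextDouble();
            Arrays.sort(s);
            double p = s[t1];                   // pivot value P
            sum += 2 * p * (1 - p);             // f_1-bit(P)
        }
        double exact = 2.0 * (t1 + 1) * (t2 + 1) / ((k + 2.0) * (k + 1.0));
        System.out.printf("Monte Carlo %.4f vs. closed form %.4f%n", sum / trials, exact);
    }
}
```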
Miss Rate and Branch Misses
Miss rate for CQS with median-of-(2t+1):
[plot: miss rate (0.3..0.5) vs. t = 0..8 for OPT, 1-bit, 2-bit sc, 2-bit fc]
miss rates quickly get bad (close to guessing!)
but: fewer comparisons in total!
[plot: #cmps (·n ln n + O(n)) vs. t = 0..8, axis 1.4..2, with 1/ln 2 marked]
Consider the number of branch misses:
#BM = #comparisons · miss rate
Overall #BM still grows with t.
[plot: #BM (·n ln n + O(n)) vs. t = 0..8, axis 0.5..0.7, with 0.5/ln 2 marked]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 12 / 15
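To see the trade-off numerically, here is a small sketch under stated assumptions: it uses the classical leading coefficient 1/(H_{2t+2} − H_{t+1}) for comparisons with median-of-(2t+1) pivots (an assumption, not stated on this slide) and the 1-bit closed form from the previous slide with t1 = t2 = t, which simplifies to (t+1)/(2t+3). Names are illustrative.

```java
// Sketch: leading coefficients (of n ln n) for comparisons, the 1-bit miss
// rate, and branch misses of CQS with median-of-(2t+1) pivot sampling.
public class BranchMissCoefficients {
    static double harmonic(int n) {               // n-th harmonic number H_n
        double h = 0;
        for (int i = 1; i <= n; i++) h += 1.0 / i;
        return h;
    }

    public static void main(String[] args) {
        for (int t = 0; t <= 8; t++) {
            double cmps = 1.0 / (harmonic(2 * t + 2) - harmonic(t + 1)); // #cmps / (n ln n), assumed formula
            double missRate = (t + 1.0) / (2 * t + 3);                   // E[f_1-bit(P)] for t1 = t2 = t
            System.out.printf("t=%d  #cmps=%.3f  missRate=%.3f  #BM=%.3f%n",
                    t, cmps, missRate, cmps * missRate);
        }
    }
}
```

For t = 0 this gives 2 · 1/3 ≈ 0.667, and for growing t the product approaches 0.5/ln 2, matching the annotation recovered from the plot above.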
Branch Misses in YQS
Original question: Does YQS do better than CQS w. r. t. branch misses?
Complication for the analysis:
4 branch locations, and how often they are executed depends on the input
[flow chart of Yaroslavskiy's partitioning: two comparison branches against P and Q at pointer k (each followed by swap or skip) and two at pointer g; resulting element classes: < P, P ≤ ◦ ≤ Q, ≥ Q]
Example: C(y1)
executed (D1 + D2)·n + O(1) times (in expectation, conditional on D)
branch taken i. i. d. with probability D1 (conditional on D)
expected #BM at C(y1) in the first partitioning step:
E[(D1 + D2) · f(D1)] · n + O(1)
Integrals even more "fun" ... but doable
Sebastian Wild Branch Misses in Quicksort 2015-01-04 13 / 15
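A rough sketch of how such a term can be sanity-checked numerically, under stated assumptions: no pivot sampling, so (D1, D2, D3) are the spacings induced by two i.i.d. Uniform(0,1) pivots, and f is taken to be the 1-bit miss rate f(p) = 2p(1 − p). The class name and seed are illustrative.

```java
import java.util.Random;

// Monte Carlo estimate of E[(D1 + D2) * f(D1)] for dual-pivot Quicksort
// without pivot sampling: P <= Q are two ordered i.i.d. Uniform(0,1) values,
// D1 = P, D2 = Q - P, and f is the 1-bit miss rate 2p(1-p).
public class YqsBranchMissTerm {
    public static void main(String[] args) {
        Random rng = new Random(11);
        int trials = 2_000_000;
        double sum = 0;
        for (int i = 0; i < trials; i++) {
            double p = rng.nextDouble(), q = rng.nextDouble();
            if (p > q) { double tmp = p; p = q; q = tmp; }   // order the pivots
            double d1 = p, d2 = q - p;
            sum += (d1 + d2) * 2 * d1 * (1 - d1);            // (D1+D2) * f_1-bit(D1)
        }
        System.out.printf("estimate of E[(D1+D2) * f_1-bit(D1)] = %.4f%n", sum / trials);
    }
}
```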
Results CQS vs. YQS
Original question: Does YQS do better than CQS w. r. t. branch misses?
Expected number of branch misses (coefficients of n ln n + O(n)):

without pivot sampling
             CQS      YQS      Relative
  OPT        0.5      0.513    +2.6%
  1-bit      0.667    0.673    +1.0%
  2-bit sc   0.571    0.585    +2.5%
  2-bit fc   0.589    0.602    +2.2%

CQS median-of-3 vs. YQS tertiles-of-5
             CQS      YQS      Relative
  OPT        0.536    0.538    +0.4%
  1-bit      0.686    0.687    +0.1%
  2-bit sc   0.611    0.613    +0.3%
  2-bit fc   0.627    0.629    +0.3%

essentially the same number of BM.
Branch misses are not a plausible explanation for YQS’s success.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 14 / 15
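As a quick consistency check (my own sketch, not from the paper), the Relative columns can be recomputed from the absolute coefficients; note that the CQS 1-bit entry without sampling is 2/3 ≈ 0.667, i.e. the appendix value 1/3 per comparison times the 2 n ln n comparisons of CQS.

```python
# Recompute the "Relative" columns of the two tables from the n ln n coefficients.
tables = {
    "without pivot sampling": {
        "OPT": (0.5, 0.513), "1-bit": (2 / 3, 0.673),
        "2-bit sc": (0.571, 0.585), "2-bit fc": (0.589, 0.602),
    },
    "CQS median-of-3 vs. YQS tertiles-of-5": {
        "OPT": (0.536, 0.538), "1-bit": (0.686, 0.687),
        "2-bit sc": (0.611, 0.613), "2-bit fc": (0.627, 0.629),
    },
}
for table_name, rows in tables.items():
    print(table_name)
    for predictor, (cqs, yqs) in rows.items():
        print(f"  {predictor:8}  CQS {cqs:.3f}  YQS {yqs:.3f}  {100 * (yqs / cqs - 1):+.1f}%")
```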
Conclusion
Precise analysis of branch misses in Quicksort (CQS and YQS)
including pivot sampling
lower bounds on branch miss rates
CQS and YQS cause a very similar number of BM
Strengthened evidence for the hypothesis that
YQS is faster because of better use of the memory hierarchy.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 15 / 15
Miss Rate for Branches in Quicksort
without sampling: P \overset{\mathcal{D}}{=} \mathrm{Uniform}(0, 1)

\mathbb{E}[f_{\mathrm{OPT}}(P)] = \int_0^1 \min\{p,\, 1-p\}\, dp = 0.25

\mathbb{E}[f_{\text{1-bit}}(P)] = \int_0^1 2p(1-p)\, dp = \tfrac{1}{3} \approx 0.333

\mathbb{E}[f_{\text{2-bit-sc}}(P)] = \int_0^1 \frac{p(1-p)}{1 - 2p(1-p)}\, dp = \frac{\pi}{4} - \frac{1}{2} \approx 0.285

\mathbb{E}[f_{\text{2-bit-fc}}(P)] = \int_0^1 \frac{p(1-p) + 2p^2(1-p)^2}{1 - p(1-p)}\, dp = \frac{2\pi}{\sqrt{3}} - \frac{10}{3} \approx 0.294
Sebastian Wild Branch Misses in Quicksort 2015-01-04 16 / 15
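The closed forms above are easy to sanity-check numerically; the following sketch (mine, standard library only) integrates each miss-rate function over a uniform pivot with a midpoint rule and compares against the stated constants.

```python
# Numerically verify the four no-sampling expectations E[f(P)] for P ~ Uniform(0, 1).
from math import pi, sqrt

def integrate(f, steps=200_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    return sum(f((i + 0.5) / steps) for i in range(steps)) / steps

f_opt  = lambda p: min(p, 1 - p)
f_1bit = lambda p: 2 * p * (1 - p)
f_2sc  = lambda p: p * (1 - p) / (1 - 2 * p * (1 - p))
f_2fc  = lambda p: (p * (1 - p) + 2 * p**2 * (1 - p)**2) / (1 - p * (1 - p))

closed_forms = {
    "OPT":      (f_opt,  0.25),
    "1-bit":    (f_1bit, 1 / 3),
    "2-bit sc": (f_2sc,  pi / 4 - 0.5),
    "2-bit fc": (f_2fc,  2 * pi / sqrt(3) - 10 / 3),
}
for name, (f, value) in closed_forms.items():
    print(f"{name:8}  numeric {integrate(f):.6f}   closed form {value:.6f}")
```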

Analysis of branch misses in Quicksort

  • 1. Analysis of Branch Misses in Quicksort Sebastian Wild wild@cs.uni-kl.de based on joint work with Conrado Martínez and Markus E. Nebel 04 January 2015 Meeting on Analytic Algorithmics and Combinatorics Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
  • 2. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 3. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 4. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 5. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 6. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 7. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 8. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 9. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 10. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 11. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 12. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 13. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 14. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 15. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 16. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 17. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 18. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 19. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 20. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 21. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 22. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 23. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 24. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 25. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 26. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 27. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 28. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 29. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 30. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 31. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 32. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 33. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 34. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 35. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 36. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 37. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 38. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 39. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 40. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 41. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 42. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 43. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 44. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 These probabilities hold for all elements U, independent of all other elements! Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 45. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 46. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 47. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 48. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 49. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 50. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition branch taken with prob. P i. i. d. for all elements U! memoryless source     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 51. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition branch taken with prob. P i. i. d. for all elements U! memoryless source     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
• 59. Misprediction Rate for Memoryless Sources
  Branches are taken i.i.d. with probability p.
  Information-theoretic lower bound on the miss rate: fOPT(p) = min{p, 1 − p}.
  The lower bound can be approached by estimating p: predict taken if p̂ ≥ 1/2, not taken if p̂ < 1/2.
  But: actual predictors have very little memory!
  1-bit predictor: wrong prediction whenever the outcome changes; miss rate f1-bit(p) = 2p(1 − p).
  [2-state diagram: one state predicts taken, the other predicts not taken; a misprediction switches the state.]
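A minimal simulation (my own sketch, not from the slides) of the 1-bit predictor on an i.i.d. Bernoulli(p) branch stream; the empirical miss rate should match 2p(1 − p).

import random

def one_bit_miss_rate(p, n=10**6, seed=1):
    # 1-bit predictor: always predict the previous outcome, so a miss occurs
    # exactly when the outcome changes.
    rng = random.Random(seed)
    prediction = True
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        if taken != prediction:
            misses += 1
        prediction = taken   # one bit of state: the last outcome
    return misses / n

for p in (0.1, 0.3, 0.5):
    print(p, one_bit_miss_rate(p), 2 * p * (1 - p))   # empirical vs. f1-bit(p)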
• 86. Misprediction Rate for Memoryless Sources [2]
  2-bit saturating counter: the miss rate depends on the current state!
  [4-state diagram: states 1–2 predict taken, states 3–4 predict not taken; a taken branch moves one state towards 1, a not-taken branch one state towards 4.]
  But: very fast convergence to the steady state [plot: different initial state distributions agree after ≈ 20 iterations for p = 2/3].
  So use the steady-state miss rate, i.e. the expected miss rate over states drawn from the stationary distribution.
  Here: f2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p).
  Similarly, for the 2-bit flip-consecutive predictor: f2-bit-fc(p) = q(1 + 2q) / (1 − q).
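A corresponding sketch (again my own, with illustrative names) for the 2-bit saturating counter; the empirical miss rate should approach the steady-state value q / (1 − 2q).

import random

def two_bit_sc_miss_rate(p, n=10**6, seed=1):
    # 2-bit saturating counter: counter in {0,1,2,3}, predict 'taken' iff
    # counter >= 2; a taken branch increments, a not-taken branch decrements
    # (both saturating).
    rng = random.Random(seed)
    counter = 2
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        if taken != (counter >= 2):
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses / n

p = 2 / 3
q = p * (1 - p)
print(two_bit_sc_miss_rate(p), q / (1 - 2 * q))   # empirical vs. f2-bit-sc(p)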
• 96. Distribution of Pivot Values
  In (classic) Quicksort the branch probability is itself random; the expected miss rate is E[f(P)] (expectation over pivot values P).
  What is the distribution of P?
  Without sampling: P ∼ Uniform(0, 1).
  Typical pivot choices: median of k (in practice k = 3) or pseudomedian of 9 ("ninther").
  Here: a more general scheme with parameter t = (t1, t2). Example: k = 6 and t = (3, 2) [figure: sample of k elements; the pivot P has t1 sample elements below it and t2 above].
  t = (0, 0) is no sampling; t = (t, t) gives median-of-(2t + 1); skewed pivots can also be sampled.
  Distribution of the pivot value: P ∼ Beta(t1 + 1, t2 + 1).
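A small sanity check (my sketch, assuming the sampling scheme described above: the pivot is the (t1+1)-st smallest of k = t1 + t2 + 1 random elements) that the pivot value follows Beta(t1 + 1, t2 + 1).

import random

def sample_pivot(t1, t2, rng):
    # Pivot from a sample of k = t1 + t2 + 1 i.i.d. Uniform(0,1) values:
    # the (t1+1)-st smallest, so t1 sample elements lie below it and t2 above.
    k = t1 + t2 + 1
    return sorted(rng.random() for _ in range(k))[t1]

rng = random.Random(1)
t1, t2 = 3, 2
pivots = [sample_pivot(t1, t2, rng) for _ in range(10**5)]
print(sum(pivots) / len(pivots))        # empirical mean of the pivot value
print((t1 + 1) / (t1 + t2 + 2))         # Beta(t1+1, t2+1) mean = 4/7 ≈ 0.571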
• 100. Miss Rates for Quicksort Branch
  The expected miss rate is given by the integral
    E[f(P)] = ∫_0^1 f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp,
  e.g. for the 1-bit predictor
    E[f1-bit(P)] = ∫_0^1 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1)).
  No concise representation for the other integrals ... (see paper), but exact values for fixed t.
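The Beta-weighted integral can also be evaluated numerically; here is a sketch (my own helper names) that reproduces the closed form for the 1-bit predictor, shown for median-of-3, i.e. t = (1, 1), k = 3.

from math import gamma

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def expected_miss_rate(f, t1, t2, steps=10**5):
    # Midpoint-rule approximation of E[f(P)] for P ~ Beta(t1+1, t2+1).
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += f(p) * p**t1 * (1 - p)**t2
    return total * h / beta_fn(t1 + 1, t2 + 1)

t1, t2 = 1, 1                 # median-of-3
k = t1 + t2 + 1
print(expected_miss_rate(lambda p: 2 * p * (1 - p), t1, t2))
print(2 * (t1 + 1) * (t2 + 1) / ((k + 2) * (k + 1)))   # closed form: 8/20 = 0.4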
• 106. Miss Rate and Branch Misses
  Miss rate for CQS with median-of-(2t + 1): [plot: miss rate vs. t for OPT, 1-bit, 2-bit sc, 2-bit fc; all curves approach 0.5].
  The miss rates quickly get bad (close to guessing!), but there are fewer comparisons in total [plot: #cmps coefficient of n ln n vs. t, falling from 2 towards 1/ln 2].
  Consider the number of branch misses: #BM = #comparisons · miss rate.
  Overall, #BM still grows with t [plot: #BM coefficient of n ln n vs. t, growing towards 0.5/ln 2].
  (A numeric sketch of these coefficients follows below.)
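A numeric sketch of these coefficients (my own code; it assumes the standard result that CQS with median-of-(2t + 1) needs ~ n ln n / (H_{2t+2} − H_{t+1}) comparisons, which matches the 2 n ln n shown for t = 0).

from math import gamma

def harmonic(n):
    return sum(1 / i for i in range(1, n + 1))

def expected_miss_rate(f, t, steps=10**5):
    # Midpoint-rule approximation of E[f(P)] for P ~ Beta(t+1, t+1),
    # i.e. the median-of-(2t+1) pivot.
    B = gamma(t + 1) ** 2 / gamma(2 * t + 2)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += f(p) * (p * (1 - p)) ** t
    return total * h / B

def f_1bit(p):
    return 2 * p * (1 - p)

for t in range(5):
    cmps = 1 / (harmonic(2 * t + 2) - harmonic(t + 1))   # #cmps coefficient of n ln n
    rate = expected_miss_rate(f_1bit, t)
    print(t, round(cmps, 3), round(rate, 3), round(cmps * rate, 3))
# t = 0: 2 * (1/3) ≈ 0.667; t = 1: (12/7) * 0.4 ≈ 0.686 — the CQS 1-bit entries on the results slide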
• 113. Branch Misses in YQS
  Original question: Does YQS do better than CQS w.r.t. branch misses?
  Complication for the analysis: there are 4 branch locations, and how often each is executed depends on the input.
  [flowchart of YQS partitioning: elements reached by pointers k and g are compared with the pivots P ≤ Q and then swapped or skipped, sorting them into the classes < P, P ≤ ◦ ≤ Q, ≥ Q.]
  Example: C(y1) is executed (D1 + D2)·n + O(1) times (in expectation, conditional on D); its branch is taken i.i.d. with prob. D1 (conditional on D).
  Expected #BM at C(y1) in the first partitioning step: E[(D1 + D2) · f(D1)] · n + O(1).
  The integrals are even more "fun" ... but doable.
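For illustration only, the single term shown above can be estimated by Monte Carlo. This sketch (my own, not the full YQS analysis) assumes no pivot sampling, so that (D1, D2, D3) are the spacings induced by two i.i.d. Uniform(0,1) pivots, and plugs in the 1-bit miss-rate function for f.

import random

def spacings(rng):
    # (D1, D2, D3) induced by two i.i.d. Uniform(0,1) pivots P <= Q (no sampling).
    p, q = sorted((rng.random(), rng.random()))
    return p, q - p, 1 - q

def f_1bit(x):
    return 2 * x * (1 - x)

rng = random.Random(1)
n = 10**6
total = 0.0
for _ in range(n):
    d1, d2, d3 = spacings(rng)
    total += (d1 + d2) * f_1bit(d1)
print(total / n)   # Monte-Carlo estimate of E[(D1 + D2) * f(D1)] for this single branch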
• 118. Results CQS vs. YQS
  Original question: Does YQS do better than CQS w.r.t. branch misses?
  Expected number of branch misses, as coefficients of n ln n + O(n).
  Without pivot sampling:
              CQS      YQS      relative
    OPT       0.5      0.513    +2.6%
    1-bit     0.667    0.673    +1.0%
    2-bit sc  0.571    0.585    +2.5%
    2-bit fc  0.589    0.602    +2.2%
  CQS median-of-3 vs. YQS tertiles-of-5:
              CQS      YQS      relative
    OPT       0.536    0.538    +0.4%
    1-bit     0.686    0.687    +0.1%
    2-bit sc  0.611    0.613    +0.3%
    2-bit fc  0.627    0.629    +0.3%
  Essentially the same number of branch misses.
  Branch misses are not a plausible explanation for YQS's success.
• 121. Conclusion
  Precise analysis of branch misses in Quicksort (CQS and YQS), including pivot sampling.
  Lower bounds on branch miss rates.
  CQS and YQS cause a very similar number of branch misses.
  Strengthened evidence for the hypothesis that YQS is faster because of better use of the memory hierarchy.
• 128. Miss Rate for Branches in Quicksort
  Without sampling: P ∼ Uniform(0, 1).
  E[fOPT(P)]      = ∫_0^1 min{p, 1 − p} dp = 0.25
  E[f1-bit(P)]    = ∫_0^1 2p(1 − p) dp = 1/3 ≈ 0.333
  E[f2-bit-sc(P)] = ∫_0^1 p(1 − p) / (1 − 2p(1 − p)) dp = π/4 − 1/2 ≈ 0.285
  E[f2-bit-fc(P)] = ∫_0^1 (2p²(1 − p)² + p(1 − p)) / (1 − p(1 − p)) dp = 2π/√3 − 10/3 ≈ 0.294
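The four integrals above can be checked numerically; a short sketch (my own helper integrate) using the midpoint rule.

from math import pi, sqrt

def integrate(f, steps=10**5):
    # Midpoint-rule integral of f over [0, 1].
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

def q(p):
    return p * (1 - p)

print(integrate(lambda p: min(p, 1 - p)),                       0.25)
print(integrate(lambda p: 2 * q(p)),                            1 / 3)
print(integrate(lambda p: q(p) / (1 - 2 * q(p))),               pi / 4 - 1 / 2)
print(integrate(lambda p: (2 * q(p) ** 2 + q(p)) / (1 - q(p))), 2 * pi / sqrt(3) - 10 / 3)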