Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Dependable Cardinality Forecast for XQuery
1. Dependable Cardinality Forecasts for XQuery
Jens Teubner, ETH (formerly IBM Research)
Torsten Grust, U T¨bingen (formerly TUM)
u
Sebastian Maneth, UNSW and NICTA
Sherif Sakr, UNSW and NICTA
c Systems Group — Department of Computer Science — ETH Z¨rich
u August 26, 2008
2. Cardinality Estimation for XQuery
The feature-richness and semantics of the language make
cardinality estimation for XQuery notoriously hard.
for $d in doc ("forecast.xml")/descendant::day
let $day := $d/@t
let $ppcp := data ($d/descendant::ppcp)
return
if ($ppcp > 50)
then ("rain likely on", $day,
"chance of precipitation:", $ppcp)
else ("no rain on", $day)
for iteration sequence construction
conditionals (if-then-else) ...
existential quantification (XPath is not a focus of this work.)
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 2
3. Cardinality Estimation for XQuery
The feature-richness and semantics of the language make
cardinality estimation for XQuery notoriously hard.
for $d in doc ("forecast.xml")/descendant::day card1 = ?
let $day := $d/@t
let $ppcp := data ($d/descendant::ppcp)
return
if ($ppcp > 50) card2 = ?
then ("rain likely on", $day, card3 = ?
"chance of precipitation:", $ppcp)
else ("no rain on", $day) card4 = ?
for iteration sequence construction
conditionals (if-then-else) ...
existential quantification (XPath is not a focus of this work.)
→ Goal: Compute subexpression-level cardinalities cardi .
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 2
4. Idea: Perform cardinality estimation on relational plan
equivalents for XQuery.
relational cardinality estimation (System R)
+ existing work on XPath estimation
+ histograms for value predicates
= cardinality estimation for XQuery
Build on Pathfinder’s XQuery-to-relational algebra compiler.1
→ tuple count ≡ XQuery item count
Cardinality information for each subexpression.
1
http://www.pathfinder-xquery.org/
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 3
5. πiter:outer,pos:pos1,item
pos1: ord,pos outer
Example (Plan details not of interest today.) 2 iter=inner
·
∪
4
1 3 πiter,pos:pos1,item
for $d in doc ("forecast.xml")/descendant::day πiter,pos:pos1,item pos1: ord,pos iter
let $day := $d/@t pos1: ord,pos iter ·
∪
πiter,pos:1,ord:1,
let $ppcp := data ($d/descendant::ppcp) ·
∪ item:"rain..."
·
∪
return 2 πiter,pos:1,ord:1,
πiter,pos,item,ord:2 ∪·
if ($ppcp > 50) 3 item:"no rain..." πiter,pos:1,ord:3,
item:"chance..."
then ("rain likely on", $day, πiter,pos,item,ord:2 πiter,pos,item,ord:4
"chance of precipitation:", $ppcp) iter1=iter iter1=iter iter=iter1
else ("no rain on", $day) 4
δ
πiter
σres
res:(item,item1)
Pathfinder compiles XQuery iter1=iter πiter1:iter,
πiter1:iter,pos,item
with arbitrary nesting. πiter1:iter,
pos1:1,item1:50 item
pos,item
pos: item iter
Maintain correspondence to πiter pos: item iter
item:attribute::t(item) item:descendant::ppcp(item)
original query if back-end is πiter:inner,item πinner,outer:iter,
ord:pos
not relational. inner: iter,pos
1 pos: item iter
Derive estimates based on an item:descendant::day(item)
inference rule set. docitem
πiter,item:"forecast.xml"
iter
1
6. Relational XQuery Cardinality Estimation
Apply System R-style estimation to relational XQuery plans, e.g.,
Disjoint union: Cartesian product:
·
|q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 |
Equi-join:
|q1 | · |q2 | if there are indexes on
max {|a| , |b| } both join columns,
idx idx
|q1 q2 | = |q1 | · |q2 | if there is only an index
a=b
|a|idx on column a,
|q1 | · |q2 | · 1/10 otherwise
|c|idx : Number of unique values in index on column c.
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 5
7. Relational XQuery Cardinality Estimation
Apply System R-style estimation to relational XQuery plans, e.g.,
Disjoint union: Cartesian product:
·
|q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 |
Equi-join:
|q1 | · |q2 | if there are indexes on
max {|a| , |b| }
idx idx
both join columns, ?
|q1 q2 | = |q1 | · |q2 | if there is only an index
a=b
|a|idx on column a, ?
|q1 | · |q2 | · 1/10 otherwise
Our joins typically operate over computed relations.
|c|idx : Number of unique values in index on column c.
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 5
8. Abstract Domain Identifiers
A simple form of data flow analysis provides the information
needed.
Introduce abstract domain identifiers α, β, . . . as placeholders
for the active runtime domain for each column c.
(Read c α as “column c contains values from domain α.”)
Estimate the size α of each domain α, e.g.,2
dom a: b1 ,...,bn (q) ⊇ dom (q) ∪ aα ∧ α =! |q| .
Identify inclusion relationships α β between domains, e.g.,
aα ∈ dom (q) ∧ aβ ∈ dom (σ··· (q)) ⇒ β α .
2
Operator a: b1 ,...,bn introduces a new key column (holding row numbers).
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 6
9. Abstract Domain Identifiers
Use abstract domain information for cardinality estimation.
E.g., “foreign key” join:
aα ∈ dom (q1 ) bβ ∈ dom (q2 ) α β
|q1 | · |q2 |
|q1 q2 | =
a=b β
Domain inclusion guarantees that each tuple in q1 finds
(at least one) join partner in q2 .
Other examples:
|q1 q2 | = |q1 | − |q2 | if q2 is a subset of q1 .
|q1 q2 | = 0 if q1 is a subset of q2 .
|q1 q2 | = |q1 | if q1 and q2 are disjoint.
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 7
10. Interfacing with XPath—Projection Paths
Track XPath navigation by means of projection paths3
a a b
a b c
c:child::*(b) 1 γa γb
1 γa
b c d 2 γb
1 γa γc
1 γa γd
3 γd
e 3 γd γe
q1
q2
b⇒p ∈ path (q1 ) ⇒ c⇒p/child::* ∈ path (q2 )
Step operator makes XPath navigation explicit in relational
plans (compiles to join on SQL back-ends).
3
A. Marian and J. Sim´ on. Projecting XML Documents. VLDB 2003.
e
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 8
11. Interfacing with XPath—Cardinality Inference
bα
bα
a a b
a b c
c:child::*(b) 1 γa γb
1 γa
b c d 2 γb
1 γa γc α α
1 γa γd
3 γd
e 3 γd γe
q1
q2
Cardinality:
fn:count (p/child::*)
|q2 | = |q1 | · fn:count (p) = |q1 | · Prchild::* (p)
fanout (here: 4/3)
Domain Sizes:
fn:count (p [ child::* ])
α = α · fn:count (p) = α · Pr[child::*] (p)
selectivity (here: 2/3)
Any XPath estimator that provides Prp2 (p1 ) and Pr[p2 ] (p1 ) will do.
Our prototype uses a simple Data Guide-based implementation.
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 9
12. πiter:outer,pos:pos1,item
pos1: ord,pos outer
Back to our Example Plan 2 iter=inner
·
∪
4
πiter,pos:pos1,item
3
πiter,pos:pos1,item pos1: ord,pos iter
res:(item,item1)
pos1: ord,pos iter ·
∪
πiter,pos:1,ord:1,
·
∪ ·
∪
iter1=iter item:"rain..."
πiter,pos,item,ord:2 ∪·
πiter,pos:1,ord:1,
πiter1:iter, item:"no rain..." πiter,pos:1,ord:3,
item:"chance..."
pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4
iter1=iter iter1=iter iter=iter1
πiter pos: item iter
δ
item:descendant::ppcp(item) πiter
σres
πiter:inner,item
res:(item,item1)
iter1=iter πiter1:iter,
πiter1:iter,pos,item
pos,item
πiter1:iter,
pos1:1,item1:50 item
pos: item iter
πiter pos: item iter
item:attribute::t(item) item:descendant::ppcp(item)
πiter:inner,item πinner,outer:iter,
ord:pos
inner: iter,pos
1 pos: item iter
item:descendant::day(item)
docitem
a: Retrieve typed values for node identifiers πiter,item:"forecast.xml"
in column a (atomization). iter
1
13. πiter:outer,pos:pos1,item
pos1: ord,pos outer
Back to our Example Plan 2 iter=inner
·
∪
4
πiter,pos:pos1,item
3
πiter,pos:pos1,item pos1: ord,pos iter
res:(item,item1)
pos1: ord,pos iter ·
∪
πiter,pos:1,ord:1,
·
∪ ·
∪
iter1=iter item:"rain..."
πiter,pos,item,ord:2 ∪·
πiter,pos:1,ord:1,
πiter1:iter, item:"no rain..." πiter,pos:1,ord:3,
item:"chance..."
pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4
iter1=iter iter1=iter iter=iter1
πiter pos: item iter
Projection path: δ
item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path
πiter ... (q)
σres
πiter:inner,item
Cardinality (uses fanout):
res:(item,item1)
item:ax::nt(item) (q) = |q|·Prax::nt (· · · )
πiter1:iter,pos,item
iter1=iter πiter1:iter,
pos,item
πiter1:iter,
pos1:1,item1:50 item
pos: item iter
πiter pos: item iter
item:attribute::t(item) item:descendant::ppcp(item)
πiter:inner,item πinner,outer:iter,
ord:pos
inner: iter,pos
1 pos: item iter
item:descendant::day(item)
docitem
a: Retrieve typed values for node identifiers πiter,item:"forecast.xml"
in column a (atomization). iter
1
14. πiter:outer,pos:pos1,item
pos1: ord,pos outer
Back to our Example Plan 2 iter=inner
·
∪
4
πiter,pos:pos1,item
3
πiter,pos:pos1,item pos1: ord,pos iter
res:(item,item1)
pos1: ord,pos iter ·
∪
πiter,pos:1,ord:1,
·
∪ ·
∪
iter1=iter item:"rain..."
πiter,pos,item,ord:2 ∪·
πiter,pos:1,ord:1,
πiter1:iter, item:"no rain..." πiter,pos:1,ord:3,
item:"chance..."
pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4
iter1=iter iter1=iter iter=iter1
πiter pos: item iter
iterα Projection path: δ
item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path
πiter ... (q)
σres
πiter:inner,item iter α
Cardinality (uses fanout):
res:(item,item1)
item:ax::nt(item) (q) = |q|·Prax::nt (· · · )
πiter1:iter,pos,item
iter1=iter πiter1:iter,
pos,item
πiter1:iter,
pos1:1,item1:50 item
Domain iter πiter
pos: item
inclusion: pos: item iter
iterα ∈ dom
item:attribute::t(item) ... (q) ; α α
item:descendant::ppcp(item)
πiter:inner,item πinner,outer:iter,
ord:pos
Domain size (uses step selectivity):
inner: iter,pos
α = α 1· Pr[ax::nt] (· · · )
pos: item iter
item:descendant::day(item)
docitem
a: Retrieve typed values for node identifiers πiter,item:"forecast.xml"
in column a (atomization). iter
1
15. πiter:outer,pos:pos1,item
pos1: ord,pos outer
Back to our Example Plan 2 iter=inner
·
∪
4
πiter,pos:pos1,item
3
πiter,pos:pos1,item pos1: ord,pos iter
res:(item,item1)
pos1: ord,pos iter ·
∪
π
iter1α iter1=iter “Foreign key” iter,pos:1,ord:1,
∪· join:
item:"rain..." ∪·
iterα q1 q2 rain..." α
item:"no =
πiter,pos,item,ord:2
πiter,pos:1,ord:1, |q1 |·|q2 |
(since α
∪·
α)
πiter1:iter, πiter,pos:1,ord:3,
iter1=iter item:"chance..."
pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4
iter1=iter iter1=iter iter=iter1
πiter pos: item iter
iterα Projection path: δ
item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path
πiter ... (q)
σres
πiter:inner,item iter α
Cardinality (uses fanout):
res:(item,item1)
item:ax::nt(item) (q) = |q|·Prax::nt (· · · )
πiter1:iter,pos,item
iter1=iter πiter1:iter,
pos,item
πiter1:iter,
pos1:1,item1:50 item
Domain iter πiter
pos: item
inclusion: pos: item iter
iterα ∈ dom
item:attribute::t(item) ... (q) ; α α
item:descendant::ppcp(item)
πiter:inner,item πinner,outer:iter,
ord:pos
Domain size (uses step selectivity):
inner: iter,pos
α = α 1· Pr[ax::nt] (· · · )
pos: item iter
item:descendant::day(item)
docitem
a: Retrieve typed values for node identifiers πiter,item:"forecast.xml"
in column a (atomization). iter
1
17. Forecasting New Zealand’s Weather
The obtained cardinalities can be mapped back to predict item
counts for corresponding XQuery expressions:4
for $d in doc ("forecast.xml")/descendant::day card1 = 990/990
let $day := $d/@t
let $ppcp := data ($d/descendant::ppcp)
return
if ($ppcp > 50) card2 = 4104/3402
then ("rain likely on", $day, card3 = 3540/2370
"chance of precipitation:", $ppcp)
else ("no rain on", $day) card4 = 564/1032
Value statistics based on histograms ( paper)
Inaccuracy is mainly due to correlations in the data.
Rain in the morning likely means rain in the afternoon, too.
4
estimated/observed, based on data taken for New Zealand two weeks ago.
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 11
18. More Realistic Queries: W3C XQuery Use Cases
0.01 0.1 1 10
Prototype implementation XMP Q4
based on Pathfinder XMP Q5
XMP Q6
For each subexpression: XMP Q8
XMP Q9
estimated cardinality XMP Q10
XMP Q11
observed cardinality SGML Q3
SGML Q4
Diameter indicates data SGML Q8a
point “stacking” SGML Q8b
R Q2
Plan root: filled circle R Q3
R Q10
R Q11
Applicable to “real” queries R Q13
R Q14
Recovery from intermediate R Q15
mis-estimations R Q17
e.g., existential semantics 0.01 0.1 1 10
estimated cardinality/observed cardinality
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 12
19. Wrap-Up
Cardinality estimation framework for XQuery
Subexpression-level estimates for arbitrary XQuery expressions
Based on Pathfinder’s XQuery-to-relational algebra compiler
relational cardinality estimation (System R)
+ existing work on XPath estimation
+ histograms for value predicates
= cardinality estimation for XQuery
High-quality estimates for realistic XQuery workloads
Robust with respect to intermediate errors
Pluggable and extensible
e.g., XPath estimation subsystem, positional predicates
August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich
u 13