Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances
SIGMOD 2019
Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy
Illinois Institute of Technology, Duke University
SIGMOD Research Session 5 - July 3rd - 11:30am
Slide 1 of 16 Q. Zeng - CAPE:
Explain surprising query results
Related Work
Provenance
Semiring model [Green et al., 2007]
Causality-based [Meliou et al., 2010]
Provenance systems [Arab et al., 2014]
"Why high/low" questions [Wu and Madden, 2013] [Roy and Suciu, 2014]
Intervention: a subset of provenance whose removal would cause the result to move in the opposite direction
All based on provenance
Is only provenance useful?
No. Non-provenance can be useful.
Boris: Why did you work only 2 hours yesterday?
Qitian (provenance-based explanation): Yeah, I worked from 9-11 AM.
Boris: Okay, I'm cutting your stipend.
Boris: Why did you work only 2 hours yesterday?
Qitian: I was on a plane to SIGMOD for 8 hours.
Boris: Fair enough.
Example - Table
Pub
author  pubid  year  venue
AX      P1     2005  SIGKDD
AY      P2     2004  SIGKDD
AZ      P2     2004  SIGKDD
AZ      P3     2004  SIGMOD
Q =
SELECT author, year, venue, count(*) AS pubcnt
FROM Pub
GROUP BY author, year, venue
Result of Q:
author  venue   year  pubcnt
AX      SIGKDD  2006  4
AX      SIGKDD  2007  1
AX      SIGKDD  2008  4
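The query Q can be tried directly on the four Pub rows listed on the slide. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table setup, the `run_q` helper, and the added ORDER BY (only there to make the output order deterministic) are assumptions for illustration and are not part of CAPE.

```python
import sqlite3

# The four example rows of Pub shown on the slide.
PUB_ROWS = [
    ("AX", "P1", 2005, "SIGKDD"),
    ("AY", "P2", 2004, "SIGKDD"),
    ("AZ", "P2", 2004, "SIGKDD"),
    ("AZ", "P3", 2004, "SIGMOD"),
]

# Q from the slide, plus an ORDER BY so the result order is reproducible.
Q = """
SELECT author, year, venue, count(*) AS pubcnt
FROM Pub
GROUP BY author, year, venue
ORDER BY author, year, venue
"""


def run_q(rows):
    """Load rows into an in-memory Pub table and evaluate Q (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Pub (author TEXT, pubid TEXT, year INTEGER, venue TEXT)"
        )
        conn.executemany("INSERT INTO Pub VALUES (?, ?, ?, ?)", rows)
        return conn.execute(Q).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_q(PUB_ROWS):
        print(row)
```

On these four rows every group has exactly one publication, so the result table on the slide evidently comes from a larger Pub instance.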
Example - Query Result
φ = "Why is the number of AX's SIGKDD 2007 papers low?"
Why high/low question
Aggregate query
Provenance-based approach: by "intervention"
A subset of provenance whose removal makes the number of AX's SIGKDD 2007 papers go up
Our approach: by counterbalance
AX's high publication count in other venues or other years
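One plausible reading of why intervention struggles with a "why low" question over count(*): deleting input rows can only keep a group's count the same or lower it, so no subset of the provenance qualifies as an intervention. The sketch below checks this by enumerating every subset of a hypothetical provenance. The row values (including the pubid P7) and the function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical provenance of the result tuple (AX, SIGKDD, 2007, 1):
# the single Pub row that falls into this group.
PROVENANCE = [("AX", "P7", 2007, "SIGKDD")]


def count_after_removal(provenance, removed):
    """count(*) of the group after deleting the rows in `removed`."""
    return sum(1 for row in provenance if row not in removed)


def interventions_that_raise_count(provenance):
    """Return every subset of the provenance whose removal increases the count."""
    original = len(provenance)
    raising = []
    for size in range(len(provenance) + 1):
        for subset in combinations(provenance, size):
            if count_after_removal(provenance, set(subset)) > original:
                raising.append(subset)
    return raising


if __name__ == "__main__":
    # Prints an empty list: no removal can push a count upward.
    print(interventions_that_raise_count(PROVENANCE))
```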
Our Approach
Assumptions of φ:
A pattern exists which describes the data (Aggregate Regression Pattern, or ARP)
(AX, SIGKDD, 2007, 1) is a low outlier of the pattern
CAPE: Mine ARPs (offline) → Look for counterbalance → Present top k (interactive, driven by the user question)
Aggregate Regression Pattern
P = "For each author, the total publication count (count(*)) is linear over the years"
An ARP consists of:
A set of partition attributes (in P: author)
A set of predictor attributes (in P: year)
An aggregate function (in P: count(*))
A regression model type (in P: linear)
A pattern can hold locally on a fixed value of the partition attributes. Say, P holds on AX.
A pattern can also hold globally if it holds locally for sufficiently many values of the partition attributes (a good number of authors).
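The local/global distinction can be made concrete with a small sketch for the pattern P (partition by author, predict count(*) from year, linear model). It fits a least-squares line per author and uses R² as the goodness-of-fit measure. The thresholds `min_points`, `min_r2`, and `min_fraction`, the choice of R², and the example rows are all assumptions for illustration; CAPE's actual measures and thresholds may differ.

```python
from collections import defaultdict


def linear_fit_r2(points):
    """R^2 of the least-squares line through (x, y) points (1.0 = perfect fit)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0:
        return 0.0  # only one distinct x value: no line can be fitted
    if syy == 0:
        return 1.0  # constant y is matched exactly by a horizontal line
    return (sxy * sxy) / (sxx * syy)


def holds_locally(points, min_points=3, min_r2=0.8):
    """P holds locally if there are enough points and the line fits well."""
    return len(points) >= min_points and linear_fit_r2(points) >= min_r2


def holds_globally(rows, min_fraction=0.5, **local_kwargs):
    """rows: (author, year, pubcnt) aggregates.

    Returns (global verdict, per-author local verdicts).
    """
    by_author = defaultdict(list)
    for author, year, pubcnt in rows:
        by_author[author].append((year, pubcnt))
    local = {a: holds_locally(pts, **local_kwargs) for a, pts in by_author.items()}
    if not local:
        return False, local
    fraction = sum(local.values()) / len(local)
    return fraction >= min_fraction, local


# Hypothetical per-author yearly publication counts.
EXAMPLE_ROWS = [
    ("AX", 2005, 2), ("AX", 2006, 4), ("AX", 2007, 6), ("AX", 2008, 8),
    ("AY", 2005, 1), ("AY", 2006, 9), ("AY", 2007, 2), ("AY", 2008, 8),
    ("AZ", 2005, 3), ("AZ", 2006, 3), ("AZ", 2007, 3),
]

if __name__ == "__main__":
    print(holds_globally(EXAMPLE_ROWS))
```

With these rows P holds locally on AX and AZ but not on AY, which is enough for it to hold globally under the assumed 50% threshold.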
Mining ARP
Brute force: at least 3^|R| candidate patterns
Optimizations:
Restricting size: at most 4 attributes in a pattern. This alone reduces the number of candidate patterns to polynomial.
Reusing sort order: a single sort order on A, B, C, D can serve all of the following splits
Partition Attributes  Predictor Attributes
A,B,C                 D
A,B                   C,D
A                     B,C,D
Detecting and applying functional dependencies
"For each A, agg(α) is linear over C" and A → B ⇒ "For each A and B, agg(α) is linear over C"
Performance evaluation on Chicago crime data, PostgreSQL
[Figure: (a) 10k rows: time (sec) vs. #attributes (4 to 11) for naive, cube, and ARP-mine; (b) 8 attributes: time (sec) vs. #rows (10k, 50k, 100k) with and without the FD optimization]
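To get a feel for the size restriction, the sketch below counts attribute splits only: each attribute is a partition attribute, a predictor attribute, or unused. It assumes both sets must be non-empty and ignores the choice of aggregate function and regression model, so its numbers are not the slide's bound, just an illustration of exponential versus polynomial growth. The function names are hypothetical.

```python
from math import comb


def count_all_splits(num_attrs):
    """Splits into (partition, predictor, unused) with partition and predictor
    both non-empty: 3^n - 2 * 2^n + 1 by inclusion-exclusion."""
    return 3 ** num_attrs - 2 * 2 ** num_attrs + 1


def count_restricted_splits(num_attrs, max_size=4):
    """Same count, restricted to patterns using at most `max_size` attributes."""
    total = 0
    for k in range(2, min(max_size, num_attrs) + 1):
        # Choose the k used attributes, then split them into two non-empty sets.
        total += comb(num_attrs, k) * (2 ** k - 2)
    return total


if __name__ == "__main__":
    for n in (4, 8, 11):
        print(n, count_all_splits(n), count_restricted_splits(n))
```

For 11 attributes this gives 173052 unrestricted splits against 5720 with at most 4 attributes per pattern.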
Steps of Counterbalancing
φ = "Why is the number of AX's SIGKDD 2007 papers low?"
1 Relevant pattern (not all patterns are useful)
Holds locally on φ
E.g. P1 = "For each author and venue, the total publication count is constant over the years" needs to hold on (AX, SIGKDD)
[Figure: AX's number of SIGKDD publications each year]
Generalizes φ
E.g. P = "For each author, the total publication count is linear over the years"
[Figure: AX's number of publications each year]
2 Refinement (there might not be a direct counterbalance on the relevant pattern)
P = "For author AX, the total publication count is linear over the years"
Refinement: author AX and ICDE, constant
P1 = "For author AX and ICDE, the total publication count is constant over the years"
In this simple example it happens that we refined back to the same attributes as the user question, but this does not have to be the case.
3 t = (AX, ICDE, 2007, 6) ∈ Q_P1
t[pubcnt] = 6 is a high outlier
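Step 3 can be sketched as a search, within the tuples covered by the refined pattern, for outliers in the direction opposite to the user question. The sketch assumes a constant model whose expected value is the mean of the counts and a hypothetical `min_deviation` cutoff. Only the tuple (AX, ICDE, 2007, 6) comes from the slide; the other rows are made up for illustration.

```python
from statistics import mean


def counterbalances(pattern_rows, direction, min_deviation=2.0):
    """Find outliers that counterbalance a "low" or "high" user question.

    pattern_rows: (author, venue, year, pubcnt) tuples covered by one refined
    pattern with a constant model (expected value assumed to be the mean).
    """
    expected = mean(cnt for *_, cnt in pattern_rows)
    found = []
    for author, venue, year, cnt in pattern_rows:
        deviation = cnt - expected
        # A "low" question is counterbalanced by high outliers, and vice versa.
        opposite = deviation > 0 if direction == "low" else deviation < 0
        if opposite and abs(deviation) >= min_deviation:
            found.append((author, venue, year, cnt, deviation))
    return found


# AX's ICDE counts per year; only the 2007 row is taken from the slide.
AX_ICDE = [
    ("AX", "ICDE", 2005, 2),
    ("AX", "ICDE", 2006, 2),
    ("AX", "ICDE", 2007, 6),
    ("AX", "ICDE", 2008, 2),
]

if __name__ == "__main__":
    print(counterbalances(AX_ICDE, "low"))
```

Here the expected value is 3, so 2007 deviates by +3 and is reported as the high outlier that counterbalances the low SIGKDD count.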
Explanation
Explanations returned by CAPE for φ
contain AX's publication counts in other venues or other years
E.g. (AX, ICDE, 2006, 6), (AX, VLDB, 2007, 4)
don't need to have the same schema as φ
E.g. (AX, 2010, 63)
Not all counterbalances are good. We need to score them and return the top ones.
Scoring Explanations
1 The distance between the user question tuple and the explanation tuple.
⇒ Tuples that are more similar are more likely to cause the unusual result.
For φ = (AX, SIGKDD, 2007, 1), 2007 is better than 2006 for an answer, and ICDE is better than a conference in another area such as SIGCOMM.
2 The deviation of the explanation tuple from its expected value.
⇒ Higher deviation means more unusual, which is more likely to cause other unusual events.
[Figure: AX's SIGKDD publications and AX's ICDE publications]
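The two criteria can be combined into one number in many ways. The sketch below uses deviation divided by (1 + distance), which is an assumed formula for illustration and not necessarily the score CAPE computes. The per-attribute distances, the `year_scale` normalisation, and the venue similarity table are likewise hypothetical.

```python
def distance(question, candidate, year_scale=10.0, venue_distance=None):
    """Hypothetical distance between two (author, venue, year) tuples."""
    venue_distance = venue_distance or {}
    _, q_venue, q_year = question
    _, c_venue, c_year = candidate
    if q_venue == c_venue:
        d_venue = 0.0
    else:
        # Unknown venue pairs default to the maximum categorical distance 1.0.
        d_venue = venue_distance.get(frozenset((q_venue, c_venue)), 1.0)
    d_year = abs(q_year - c_year) / year_scale
    return (d_venue ** 2 + d_year ** 2) ** 0.5


def score(question, candidate, deviation, **kwargs):
    """Higher deviation raises the score, larger distance lowers it."""
    return abs(deviation) / (1.0 + distance(question, candidate, **kwargs))


if __name__ == "__main__":
    phi = ("AX", "SIGKDD", 2007)
    # Assumed similarity: ICDE is close to SIGKDD, SIGCOMM is not listed.
    venues = {frozenset(("SIGKDD", "ICDE")): 0.3}
    candidates = [("AX", "ICDE", 2007), ("AX", "ICDE", 2006), ("AX", "SIGCOMM", 2007)]
    ranked = sorted(
        candidates,
        key=lambda c: score(phi, c, 3, venue_distance=venues),
        reverse=True,
    )
    print(ranked)
```

With equal deviations, the ranking follows the slide's intuition: ICDE 2007 comes before ICDE 2006, and both come before SIGCOMM 2007.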
Qualitative Evaluation
More examples:
Chicago crime data: Crime(id, type, community, year)
Q = γ_{type, community, year, count(*)}(Crime)
φ = "Why is battery crime in 2011 at community area 26 low (16)?"
Explanation
rank  type     community  year  count(*)  score
1     -        26         2012  117       63.9
2     Battery  25         2011  79        60.5
3     Battery  -          2010  1095      49.0
4     Assault  26         2011  10        40.1
Conclusion & Future Work
Conclusions
Provenance may be insufficient
Reasonable explanations can be given by counterbalance
Mine patterns offline
Look for counterbalance and rank online
Future Work
Extend to a larger class of queries, e.g., joins
Questions?
GitHub: https://github.com/IITDBGroup/cape
Demo: VLDB 2019
References
[Arab et al., 2014] Arab, B., Gawlick, D., Radhakrishnan, V., Guo, H., and Glavic, B. (2014).
A generic provenance middleware for database queries, updates, and transactions.
In Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance.
[Green et al., 2007] Green, T. J., Karvounarakis, G., and Tannen, V. (2007).
Provenance semirings.
In PODS, pages 31–40.
[Meliou et al., 2010] Meliou, A., Gatterbauer, W., Moore, K. F., and Suciu, D. (2010).
The complexity of causality and responsibility for query answers and non-answers.
PVLDB, 4(1):34–45.
[Roy and Suciu, 2014] Roy, S. and Suciu, D. (2014).
A formal approach to finding explanations for database queries.
In SIGMOD, pages 1579–1590.
[Wu and Madden, 2013] Wu, E. and Madden, S. (2013).
Scorpion: Explaining away outliers in aggregate queries.
PVLDB, 6(8):553–564.
Slide 16 of 16 Q. Zeng - CAPE: Bibliography

Mais conteúdo relacionado

Semelhante a 2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances

St Petersburg R user group meetup 2, Parallel R
St Petersburg R user group meetup 2, Parallel RSt Petersburg R user group meetup 2, Parallel R
St Petersburg R user group meetup 2, Parallel RAndrew Bzikadze
 
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationRethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationOlaf Hartig
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice남주 김
 
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...Fabian Pedregosa
 
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Chris Fregly
 
Large Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesLarge Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesAnne-Marie Tousch
 

Semelhante a 2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances (6)

St Petersburg R user group meetup 2, Parallel R
St Petersburg R user group meetup 2, Parallel RSt Petersburg R user group meetup 2, Parallel R
St Petersburg R user group meetup 2, Parallel R
 
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationRethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice
 
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...
 
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
 
Large Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesLarge Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the Trenches
 

Mais de Boris Glavic

2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...Boris Glavic
 
2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata GeneratorBoris Glavic
 
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...Boris Glavic
 
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...Boris Glavic
 
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-AnswersBoris Glavic
 
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSONBoris Glavic
 
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-AnswersTaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-AnswersBoris Glavic
 
ICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationBoris Glavic
 
TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of ProvenanceTaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of ProvenanceBoris Glavic
 
EDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested SubqueriesEDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested SubqueriesBoris Glavic
 
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...Boris Glavic
 
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...Boris Glavic
 
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"Boris Glavic
 
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"Boris Glavic
 
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"Boris Glavic
 
TaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningTaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningBoris Glavic
 
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...Boris Glavic
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanBoris Glavic
 

Mais de Boris Glavic (18)

2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
 
2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator
 
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
 
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances

  • 1. Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances. SIGMOD 2019. Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy (Illinois Institute of Technology, Duke University). SIGMOD Research Session 5, July 3rd, 11:30am. Slide 1 of 16, Q. Zeng, CAPE.
  • 2. Explain surprising query results Slide 2 of 16 Q. Zeng - CAPE: Introduction
  • 7. Related Work
      • Provenance: semiring model [Green et al., 2007], causality-based [Meliou et al., 2010], provenance systems [Arab et al., 2014]
      • "Why high/low" questions [Wu and Madden, 2013] [Roy and Suciu, 2014]. Intervention: a subset of the provenance whose removal would cause the result to move in the opposite direction
      • All of these approaches are based on provenance
  • 14. Is only provenance useful? No. Non-provenance data can be useful too.
      • Provenance-based explanation:
          Boris: Why did you work only 2 hours yesterday?
          Qitian: Yeah, I worked from 9 to 11 AM.
          Boris: Okay, I'm cutting your stipend.
      • Non-provenance explanation:
          Boris: Why did you work only 2 hours yesterday?
          Qitian: I was on a plane to SIGMOD for 8 hours.
          Boris: Fair enough.
  • 16. Example: table Pub

        author  pubid  year  venue
        AX      P1     2005  SIGKDD
        AY      P2     2004  SIGKDD
        AZ      P2     2004  SIGKDD
        AZ      P3     2004  SIGMOD

      Q = SELECT author, year, venue, count(*) AS pubcnt FROM Pub GROUP BY author, year, venue

      Query result:

        author  venue   year  pubcnt
        AX      SIGKDD  2006  4
        AX      SIGKDD  2007  1
        AX      SIGKDD  2008  4
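The aggregate query Q can be tried out on the four sample rows of Pub. The following is a minimal sketch using Python's built-in sqlite3 module; the in-memory database, the column types, the helper name `run_query`, and the added ORDER BY clause (used only to make the output order deterministic) are assumptions for illustration and are not part of the talk.

```python
import sqlite3

# Sample rows copied from the Pub table on the slide.
PUB_ROWS = [
    ("AX", "P1", 2005, "SIGKDD"),
    ("AY", "P2", 2004, "SIGKDD"),
    ("AZ", "P2", 2004, "SIGKDD"),
    ("AZ", "P3", 2004, "SIGMOD"),
]

# Query Q from the slide; ORDER BY is an addition for deterministic output.
QUERY = """
    SELECT author, year, venue, count(*) AS pubcnt
    FROM Pub
    GROUP BY author, year, venue
    ORDER BY author, year, venue
"""


def run_query(rows):
    """Load rows into an in-memory Pub table and return the result of Q."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Pub (author TEXT, pubid TEXT, year INTEGER, venue TEXT)"
        )
        conn.executemany("INSERT INTO Pub VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(PUB_ROWS):
        print(row)
```

On these four rows every (author, year, venue) group contains exactly one publication, so each pubcnt is 1. The result rows shown on the slide are not derivable from the four sample rows.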
  • 21. Example: query result
      • φ = "Why is the number of AX's SIGKDD 2007 papers low?"
      • This is a why high/low question over an aggregate query.
      • Provenance-based approach, by "intervention": a subset of the provenance whose removal makes AX's SIGKDD 2007 paper count go up
      • Our approach, by counterbalance: AX's high publication count at other conferences or in other years
  • 27. Our Approach
      • Assumptions behind φ: a pattern exists that describes the data (an Aggregate Regression Pattern, or ARP), and (AX, SIGKDD, 2007, 1) is a low outlier of that pattern
      • CAPE: mine ARPs (offline) → look for counterbalances → present the top k (these two steps run interactively with the user question)
  • 36. Aggregate Regression Pattern
      • P = "For each author, the total publication count (count(*)) is linear over the years"
      • An ARP consists of a set of partition attributes (here: author), a set of predictor attributes (here: year), an aggregate function (here: count(*)), and a regression model type (here: linear)
      • A pattern can hold locally on a fixed value of the partition attributes, say, P holds on AX
      • A pattern can also hold globally if it holds for sufficiently many values of the partition attributes (a good number of authors)
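The four components of an ARP and the local/global distinction can be sketched as follows. This is an illustration only: the `ARP` dataclass, the use of ordinary least squares with an R² cutoff as the goodness-of-fit test, the thresholds `min_r2` and `min_fraction`, and the yearly counts in `COUNTS` are all assumptions, not the definitions or data used by CAPE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ARP:
    partition: tuple  # partition attributes, e.g. ("author",)
    predictor: tuple  # predictor attributes, e.g. ("year",)
    aggregate: str    # aggregate function, e.g. "count(*)"
    model: str        # regression model type, e.g. "linear"


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # A constant series is fit perfectly by a horizontal line.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


def holds_locally(points, min_r2=0.8):
    """Assumed test: the pattern holds for one partition value if R^2 is high."""
    xs, ys = zip(*points)
    return fit_linear(xs, ys)[2] >= min_r2


def holds_globally(groups, min_r2=0.8, min_fraction=0.5):
    """Assumed test: the pattern holds locally for enough partition values."""
    local = [holds_locally(points, min_r2) for points in groups.values()]
    return sum(local) / len(local) >= min_fraction


# P = "For each author, count(*) is linear over the years"
P = ARP(partition=("author",), predictor=("year",), aggregate="count(*)", model="linear")

# Hypothetical (year, count(*)) series per author.
COUNTS = {
    "AX": [(2004, 10), (2005, 12), (2006, 14), (2007, 16)],
    "AY": [(2004, 1), (2005, 9), (2006, 2), (2007, 8)],
    "AZ": [(2004, 3), (2005, 3), (2006, 3), (2007, 3)],
}
```

With this made-up data, P holds locally on AX and AZ but not on AY, and it holds globally because two of the three authors satisfy it.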
  • 44. Mining ARP
      • Brute force: at least 3^|R| candidate patterns
      • Optimizations:
          • Restricting size: at most 4 attributes in a pattern. This alone reduces the number of candidate patterns to polynomial.
          • Reusing sort order, e.g.
              Partition attributes   Predictor attributes
              A, B, C                D
              A, B                   C, D
              A                      B, C, D
          • Detecting and applying functional dependencies: if "For each A, agg(α) is linear over C" holds and A → B, then "For each A and B, agg(α) is linear over C" holds
      • Performance evaluation on Chicago crime data, PostgreSQL
        [Figure: (a) 10k rows: time (sec) vs. #attributes for naive, cube, and ARP-mine; (b) 8 attributes: time (sec) vs. #rows with and without the FD optimization]
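The effect of the size restriction on the candidate space can be illustrated with a small enumeration. The sketch below assumes that a candidate is a pair of disjoint, non-empty attribute sets (partition, predictor) and ignores the choice of aggregate function and model type; the function names are hypothetical and this is not the ARP-mine algorithm itself.

```python
from itertools import combinations


def candidate_patterns(attributes, max_size=4):
    """Yield (partition, predictor) pairs using at most max_size attributes."""
    for size in range(2, max_size + 1):
        for chosen in combinations(attributes, size):
            # Each non-empty proper subset of the chosen attributes serves as
            # the partition set; the remaining attributes are the predictors.
            for k in range(1, size):
                for partition in combinations(chosen, k):
                    predictor = tuple(a for a in chosen if a not in partition)
                    yield partition, predictor


def brute_force_bound(num_attributes):
    """Each attribute is a partition attribute, a predictor, or unused."""
    return 3 ** num_attributes


if __name__ == "__main__":
    attrs = [f"a{i}" for i in range(11)]
    print(brute_force_bound(len(attrs)))            # 177147
    print(sum(1 for _ in candidate_patterns(attrs)))  # 5720
```

For 11 attributes the unrestricted bound is 3^11 = 177147 assignments, while the restricted enumeration under the assumptions above yields 5720 pairs.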
  • 59. Steps of Counterbalancing
      • φ = "Why is the number of AX's SIGKDD 2007 papers low?"
      • 1. Relevant pattern (not all patterns are useful)
          • Holds locally on φ. E.g., P1 = "For each author and venue, the total publication count is constant over the years" needs to hold on (AX, SIGKDD). [Figure: AX's number of SIGKDD publications each year]
          • Generalizes φ. E.g., P = "For each author, the total publication count is linear over the years". [Figure: AX's number of publications each year]
      • 2. Refinement (there might be no direct counterbalance on the relevant pattern)
          • P = "For author AX, the total publication count is linear over the years" is refined (author AX and ICDE, constant) to P1 = "For author AX and ICDE, the total publication count is constant over the years"
          • In this simple example the refinement happens to lead back to the same attributes as the user question, but it does not have to.
      • 3. t = (AX, ICDE, 2007, 6) ∈ Q_P1, and t[pubcnt] = 6 is a high outlier
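Step 3 can be sketched for the constant pattern P1 on (AX, ICDE): a low question is counterbalanced by tuples that lie above what the pattern predicts. In the sketch below, the constant model is fit as the mean of the yearly counts, and `min_deviation` is an arbitrary cutoff; both are assumptions. Only the value 6 for 2007 comes from the slide, the other yearly counts are made up.

```python
from statistics import mean

# Hypothetical yearly publication counts for (AX, ICDE); only the 2007 value
# of 6 appears on the slide, the other years are invented for illustration.
AX_ICDE = {2005: 2, 2006: 2, 2007: 6, 2008: 2}


def high_outliers(counts, venue, author="AX", min_deviation=1.0):
    """Return (author, venue, year, count, deviation) for counts above the constant fit."""
    expected = mean(counts.values())  # prediction of the constant model
    result = []
    for year, count in sorted(counts.items()):
        deviation = count - expected
        if deviation >= min_deviation:
            result.append((author, venue, year, count, deviation))
    return result


if __name__ == "__main__":
    for candidate in high_outliers(AX_ICDE, "ICDE"):
        print(candidate)
```

With the made-up counts, the constant model predicts 3 publications per year, so (AX, ICDE, 2007, 6) is the only tuple reported, with a deviation of 3.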
  • 61. Explanation
      • Explanations returned by CAPE for φ:
          • contain AX's number of publications in other venues or other years, e.g., (AX, ICDE, 2006, 6), (AX, VLDB, 2007, 4)
          • do not need to have the same schema as φ, e.g., (AX, 2010, 63)
      • Not all counterbalances are good. We need to score them and return the top ones.
  • 66. Scoring Explanations
      • 1. The distance between the user question tuple and the explanation tuple. ⇒ Tuples that are more similar are more likely to cause the unusual result. For φ = (AX, SIGKDD, 2007, 1), 2007 is a better answer than 2006, and ICDE is better than a conference in another area such as SIGCOMM.
      • 2. The deviation of the explanation tuple from its expected value. ⇒ A higher deviation means more unusual, which is more likely to cause other unusual events. [Figures: AX's SIGKDD publications; AX's ICDE publications]
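The two scoring criteria can be combined in many ways. The sketch below uses one hypothetical combination, deviation divided by (1 + distance), so that the score grows with the deviation and shrinks with the distance. The formula, the venue-to-area map `VENUE_AREA`, the distance weights, and the candidate deviations are all assumptions for illustration, not CAPE's actual scoring function.

```python
# Hypothetical mapping of venues to research areas.
VENUE_AREA = {
    "SIGKDD": "data",
    "ICDE": "data",
    "VLDB": "data",
    "SIGCOMM": "networking",
}


def venue_distance(a, b):
    """0 for the same venue, 0.5 for the same known area, 1 otherwise."""
    if a == b:
        return 0.0
    area_a = VENUE_AREA.get(a)
    if area_a is not None and area_a == VENUE_AREA.get(b):
        return 0.5
    return 1.0


def distance(question, explanation):
    """Assumed distance: venue distance plus the gap in years."""
    return (venue_distance(question["venue"], explanation["venue"])
            + abs(question["year"] - explanation["year"]))


def score(question, explanation, deviation):
    """Assumed score: higher deviation and smaller distance rank higher."""
    return deviation / (1.0 + distance(question, explanation))


def top_k(question, candidates, k=3):
    """candidates is a list of (explanation, deviation) pairs."""
    ranked = sorted(candidates, key=lambda c: score(question, c[0], c[1]), reverse=True)
    return [explanation for explanation, _ in ranked[:k]]


QUESTION = {"author": "AX", "venue": "SIGKDD", "year": 2007}

# Hypothetical counterbalance candidates, all with the same deviation of 3.
CANDIDATES = [
    ({"author": "AX", "venue": "ICDE", "year": 2006}, 3.0),
    ({"author": "AX", "venue": "SIGCOMM", "year": 2007}, 3.0),
    ({"author": "AX", "venue": "ICDE", "year": 2007}, 3.0),
]
```

With equal deviations, the ICDE 2007 candidate ranks first because it shares the year with the question and its venue is in the same assumed area, which mirrors the preference stated on the slide.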
  • 71. Qualitative Evaluation
      • Further example, Chicago crime data: Crime(id, type, community, year)
      • Q = γ_{type, community, year, count(*)}(Crime)
      • φ = "Why is battery crime in 2011 at community area 26 low (16)?"
      • Explanations:
          rank  type     community  year  count(*)  score
          1     -        26         2012  117       63.9
          2     Battery  25         2011  79        60.5
          3     Battery  -          2010  1095      49.0
          4     Assault  26         2011  10        40.1
        (a "-" marks a cell that is empty on the slide)
  • 73. Conclusion & Future Work
      • Conclusions: provenance may be insufficient; reasonable explanations can be given by counterbalance; mine patterns offline; look for counterbalances and rank them online
      • Future work: extend to a larger class of queries, e.g., joins
  • 75. References
      • [Arab et al., 2014] Arab, B., Gawlick, D., Radhakrishnan, V., Guo, H., and Glavic, B. (2014). A generic provenance middleware for database queries, updates, and transactions. In Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance.
      • [Green et al., 2007] Green, T. J., Karvounarakis, G., and Tannen, V. (2007). Provenance semirings. In PODS, pages 31–40.
      • [Meliou et al., 2010] Meliou, A., Gatterbauer, W., Moore, K. F., and Suciu, D. (2010). The complexity of causality and responsibility for query answers and non-answers. PVLDB, 4(1):34–45.
      • [Roy and Suciu, 2014] Roy, S. and Suciu, D. (2014). A formal approach to finding explanations for database queries. In SIGMOD, pages 1579–1590.
      • [Wu and Madden, 2013] Wu, E. and Madden, S. (2013). Scorpion: Explaining away outliers in aggregate queries. PVLDB, 6(8):553–564.