On Answering Why-Not Queries Against Scientific
Workflow Provenance
Khalid Belhajjame
PSL Research University, Paris-Dauphine University, LAMSADE, Paris, 75016, France
khalid.belhajjame@dauphine.fr
July 13, 2018
Khalid Belhajjame (Paris-Dauphine) IRPb Workshop July 13, 2018 1 / 26
Context: Scientific Workflows
Scientific workflows have been
shown to facilitate and accelerate
scientific data exploration and
analysis in many areas of science,
including proteomics, metabolomics,
astronomy, and biomedicine.
The figure on the right side
illustrates an example of a simple
workflow used for identifying the
pathways associated with a given
input metabolite (compound).
Given a compound identifier, the
first module returns a compound
name, which is used to feed the
second module to obtain the
corresponding pathway.
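[Figure: example workflow. The workflow input port compound_id feeds the
module get_compound_info, whose output feeds the module
extract_pathway_from_compounds_file, which writes to the workflow output
port output_pathways.]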
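The two-module workflow described above can be sketched as a pair of chained functions. This is only an illustration: the lookup tables and their values (C1, compound-A, pathway-X) are made up and stand in for the real services behind the modules, and run_workflow is a hypothetical helper.

```python
# Toy lookup tables standing in for the services behind each module.
# All identifiers and values below are invented for illustration.
COMPOUND_NAMES = {"C1": "compound-A", "C2": "compound-B"}
PATHWAYS_BY_COMPOUND = {"compound-A": ["pathway-X"]}


def get_compound_info(compound_id):
    """First module: map a compound identifier to a compound name."""
    return COMPOUND_NAMES[compound_id]


def extract_pathway_from_compounds_file(compound_name):
    """Second module: map a compound name to its pathways (possibly none)."""
    return PATHWAYS_BY_COMPOUND.get(compound_name, [])


def run_workflow(compound_id):
    """Chain the two modules: compound_id -> name -> output_pathways."""
    name = get_compound_info(compound_id)
    return extract_pathway_from_compounds_file(name)


output_pathways = run_workflow("C1")
```

Note that neither module's output carries the compound identifier, so an output pathway cannot be traced back to its input by inspecting attributes alone.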
Aim: Evaluating Why-Not Queries Against Workflow
Executions
Why-not queries help scientists understand why a given data item,
e.g., their favorite biological pathway, was not returned by the
workflow executions.
While answering such queries has been thoroughly investigated for
relational databases, only a few proposals examined their evaluation
in the context of scientific workflows.
Objective: To elaborate a solution for evaluating why-not queries
against workflows with black-box modules.
Related Work: Database (Querying) Land
Instance-based attempts to find the data items in the inputs that are
responsible for the non-appearance of a given data item in the result.
Consider the example below (taken from Huang et al., VLDB 2008).
The query returns the schools in the state of California that are within the
top 4 and have job openings.
The answer returned by the query is Stanford, together with its rank.
Why-not query: Why does Berkeley not appear in the results?
What change shall I make to the source to obtain Berkeley in the results?
If a potential tuple (berkeley, ca, yes) is inserted into the openings table,
Berkeley will become an answer.
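The instance-based explanation can be replayed on a toy database. The sketch below uses Python's sqlite3 module; the table layout, column names and rank values are assumptions made for illustration and are not taken from Huang et al.

```python
import sqlite3

# Hypothetical schema and data, invented to mirror the example in the text.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ranking(school TEXT, state TEXT, school_rank INTEGER);
    CREATE TABLE openings(school TEXT, state TEXT, has_opening TEXT);
    INSERT INTO ranking VALUES
        ('stanford', 'ca', 1), ('mit', 'ma', 2), ('berkeley', 'ca', 3);
    INSERT INTO openings VALUES
        ('stanford', 'ca', 'yes'), ('mit', 'ma', 'yes');
""")

# Schools in California that are within the top 4 and have job openings.
QUERY = """
    SELECT r.school, r.school_rank
    FROM ranking r JOIN openings o ON r.school = o.school
    WHERE r.state = 'ca' AND r.school_rank <= 4 AND o.has_opening = 'yes'
    ORDER BY r.school_rank
"""

before = con.execute(QUERY).fetchall()

# Instance-based explanation: insert the potential tuple and re-run the query.
con.execute("INSERT INTO openings VALUES ('berkeley', 'ca', 'yes')")
after = con.execute(QUERY).fetchall()
```

Before the insertion only Stanford is returned; after it, Berkeley joins the result, which is what makes the inserted tuple an explanation for the missing answer.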
Related Work: Database (Querying) Land
Module-based attempts to identify the modules (sub-queries) that
are responsible for the non-appearance of a given data item in the
query results.
In the case of the previous example, there is only one join, and it is
responsible for the non-appearance of Berkeley in the result set of the
query.
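A module-based explanation can be approximated by pushing the input through the query operators one at a time and reporting the first operator after which the item of interest disappears. The sketch below is a simplification with invented data and a hypothetical helper first_dropping_operator; it is not the algorithm of any of the cited papers.

```python
# Invented input relations mirroring the schools example.
RANKING = [
    {"school": "stanford", "state": "ca", "rank": 1},
    {"school": "mit", "state": "ma", "rank": 2},
    {"school": "berkeley", "state": "ca", "rank": 3},
]
SCHOOLS_WITH_OPENINGS = {"stanford", "mit"}

# The query, decomposed into named operators applied in order.
OPERATORS = [
    ("select state = ca", lambda rows: [r for r in rows if r["state"] == "ca"]),
    ("select rank <= 4", lambda rows: [r for r in rows if r["rank"] <= 4]),
    ("join with openings",
     lambda rows: [r for r in rows if r["school"] in SCHOOLS_WITH_OPENINGS]),
]


def first_dropping_operator(rows, school):
    """Return the name of the first operator that filters `school` out."""
    for name, operator in OPERATORS:
        rows = operator(rows)
        if not any(r["school"] == school for r in rows):
            return name
    return None  # the school survives every operator


berkeley_culprit = first_dropping_operator(RANKING, "berkeley")
stanford_culprit = first_dropping_operator(RANKING, "stanford")
```

With this data the join is reported for Berkeley, while Stanford survives all operators.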
Related Work: Workflow Land
The only proposal in this category for workflow provenance is the
Why-Not algorithm proposed by Chapman and Jagadish (2009).
With the Why-Not algorithm, the user query is expressed as a set of
atomic predicates that are combined using AND and OR.
Chapman and Jagadish make the assumption that the attributes of
the input datasets are preserved by the modules that compose the
workflow.
This assumption does not hold in general, however.
Related Work: Workflow Land
For example, the modules in the workflow
illustrated on the right do not preserve the
attribute of the input, viz. compound_id,
in that the outputs of the first and the
second modules do not contain information
about the compound identifier.
In the work presented in this talk, we drop
the assumption made by Chapman and
Jagadish, and propose a solution that can
be used for answering why-not queries
for workflows whose modules do not
preserve attributes of the input datasets.
Furthermore, unlike the Why-Not
algorithm, which is module-based, our
proposal is hybrid in that it seeks to
answer both instance-based and module-based
why-not queries.
[Figure: the same example workflow, from the input port compound_id through
get_compound_info and extract_pathway_from_compounds_file to the output
port output_pathways.]
Foundations
Why-not query: A user specifies a why-not query by providing a
data item $d_{why-not}$ that has the same data type as the output of the
last module of the workflow and was not returned by the workflow
executions.
Module pickiness: Central to the evaluation of why-not queries is
the pickiness of the workflow's modules. A module M in a workflow is picky
with respect to a data item d if its inverse $M_{inv}$ does not accept d as
input. More specifically, $M_{inv}$ throws an illegal input exception when
its execution is fed d.
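The definition of pickiness translates directly into a small check, assuming for the moment that an inverse module is available. In the sketch below, IllegalInputError, the toy inverse and its lookup table are all hypothetical names and data introduced for illustration.

```python
class IllegalInputError(Exception):
    """Raised by an inverse module that does not accept a data item."""


# Invented table: which compound names can produce which pathway.
COMPOUNDS_BY_PATHWAY = {"pathway-X": ["compound-A"]}


def inverse_extract_pathway(pathway):
    """Toy inverse of a module: map an output back to the inputs producing it."""
    if pathway not in COMPOUNDS_BY_PATHWAY:
        raise IllegalInputError(pathway)
    return COMPOUNDS_BY_PATHWAY[pathway]


def is_picky(inverse_module, d):
    """A module is picky w.r.t. d if its inverse rejects d as input."""
    try:
        inverse_module(d)
    except IllegalInputError:
        return True
    return False


picky_for_z = is_picky(inverse_extract_pathway, "pathway-Z")
picky_for_x = is_picky(inverse_extract_pathway, "pathway-X")
```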
Processing Why-Not Queries
The algorithm for processing why-not queries takes as input a data item
$d_{why-not}$ specified by the user.
To answer a why-not query, the modules of the workflow are explored from
the sink to the source in a breadth-first fashion. To do so, we group the
workflow modules into levels as illustrated in the figure below.
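One way to group modules into levels is a breadth-first traversal that starts at the sink and follows the data links backwards. The sketch below assumes the workflow is given as a hypothetical dictionary mapping each module to the modules that feed it, and it assigns each module the length of its shortest path to a sink; the talk does not specify how modules reachable by several paths are levelled, so that choice is an assumption.

```python
from collections import deque


def levels_from_sink(predecessors, sinks):
    """Group modules into levels by breadth-first search from the sinks."""
    level = {module: 0 for module in sinks}
    queue = deque(sinks)
    while queue:
        module = queue.popleft()
        for upstream in predecessors.get(module, []):
            if upstream not in level:  # first visit gives the shortest distance
                level[upstream] = level[module] + 1
                queue.append(upstream)
    grouped = {}
    for module, depth in level.items():
        grouped.setdefault(depth, []).append(module)
    return [sorted(grouped[depth]) for depth in sorted(grouped)]


# Hypothetical diamond-shaped workflow: A feeds B and C, which both feed D.
workflow = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
levels = levels_from_sink(workflow, ["D"])
```

Level 0 holds the sink, and each subsequent level holds the modules one step closer to the workflow inputs.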
Processing Why-Not Queries
The modules of each level are examined to identify whether they are picky.
Specifically, the inverse of the module in question M is examined to check
whether:
1. It does not accept the corresponding data items that were generated
by the inverse of the modules in the previous level. In this case, M is
picky.
2. It accepts the corresponding data items that were generated by the
inverse of the modules in the previous level. In this case, the data items
that the inverse of M produces are saved and used to feed the inverse of
the modules in the succeeding levels, if any.
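The two cases above can be sketched as a loop that walks the levels from sink to source. To keep the sketch short it assumes one module per level and treats a module as picky when its inverse rejects every item handed to it; both simplifications, along with the names IllegalInputError, table_inverse and explain_why_not and the toy data, are assumptions made for illustration.

```python
class IllegalInputError(Exception):
    """Raised by an inverse module that does not accept a data item."""


def table_inverse(table):
    """Build a toy inverse module from an output -> inputs lookup table."""
    def inverse(item):
        if item not in table:
            raise IllegalInputError(item)
        return table[item]
    return inverse


def explain_why_not(inverses_sink_to_source, d_why_not):
    """Walk the modules from sink to source, feeding each inverse in turn."""
    items = [d_why_not]
    for name, inverse in inverses_sink_to_source:
        produced = []
        for item in items:
            try:
                produced.extend(inverse(item))
            except IllegalInputError:
                pass  # this item is rejected; try the remaining ones
        if not produced:  # case 1: nothing was accepted, the module is picky
            return {"picky_module": name, "missing_inputs": []}
        items = produced  # case 2: save the items for the next level
    return {"picky_module": None, "missing_inputs": items}


chain = [
    ("extract_pathway_from_compounds_file",
     table_inverse({"pathway-X": ["compound-A"], "pathway-Y": ["compound-B"]})),
    ("get_compound_info", table_inverse({"compound-A": ["C1"]})),
]
found = explain_why_not(chain, "pathway-X")
late = explain_why_not(chain, "pathway-Y")
early = explain_why_not(chain, "pathway-Z")
```

When no module is picky, the items that reach the source are the workflow inputs that would have been needed, which is the instance-based half of the answer.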
Identifying Picky Modules
To identify whether a module M is picky, we need to invoke its inverse
$M_{inv}$ and check whether it accepts the data items in question.
However, the inverse module rarely exists.
To overcome the non-existence of the inverse module, we can probe
the module with inputs until we obtain the output we are after, or else
fail and deduce that the module in question is picky.
This is not a reasonable solution because the space of valid input
values of a module can be very large or even infinite. The problem is
exacerbated by the fact that a module may have multiple inputs,
therefore requiring the construction of all possible combinations for
probing.
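The cost of naive probing can be made concrete with a small sketch. The functions probe and probe_space_size and the toy two-input module are hypothetical; the point is only that the number of invocations grows as the product of the input domain sizes.

```python
from itertools import product
from math import prod


def probe(module, domains, target):
    """Try every input combination; return the first one yielding `target`."""
    for combination in product(*domains):
        if module(*combination) == target:
            return combination
    return None  # no combination works: the module would be deemed picky


def probe_space_size(domains):
    """Number of invocations exhaustive probing needs in the worst case."""
    return prod(len(domain) for domain in domains)


def add(a, b):
    """Toy two-input module used only to exercise the probing loop."""
    return a + b


small_domains = [range(10), range(10)]
hit = probe(add, small_domains, 18)
miss = probe(add, small_domains, 42)
worst_case_small = probe_space_size(small_domains)
worst_case_large = probe_space_size([range(10_000)] * 3)
```

Two inputs with ten values each already require up to 100 invocations, and three inputs with ten thousand values each require up to 10^12, which is why exhaustive probing does not scale.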
Is there a more reasonable solution... that at least allows us to probe
the modules using fewer inputs?
Identifying Picky Modules by Harvesting the Web
A solution that we explored consists in harvesting what is probably the
biggest source of information, namely the Web, using the information
extraction process illustrated below.
Indeed, a significant number of the scientific modules supplied by major
institutions, such as the EBI and DDBJ, can also be invoked by users on
the Web, and the traces of those module invocations remain, in a number
of cases, accessible on the Web.
Identifying Picky Modules by Harvesting the Web
If none of the candidate inputs is
found to be a true positive, then we
conclude that the module is likely to
be picky.
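The verification step can be sketched as follows, assuming the candidate inputs have already been extracted from web pages. The candidate list is hard-coded here, no web access is performed, and the function names and toy module are hypothetical.

```python
def find_true_positive(module, candidate_inputs, target):
    """Invoke the module on each candidate; return one that yields `target`."""
    for candidate in candidate_inputs:
        try:
            if module(candidate) == target:
                return candidate
        except (KeyError, ValueError):
            continue  # the module rejected this candidate
    return None


def is_likely_picky(module, candidate_inputs, target):
    """No candidate is a true positive, so the module is likely picky."""
    return find_true_positive(module, candidate_inputs, target) is None


# Toy module and invented candidates standing in for harvested values.
NAMES = {"C1": "compound-A", "C2": "compound-B"}


def get_compound_info(compound_id):
    return NAMES[compound_id]


harvested_candidates = ["C9", "C2", "C1"]
confirmed = find_true_positive(get_compound_info, harvested_candidates, "compound-A")
picky = is_likely_picky(get_compound_info, harvested_candidates, "compound-Z")
```

The conclusion is only "likely picky" because the harvested candidates are a sample: an input that produces the target may exist without appearing on any examined page.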
Feasibility Study
The approach we have just described raises the following question: is
the proposed algorithm able to identify the reason why a given data
item does not appear in the workflow results? More specifically, how
effective is this solution in identifying picky modules and missing
input data items?
To answer the above questions, we ran a feasibility experiment, in
which we used a sample of 6 real-world workflows from the
myExperiment repository.
We selected workflows that involve deterministic modules, that is,
modules that deliver the same result (if any) given the same input.
We did not consider workflows that include modules performing data
mining operations, for instance.
We also selected workflows for which the inverse modules are
deterministic functions.
Feasibility Study
We executed each workflow using example data inputs provided
by the workflow authors.
We then specified two kinds of queries for each workflow:
Instance-based why-not query. To assess the ability of the algorithm to
answer this type of query, we randomly selected an output data
item d that was returned by the workflow executions. Next, we used
our algorithm to see if it is able to reconstruct the lineage of d by
harvesting the Web to identify the input data items that were
responsible for its derivation.
Module-based why-not query. This kind of query is used to assess whether
the algorithm is able to identify picky modules.
In total we had 6 queries of the first kind, which we denote by
$\{q^+_1, \ldots, q^+_6\}$, and 6 queries of the second kind, which we
denote by $\{q^-_1, \ldots, q^-_6\}$.
Feasibility Study: Results
Of the queries $\{q^+_1, \ldots, q^+_6\}$, our algorithm was able to
successfully construct the provenance of the why-not query up to the
workflow input for 3 queries.
Most of the modules composing these workflows, namely 8 out of 11,
provide information about the input and output datasets on the Web
using tabular formats.
After examination of the three remaining workflows, we found that
one of them uses proprietary data sources, the content of which is not
accessible on the surface web.
The last two workflows, on the other hand, contain modules that
manipulate excerpts from HTML web pages. Because of this, our
algorithm was not able to find the inputs and outputs of those modules
on the Web.
Feasibility Study: Results
We also measured the number of top-k web pages that needed to be
examined to identify the input data item corresponding to a given
output data item. On average, we needed to examine the content of
the 4 top web pages returned by the keyword search engine (we used the
Google search engine for our experiment).
In several cases, however, the top web page was the right one, in the
sense that it contained the input data item we were after.
Feasibility Study: Results
Regarding the queries $\{q^-_1, \ldots, q^-_6\}$, our algorithm was more
successful in the sense that it was able to correctly identify 4 picky
modules out of 6.
For the two remaining workflows, the module that was identified as picky
by our algorithm was not the correct one. After examination, it
transpired that for certain modules the corresponding data item could
not be found on the Web.
Again, this issue was due to shim modules, whose input and output data
items are not published on the Web.
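For reference, the counts reported in the feasibility study correspond to the following proportions. The snippet only restates the figures given in the slides as percentages; the dictionary keys are descriptive labels chosen here.

```python
# Counts taken from the feasibility study: (successes, total).
reported = {
    "instance-based queries answered": (3, 6),
    "modules publishing tabular data on the Web": (8, 11),
    "picky modules correctly identified": (4, 6),
}

# Convert each count to a percentage rounded to one decimal place.
percentages = {
    label: round(100 * successes / total, 1)
    for label, (successes, total) in reported.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

With samples this small, each additional workflow shifts the rates by many percentage points, so the proportions are indicative only.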
Conclusions
To sum up, this small feasibility study has shown that our method is
promising.
It has also brought some insights into the way our solution can be
improved.
Our ongoing work includes: i) tuning our algorithm to deal with
shim modules in a workflow, ii) exploring new sources of information
for identifying picky modules, and iii) running an experiment involving
a large number of scientific workflows.
References
K. Belhajjame (2018)
On Answering Why-Not Queries Against Scientific Workflow Provenance
Proceedings of EDBT, Open Proceedings, 465–468.
N. Bidoit, M. Herschel, and K. Tzompanaki (2014)
Why not?
Proceedings of EDBT, Open Proceedings, 145–156.
A. Chapman and H. V. Jagadish (2009)
Why not?
Proceedings of SIGMOD, ACM, 523–534.
J. Huang, T. Chen, A. Doan, and J. F. Naughton (2008)
On the provenance of non-answers to queries over extracted data
Proceedings of VLDB, ACM, 736–747.
The End

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 

Irpb workshop

  • 1. On Answering Why-Not Queries Against Scientific Workflow Provenance Khalid Belhajjame PSL Research University, Paris-Dauphine University, LAMSADE, Paris, 75016, France khalid.belhajjame@dauphine.fr July 13, 2018 Khalid Belhajjame (Paris-Dauphine) IRPb Workshop July 13, 2018 1 / 26
  • 2. Context: Scientific Workflows Scientific workflows have been shown to facilitate and accelerate scientific data exploration and analysis in many areas of science, including proteomics, metabolomics, astronomy, and biomedicine. The figure on the right illustrates a simple workflow used for identifying the pathways associated with a given input metabolite (compound). Given a compound identifier, the first module returns a compound name, which is fed to the second module to obtain the corresponding pathway. [Figure: workflow with input port compound_id, modules get_compound_info and extract_pathway_from_compounds_file, and output port output_pathways]
  • 3. Aim: Evaluating Why-Not Queries Against Workflow Executions Why-not queries help scientists understand why a given data item, e.g., their favorite biological pathway, was not returned by the workflow executions. While answering such queries has been thoroughly investigated for relational databases, only a few proposals have examined their evaluation in the context of scientific workflows. Objective: to develop a solution for evaluating why-not queries against workflows with black-box modules.
  • 4. Related Work: Database (Querying) Land Instance-based: attempts to find the data items in the inputs that are responsible for the non-appearance of a given data item in the result. Consider the example below (taken from Huang et al. VLDB 2008). The query returns the schools in the state of California that are within the top 4 and have job openings. The answer returned by the query is Stanford and its rank in the result. Why-not query: Why does Berkeley not appear in the results? What change shall I make to the source to obtain (Homer, 25) in the results? If a potential tuple (berkeley, ca, yes) is inserted into the openings table, Berkeley will become an answer.
  • 5. Related Work: Database (Querying) Land Module-based: attempts to identify the modules (sub-queries) that are responsible for the non-appearance of a given data item in the workflow results. In the previous example there is only one join, and it is this join that is responsible for the non-appearance of Berkeley in the result set of the query.
  • 6. Related Work: Workflow Land The only proposal in this category for workflow provenance is the Why-Not algorithm proposed by Chapman and Jagadish (2009). With this algorithm, the user query is expressed as a set of atomic predicates that are combined using AND and OR. Chapman and Jagadish assume that the attributes of the input datasets are preserved by the modules that compose the workflow. In general, however, this assumption does not hold.
  • 7. Related Work: Workflow Land For example, the modules in the workflow illustrated on the right do not preserve the attribute of the input, viz. compound_id, in that the outputs of the first and the second module do not contain information about the compound identifier. In the work presented in this talk, we drop the assumption made by Chapman and Jagadish, and propose a solution that can be used for answering why-not queries for workflows whose modules do not preserve the attributes of the input datasets. Furthermore, unlike the Why-Not algorithm, which is module-based, our proposal is hybrid in that it seeks to answer both instance-based and module-based why-not queries. [Figure: workflow with input port compound_id, modules get_compound_info and extract_pathway_from_compounds_file, and output port output_pathways]
  • 8. Foundations Why-not query: A user specifies a why-not query by providing a data item d_why-not that has the same data type as the output of the last module of the workflow and was not returned by the workflow executions. Module pickiness: Central to the evaluation of why-not queries is the pickiness of the workflow's modules. A module M in a workflow is picky with respect to a data item d if its inverse M_inv does not accept d as input. More specifically, M_inv throws an illegal input exception when it is fed d.
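The pickiness definition above can be sketched in a few lines of Python. This is only an illustration under assumptions: the exception class `IllegalInputError`, the helper `is_picky`, and the toy inverse module with its lookup table are hypothetical names and data introduced here, not part of the talk.

```python
class IllegalInputError(Exception):
    """Raised by a (hypothetical) inverse module that rejects its input."""


def is_picky(inverse_module, data_item):
    """Return True if the inverse module refuses data_item, i.e. M is picky w.r.t. it."""
    try:
        inverse_module(data_item)
    except IllegalInputError:
        return True
    return False


# Toy inverse of a pathway-extraction module: maps a pathway back to a compound name.
# The table content is invented for illustration only.
TOY_PATHWAY_TO_COMPOUND = {"pathway-X": "compound-A"}


def toy_extract_pathway_inverse(pathway):
    if pathway not in TOY_PATHWAY_TO_COMPOUND:
        raise IllegalInputError(pathway)
    return TOY_PATHWAY_TO_COMPOUND[pathway]
```

With this toy table, `is_picky(toy_extract_pathway_inverse, "pathway-Z")` holds, because no compound known to the module leads to that pathway.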
  • 9. Processing Why-Not Queries The algorithm for processing why-not queries takes as input a data item d_why-not specified by the user. To answer a why-not query, the modules of the workflow are explored from the sink to the source in a breadth-first fashion. To do so, we group the workflow modules into levels as illustrated in the figure below.
  • 10. Processing Why-Not Queries The modules of each level are examined to identify whether each module is picky. Specifically, the inverse of the module in question, M, is examined to check which of two cases holds: (1) It does not accept the corresponding data items that were generated by the inverses of the modules in the previous level. In this case, M is picky. (2) It accepts the corresponding data items that were generated by the inverses of the modules in the previous level. In this case, the data items that the inverse of M produces are saved and used to feed the inverses of the modules in the succeeding levels, if any.
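The level-by-level exploration described above might look roughly like the following sketch. It makes several simplifying assumptions that are not in the talk: every module has a single output, its inverse returns a single data item that is passed to all of its predecessors, and inverse modules are available as Python callables. All names (`levels_from_sink`, `why_not`, `inverse_from_table`, the toy tables) are hypothetical.

```python
class IllegalInputError(Exception):
    """Raised by a (hypothetical) inverse module that rejects its input."""


def inverse_from_table(table):
    """Build a toy inverse module from an output -> input lookup table."""
    def inverse(item):
        if item not in table:
            raise IllegalInputError(item)
        return table[item]
    return inverse


def levels_from_sink(predecessors, sink):
    """Group modules into levels by breadth-first distance from the sink."""
    levels, seen, frontier = [], {sink}, [sink]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for module in frontier:
            for pred in predecessors.get(module, []):
                if pred not in seen:
                    seen.add(pred)
                    next_frontier.append(pred)
        frontier = next_frontier
    return levels


def why_not(predecessors, inverses, sink, d_why_not):
    """Return (picky modules, workflow inputs that would have produced d_why_not)."""
    pending = {sink: d_why_not}  # data item that each module's inverse must accept
    picky, missing_inputs = [], {}
    for level in levels_from_sink(predecessors, sink):
        for module in level:
            if module not in pending:
                continue  # nothing reached this module: a later module was picky
            try:
                produced = inverses[module](pending[module])
            except IllegalInputError:
                picky.append(module)  # case (1): the inverse rejects the item
                continue
            preds = predecessors.get(module, [])
            if not preds:
                missing_inputs[module] = produced  # reached a workflow input
            for pred in preds:
                pending[pred] = produced  # case (2): feed the next level
    return picky, missing_inputs


# Toy version of the two-module workflow; table contents are invented.
PREDECESSORS = {"extract_pathway_from_compounds_file": ["get_compound_info"]}
INVERSES = {
    "extract_pathway_from_compounds_file": inverse_from_table({"pathway-X": "compound-A"}),
    "get_compound_info": inverse_from_table({"compound-A": "cpd-1"}),
}
SINK = "extract_pathway_from_compounds_file"
```

Calling `why_not(PREDECESSORS, INVERSES, SINK, "pathway-Z")` reports the last module as picky, whereas `"pathway-X"` is traced back to the workflow input `"cpd-1"`, which corresponds to the instance-based answer.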
  • 15. Identifying Picky Modules To identify whether a module M is picky, we need to invoke its inverse M_inv and check whether it accepts the data items in question. However, the inverse module rarely exists. To overcome the non-existence of the inverse module, we can probe the module with inputs until we obtain the output we are after, or else fail and deduce that the module in question is picky. This is not a reasonable solution because the space of valid input values of a module can be very large or even infinite. The problem is exacerbated by the fact that a module may have multiple inputs, which requires constructing all possible combinations of input values for probing. Is there a more reasonable solution that at least allows us to probe the modules using fewer inputs?
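To make the cost of naive probing concrete, here is a minimal sketch of brute-force enumeration over the Cartesian product of the input domains, cut off by a probe budget. The function name `probe_by_enumeration`, the `budget` parameter, and the toy two-input module are assumptions made for illustration; the talk does not prescribe an implementation.

```python
from itertools import islice, product


def probe_by_enumeration(module, domains, target, budget):
    """Try input combinations in order until module returns target or the budget is spent."""
    for args in islice(product(*domains), budget):
        if module(*args) == target:
            return args
    return None  # not found within budget; this alone does not prove pickiness


def toy_module(a, b):
    """Toy two-input module used only to exercise the probe."""
    return a + b


TOY_DOMAINS = [range(100), range(100)]  # already 10,000 combinations for two small inputs
```

Even in this tiny setting, a budget of 50 probes finds nothing for the target 150, while the exhaustive search needs thousands of invocations before it succeeds, which illustrates why enumeration does not scale to large or infinite input spaces.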
  • 16. Identifying Picky Modules by Harvesting the Web A solution that we explored consists in harvesting what is probably the biggest source of information, namely the Web, using the information extraction process illustrated below. Indeed, a significant number of the scientific modules provided by major institutions, such as the EBI and DDBJ, can also be invoked by users on the Web, and the traces of those module invocations remain, in a number of cases, accessible on the Web.
  • 17. Identifying Picky Modules by Harvesting the Web
  • 18. Identifying Picky Modules by Harvesting the Web If none of the candidate inputs is found to be a true positive, then we conclude that the module is likely to be picky.
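The final decision step can be sketched as follows, assuming the candidate inputs have already been extracted from web pages (the harvesting itself is not shown). A candidate counts as a true positive when invoking the module on it yields the data item we are after. The names `find_true_positive`, `likely_picky`, and `top_k`, as well as the toy module and its table, are hypothetical.

```python
def find_true_positive(module, candidate_inputs, target_output, top_k=None):
    """Return the first candidate that the module maps to target_output, else None."""
    candidates = candidate_inputs if top_k is None else candidate_inputs[:top_k]
    for candidate in candidates:
        try:
            if module(candidate) == target_output:
                return candidate
        except Exception:
            continue  # the module rejected this candidate; try the next one
    return None


def likely_picky(module, candidate_inputs, target_output):
    """No verified candidate means the module is likely (not certainly) picky."""
    return find_true_positive(module, candidate_inputs, target_output) is None


# Toy stand-in for get_compound_info; the table content is invented.
TOY_COMPOUNDS = {"cpd-1": "compound-A", "cpd-2": "compound-B"}


def toy_get_compound_info(compound_id):
    return TOY_COMPOUNDS[compound_id]  # raises KeyError for unknown identifiers
```

The conclusion is deliberately phrased as "likely": an empty result only shows that none of the harvested candidates worked, not that no valid input exists.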
  • 19. Feasibility Study The approach we have just described raises the following question: Is the proposed algorithm able to identify the reason why a given data item does not appear in the workflow results? More specifically, how effective is this solution in identifying picky modules and missing input data items? To answer these questions, we ran a feasibility experiment in which we used a sample of 6 real-world workflows from the myExperiment repository. We selected workflows that involve deterministic modules, that is, modules that deliver the same result (if any) given the same input. We did not consider, for instance, workflows that include modules performing data mining operations. We also selected workflows for which the inverse modules are deterministic functions.
  • 20. Feasibility Study We executed each workflow using example data inputs provided by the workflow authors. We then specified two kinds of queries for each workflow. Instance-based why-not query: To assess the ability of the algorithm to answer this type of query, we randomly selected an output data item d that was returned by the workflow executions. Next, we used our algorithm to see whether it is able to reconstruct the lineage of d by harvesting the Web to identify the input data items that were responsible for its derivation. Module-based why-not query: This kind of query is used to assess whether the algorithm is able to identify picky modules. In total we had 6 queries of the first kind, which we denote by {q+_1, ..., q+_6}, and 6 queries of the second kind, which we denote by {q-_1, ..., q-_6}.
  • 21. Feasibility Study: Results Of the queries {q+_1, ..., q+_6}, our algorithm was able to successfully construct the provenance of the why-not query up to the workflow input for 3 queries. Most of the modules composing these workflows, namely 8 out of 11, provide information about the input and output datasets on the Web using tabular formats. After examining the three remaining workflows, we found that one of them uses proprietary data sources, the content of which is not accessible on the surface web. The last two workflows, on the other hand, contain modules that manipulate excerpts from HTML web pages. Because of this, our algorithm was not able to find the inputs and outputs of those modules on the Web.
  • 22. Feasibility Study: Results We also measured the number of top-k web pages that needed to be examined to identify the input data item corresponding to a given output data item. On average, we needed to examine the content of the 4 top web pages returned by the keyword search engine (we used the Google search engine for our experiment). In several cases, however, the top web page was the right one, in the sense that it contained the input data item we were after.
  • 23. Feasibility Study: Results Regarding the queries {q-_1, ..., q-_6}, our algorithm was more successful in the sense that it was able to correctly identify 4 picky modules out of 6. For the two remaining workflows, the module that was identified as picky by our algorithm was not the correct one. After examination, it transpired that for certain modules the corresponding data item could not be found on the Web. Again, this issue was due to shim modules, whose input and output data items are not published on the Web.
  • 24. Conclusions To sum up, this small feasibility study has shown that our method is promising. It has also brought some insights into the way our solution can be improved. Our ongoing work includes: (i) tuning our algorithm to deal with shim modules in a workflow, (ii) exploring new sources of information for identifying picky modules, and (iii) running an experiment involving a large number of scientific workflows.
  • 25. References K. Belhajjame (2018). On Answering Why-Not Queries Against Scientific Workflow Provenance. Proceedings of EDBT, Open Proceedings, 465–468. N. Bidoit, M. Herschel, K. Tzompanaki (2014). Why not? Proceedings of EDBT, Open Proceedings, 145–156. A. Chapman and H.V. Jagadish (2009). Why not? Proceedings of SIGMOD, ACM, 523–534. J. Huang, T. Chen, A. Doan, and J. F. Naughton (2008). On the provenance of non-answers to queries over extracted data. Proceedings of VLDB, ACM, 736–747.
  • 26. The End