Provenance with a Purpose
Khalid Belhajjame
PSL, Université Paris-Dauphine, LAMSADE
kbelhajj@gmail.com
© K. Belhajjame 1
December 9th, 2022
We start with a short tale ... about provenance
Characters:
• Alice, a scientist who uses workflows for her computational experiments and analyses
• Bob, a believer in the greatness of provenance, who wants to spread the word
Alice: Workflows are great, but they can be difficult to make work, and even when they do, it takes me a long time to make sense of the results
Bob: You should use provenance. It will help you with a lot of stuff.
Alice: Really! Like what?
Bob: Plenty of things. Debugging your workflows, understanding the results, experiment reporting, analysing and optimizing the workflow, verifying the results/findings of others, reusing the (intermediate) results… you name it
Alice: Sounds like I have found my happiness, I will definitely try it
A few months later
Bob: Hello Alice, how did it go?
Alice: Hi Bob, to be honest, not great
Alice: The provenance recorded is too fine-grained, it takes me ages to get my head around it, and even when I do, it does not always have what I really need
Alice: Needless to say, I have even more trouble making sense of the provenance of the executions of my colleagues' workflows
Alice: Oh, and the execution of my workflows is getting slower, and I cannot afford to store all the collected provenance… I just remove it after a few workflow executions
Moral of the story …
• By and large, provenance in current systems is collected without really considering the requirements of the applications that will use it
• As a result, we end up collecting all sorts of things, only to find later that:
  • Interpretability. The collected provenance is difficult to understand.
  • Relevance. Most of the collected provenance is not relevant to the task at hand.
  • Completeness. It does not contain all the information needed for the task at hand.
• These conclusions are not limited to workflows
[Figure: Workflow System, "Capture Provenance", Provenance Log; accompanying question: "What can I do with collected provenance?"]
Here, I am arguing for (and, by the way, coining a new term for) “Provenance with a purpose”
Debugging Workflows
• Scenario
  • The workflow developer defines breakpoints. A breakpoint is associated with a step (an activity or subworkflow) in the workflow.
  • During the execution of the workflow, execution is paused before and after the activities associated with breakpoints
• Requirements provenance-wise
  • Recording and displaying to the workflow developer the data bound to the inputs and outputs of the steps associated with the breakpoints
  • May involve recording the state of objects that are outside the scope of the inputs and outputs of the activity that is subject to breakpointing, e.g., a file or a database that is updated by the activity in question
  • One can imagine a situation where the developer alters the input data of a given step that is associated with a breakpoint
• Input provided by the workflow developer
  • Breakpoints
  • Optionally, values to use for the inputs of given activities
Relevance
Completeness
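The debugging scenario above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the design presented in the talk: the workflow is a linear pipeline, the names `Step`, `BreakpointRecord` and `run_with_breakpoints` are hypothetical, and the run is not actually paused; the sketch only shows that provenance is recorded for the steps that carry a breakpoint and that the developer may substitute the input of such a step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Step:
    """A workflow step: a name and a function from an input value to an output value."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class BreakpointRecord:
    """Provenance captured at a breakpoint: the data bound to the step's input and output."""
    step: str
    input_value: Any
    output_value: Any


def run_with_breakpoints(
    steps: List[Step],
    initial_input: Any,
    breakpoints: Set[str],
    overrides: Optional[Dict[str, Any]] = None,
) -> Tuple[Any, List[BreakpointRecord]]:
    """Run a linear pipeline, recording inputs/outputs only for steps with breakpoints."""
    overrides = overrides or {}
    records: List[BreakpointRecord] = []
    value = initial_input
    for step in steps:
        has_breakpoint = step.name in breakpoints
        # The developer may replace the input of a step that has a breakpoint.
        if has_breakpoint and step.name in overrides:
            value = overrides[step.name]
        result = step.run(value)
        if has_breakpoint:
            records.append(BreakpointRecord(step.name, value, result))
        value = result
    return value, records


# Hypothetical three-step workflow.
pipeline = [
    Step("clean", lambda xs: [x for x in xs if x is not None]),
    Step("square", lambda xs: [x * x for x in xs]),
    Step("total", sum),
]

final, log = run_with_breakpoints(pipeline, [1, None, 2, 3], breakpoints={"square"})
print(final)  # 14
print(log)    # a single record, for the "square" step only
```

Nothing is recorded for `clean` or `total`, which is the point of the scenario: the breakpoints supplied by the developer determine what is captured.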
Experiment Reporting
• Scenario
  • Summarization:
    • Identify the subset of the workflow (the activities that are of interest)
    • Retain only the information relating to a subset of the workflow's inputs and/or outputs
  • Abstraction: specify the domain annotations to use
• Inputs provided by the user
  • Template for reporting
    • For example, the sections that need to be filled in, and the corresponding steps (or subworkflows) in the overall workflow
  • Source of annotations: it can be an external resource, e.g., Bio.Tools, but in certain cases the annotations can be extracted from the data values themselves
• Requirements provenance-wise
  • Recording only the execution information that is necessary to feed the report
Relevance
Completeness
Interpretability
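As a rough illustration of summarization and abstraction, the sketch below filters an execution trace down to the steps named in a report template and relabels them with domain annotations. Everything in it is an assumption made for the example: the trace layout, the step names, the template sections and the annotation labels are hypothetical, and the labels are not taken from Bio.Tools.

```python
from typing import Dict, List

# Hypothetical execution trace: one record per executed step.
trace = [
    {"step": "fetch_sequences", "inputs": ["accession_list"], "outputs": ["raw_fasta"]},
    {"step": "tmp_format_fix", "inputs": ["raw_fasta"], "outputs": ["fixed_fasta"]},
    {"step": "align", "inputs": ["fixed_fasta"], "outputs": ["alignment"]},
    {"step": "build_tree", "inputs": ["alignment"], "outputs": ["tree"]},
]

# Report template supplied by the user: section title -> steps that feed it.
template = {
    "Data collection": ["fetch_sequences"],
    "Analysis": ["align", "build_tree"],
}

# Domain annotations (illustrative labels only).
annotations = {
    "fetch_sequences": "Data retrieval",
    "align": "Sequence alignment",
    "build_tree": "Phylogenetic tree construction",
}


def summarize(
    trace: List[dict],
    template: Dict[str, List[str]],
    annotations: Dict[str, str],
) -> Dict[str, List[dict]]:
    """Keep only the trace records needed by the template, relabelled with annotations."""
    report: Dict[str, List[dict]] = {}
    for section, steps in template.items():
        wanted = set(steps)
        report[section] = [
            {
                # Abstraction: show the domain annotation instead of the step name.
                "activity": annotations.get(record["step"], record["step"]),
                "inputs": record["inputs"],
                "outputs": record["outputs"],
            }
            for record in trace
            if record["step"] in wanted  # Summarization: drop everything else.
        ]
    return report


report = summarize(trace, template, annotations)
for section, entries in report.items():
    print(section, [entry["activity"] for entry in entries])
```

The housekeeping step `tmp_format_fix` does not appear in the report because no section of the template refers to it.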
Policy Verification
• Scenario
  • A number of policies are defined on the data
    • For example, before feeding sensitive data values to a remote analysis, they should be anonymized or stripped of identifiers
  • The way the data is used needs to comply with the rights of the owners or the policies defined on the data
• Provenance-wise
  • Some policies can be verified by directly analyzing the prospective provenance (workflow specifications)
  • Others can only be checked during the execution of the workflow, through analysis of the retrospective provenance of the workflow
  • Note that in this case, the execution of a workflow can be halted if it is found to breach a policy
• Input provided by the user of the workflow
  • Policies associated with the datasets that are fed to the workflow, as well as those associated with the datasets underlying the execution of the activities of the workflow
Relevance
Completeness
Interpretability
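The static case, where a policy is checked against the prospective provenance, can be illustrated with a small graph traversal. This is a minimal sketch under stated assumptions rather than the checker described in the talk: the workflow specification is reduced to a dictionary of data-flow edges, the policy is the anonymization example from the slide, and all node names are hypothetical.

```python
from typing import Dict, List, Set


def find_policy_violations(
    edges: Dict[str, List[str]],
    sensitive_inputs: Set[str],
    anonymizers: Set[str],
    remote_steps: Set[str],
) -> List[str]:
    """Return the remote steps that sensitive data can reach without being anonymized."""
    violations: Set[str] = set()
    seen: Set[str] = set()
    stack = list(sensitive_inputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in anonymizers:
            # Data leaving an anonymizing step is treated as safe: stop following this path.
            continue
        if node in remote_steps:
            violations.add(node)
        stack.extend(edges.get(node, []))
    return sorted(violations)


# Hypothetical workflow specification: node -> nodes that consume its output.
spec = {
    "patient_records": ["anonymize", "local_stats"],
    "anonymize": ["remote_blast"],
    "local_stats": ["remote_plot"],
}

violations = find_policy_violations(
    spec,
    sensitive_inputs={"patient_records"},
    anonymizers={"anonymize"},
    remote_steps={"remote_blast", "remote_plot"},
)
print(violations)  # ['remote_plot']
```

Policies that depend on actual data values cannot be decided this way. For those, a comparable check would have to run over the retrospective provenance as it is produced, so that the execution can be halted on a breach.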
Architecture
[Figure: layered architecture]
• Users: Wf Designer, Wf user
• Applications Layer: Wf Debugger, Exp Reporting, Policy Checker/Enforcer, Reproducibility checker
• Provenance Layer: Provenance Augmentation, Abstraction/Annotation
• Information sources: Workflow Engine, Workflow Exec Traces, Operating System, Data management system, The Web
How does it work?
• User: choose your task
• User: provide the necessary inputs, if any
• System: capture (only the) necessary provenance
• System: assist the user in the task at hand
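One way to read these four steps is that the chosen task, together with the user's inputs, yields a capture rule that the system applies to execution events. The sketch below illustrates that reading only. The task names, the `Event` structure and the `capture_plan` function are assumptions introduced for the example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Event:
    """A hypothetical execution event emitted by the workflow engine."""
    step: str
    kind: str  # "input", "output" or "timing"
    value: Any


def capture_plan(task: str, user_inputs: Dict[str, Any]) -> Callable[[Event], bool]:
    """Turn the chosen task and the user's inputs into a rule saying which events to keep."""
    if task == "debugging":
        breakpoints = set(user_inputs["breakpoints"])
        return lambda e: e.step in breakpoints and e.kind in {"input", "output"}
    if task == "reporting":
        report_steps = set(user_inputs["report_steps"])
        return lambda e: e.step in report_steps and e.kind == "output"
    raise ValueError(f"unknown task: {task}")


def capture(events: List[Event], keep: Callable[[Event], bool]) -> List[Event]:
    """Capture only the provenance that the task needs."""
    return [e for e in events if keep(e)]


events = [
    Event("fetch", "input", "ids.txt"),
    Event("fetch", "output", "raw.fasta"),
    Event("align", "input", "raw.fasta"),
    Event("align", "timing", 12.5),
    Event("align", "output", "aligned.fasta"),
]

# Steps 1 and 2: the user chooses a task and provides its inputs.
keep = capture_plan("debugging", {"breakpoints": ["align"]})
# Step 3: the system captures only what that task requires.
debug_log = capture(events, keep)
print([(e.step, e.kind) for e in debug_log])  # [('align', 'input'), ('align', 'output')]
```

Switching the task changes what is stored without touching the workflow itself, which is the behaviour the loop above asks of the system.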
Of course this is far from being perfect…
This is not entirely new
• Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame: Findable and reusable workflow data products: A genomic
workflow case study. Semantic Web 11(5): 751-763 (2020)
• Renan Souza, Marta Mattoso: Provenance of Dynamic Adaptations in User-Steered Dataflows. IPAW 2018: 16-29
• Timothy M. McPhillips et al.: YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering
Workflow Information from Scripts. CoRR abs/1502.02403 (2015)
• Pinar Alper, Khalid Belhajjame, Carole A. Goble: Static analysis of Taverna workflows to predict provenance
patterns. Future Gener. Comput. Syst. 75: 310-329 (2017)
• Daniel Deutch, Amir Gilad, Yuval Moskovitch: Efficient provenance tracking for datalog using top-k queries.
VLDB J. 27(2): 245-269 (2018)
What is new then?
A single framework that caters for, and can be adapted to, different provenance usage scenarios

Provenance witha purpose

  • 1. Provenance with a Purpose Khalid Belhajjame PSL, Université Paris-Dauphine, LAMSADE kbelhajj@gmail.com © K. Belhajjame 1 December 9th, 2022
  • 2. We start with a short tale ... about provenance Characters: • Alice, a scientists who utilize workflows for their computational experiment and analyses • Bob, a believer in the greatness of provenance, who wants to spread the word © K. Belhajjame 2 December 9th, 2022
  • 3. Workflowsaregreat,buttheycanbedifficultto makework,andeven when theydid, ittakesme a long timetomake senseofthe results © K. Belhajjame 3 December 9th, 2022
  • 4. Youshoulduseprovenance. It will helpyouwitha lotofstuff. © K. Belhajjame 4 December 9th, 2022
  • 5. Really!like what? © K. Belhajjame 5 December 9th, 2022
  • 6. Plentyofthings. Debugging yourworkflows,understandingtheresults,experimentreporting,analysing andoptimizingtheworkflow,verifying the results/findingsofothers,reusing the (intermediate) results…younameit © K. Belhajjame 6 December 9th, 2022
  • 7. Soundslike I have foundmy hapiness,I will definitlytryit © K. Belhajjame 7 December 9th, 2022
  • 8. Few months later © K. Belhajjame 8 December 9th, 2022
  • 9. Hello Alice, howdid it go? © K. Belhajjame 9 December 9th, 2022
  • 10. Hi Bob,tobehonnest,notgreat © K. Belhajjame 10 December 9th, 2022
  • 11. The provenancerecordedis toofinegrained, it takesmeagestoget myheadaroundit, andeven when I doit doesnothavealwayswhatI really need © K. Belhajjame 11 December 9th, 2022
  • 12. Needless tosaythatI have even moretrouble makingsense ofthe provenanceoftheexecutions ofthe workflowsof mycolleagues © K. Belhajjame 12 December 9th, 2022
  • 13. Oh, andthe executionofmy workflowsis getting slower, andI cannotaffordtostoreall collected provenance… I justremove it afterfew workflow executions © K. Belhajjame 13 December 9th, 2022
  • 14. Moral of the story … • By and large, provenance in current systems is collected without really considering the requirements of the applications that will be using it • As a result, we end up collecting all sorts of things just to find later that: • Interpretability. Collected provenance is difficult to understand • Relevance. Most of collected provenance is not relevant for the task at hand, • Completeness. It does not contains all the information needed for the task at hand. • This conclusions are not limited to workflows Capture Provenance Workflow System Provenance Log © K. Belhajjame 14 What can I do with collected provenance? December 9th, 2022
  • 15. Here, I am arguing for (and by the way coining a new term), that is “Provenance with a purpose” © K. Belhajjame 15 December 9th, 2022
  • 16. Debugging Workflows • Scenario • The workflow developer defines breakpoints. A breakpoint is associated with a step (an activity or subworkflow) in the workflow. • During the execution of the workflow, the execution of the workflow paused before and after the activities associated with breakpoints • Requirements provenance-wise • Recording and displaying to the workflow developer the data bound to the input and output of the steps associated with the breakpoint. • May involve recording the state of objects that are outside the scope of the inputs and outputs of the activity that is subject too breakpointing, e.g., a file or a database that is upated by the activity in question • One can imagine a situation, where the developer alter the input data of a given step that is associated with a breakpoint • Input provided by the workflow developer • Breakpoints • Optionally, s/he can provide values to use with given activity input values © K. Belhajjame 16 Relevance Completness December 9th, 2022
  • 17. Experiment Reporting
      • Scenario
          • Summarization:
              • Identify the subset of the workflow (the activities that are of interest)
              • Retain only the information relating to a subset of the inputs of the workflow and/or its outputs
          • Abstraction: specify the domain annotations to use
      • Inputs provided by the user
          • Template for reporting
              • For example, the sections that need to be filled, and the corresponding steps (or subworkflows) in the overall workflow
          • Source of annotations: it can be external resources, e.g., Bio.Tools, but in certain cases annotations can be extracted from the data values themselves
      • Requirements provenance-wise
          • Recording only the execution information that is necessary to feed the report
      [Slide tags: Relevance, Completeness, Interpretability]
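
As a rough sketch of the reporting scenario of slide 17, the template below maps report sections to workflow steps, and only the trace entries of those steps are retained. The template layout, the trace format, and the step names are all assumptions made for illustration. For brevity the filter is applied to an already collected trace; the same test could be applied at capture time so that irrelevant entries are never recorded.

```python
from typing import Any, Dict, List

# Hypothetical reporting template: report section -> steps that feed it.
TEMPLATE: Dict[str, List[str]] = {
    "Methods": ["align"],
    "Results": ["call_variants"],
}

# Hypothetical execution trace: one entry per executed step.
TRACE: List[Dict[str, Any]] = [
    {"step": "fetch", "inputs": {"source": "archive"}, "outputs": {"reads": "reads.fq"}},
    {"step": "align", "inputs": {"reads": "reads.fq"}, "outputs": {"bam": "aln.bam"}},
    {"step": "call_variants", "inputs": {"bam": "aln.bam"}, "outputs": {"vcf": "out.vcf"}},
]


def provenance_for_report(
    template: Dict[str, List[str]],
    trace: List[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Keep, for each report section, only the entries of the steps it names."""
    return {
        section: [entry for entry in trace if entry["step"] in steps]
        for section, steps in template.items()
    }


report = provenance_for_report(TEMPLATE, TRACE)
```

The `fetch` step appears in no section of the template, so nothing about it reaches the report.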
  • 18. Policy Verification
      • Scenario
          • A number of policies are defined on the data
              • For example, before sensitive data values are fed to a remote analysis, they should be anonymized or stripped of identifiers
          • The way the data is used needs to comply with the rights of the owners or the policies defined on the data
      • Provenance-wise
          • Some policies can be verified by directly analyzing the prospective provenance (workflow specifications)
          • Others can only be checked during the execution of the workflow, through analysis of the retrospective provenance of the workflow
              • Note that in this case, the execution of a workflow can be halted if it is found to breach a policy
      • Input provided by the user of the workflow
          • Policies associated with the datasets that are fed to the workflow, as well as those associated with the datasets underlying the execution of the activities of the workflow
      [Slide tags: Relevance, Completeness, Interpretability]
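
The two kinds of check on slide 18 can be sketched as follows, under strong simplifying assumptions: the workflow is a linear pipeline, each step declares two hypothetical flags (`anonymizes`, `remote`), and the only policy is "sensitive data must be anonymized before it reaches a remote step". `check_spec` inspects the specification alone (prospective provenance); `guard_remote_call` inspects an actual data value at run time and raises to halt the run. The identifier field names are also assumptions.

```python
from typing import Any, Dict, List


class PolicyViolation(Exception):
    """Raised to halt a run that would breach a policy."""


# Hypothetical workflow specification (prospective provenance).
SPEC: List[Dict[str, Any]] = [
    {"name": "load", "anonymizes": False, "remote": False},
    {"name": "strip_ids", "anonymizes": True, "remote": False},
    {"name": "remote_stats", "anonymizes": False, "remote": True},
]

# Field names treated as identifiers in this sketch (an assumption).
IDENTIFIER_FIELDS = {"name", "email"}


def check_spec(spec: List[Dict[str, Any]], input_is_sensitive: bool) -> List[str]:
    """Static check: list the remote steps that would receive sensitive data."""
    sensitive = input_is_sensitive
    violations = []
    for step in spec:
        if step["remote"] and sensitive:
            violations.append(step["name"])
        if step["anonymizes"]:
            # Data leaving an anonymizing step is no longer sensitive.
            sensitive = False
    return violations


def guard_remote_call(step_name: str, record: Dict[str, Any]) -> None:
    """Runtime check: halt if the value about to be sent still has identifiers."""
    leaked = IDENTIFIER_FIELDS & set(record)
    if leaked:
        raise PolicyViolation(f"{step_name}: identifiers present: {sorted(leaked)}")
```

With the specification above, `check_spec(SPEC, True)` reports no violation, while dropping the `strip_ids` step makes `remote_stats` show up in the result.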
  • 19. Architecture (diagram)
      • Users: Wf Designer, Wf user, Reproducibility checker
      • Applications Layer: Wf Debugger, Exp Reporting, Policy Checker/Enforcer
      • Provenance Layer: Provenance Augmentation, Abstraction/Annotation
      • Information sources: Workflow Engine, Workflow Exec Traces, Operating System, Data management system, The Web
  • 20. How does it work?
      • Choose your task
      • Provide necessary inputs, if any
      • Capture (only the) necessary provenance
      • Assist the user in the task at hand
      [Actor labels on the slide: User, System, System, User]
  • 21. Of course, this is far from being perfect…
  • 22. This is not entirely new
      • Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame: Findable and reusable workflow data products: A genomic workflow case study. Semantic Web 11(5): 751-763 (2020)
      • Renan Souza, Marta Mattoso: Provenance of Dynamic Adaptations in User-Steered Dataflows. IPAW 2018: 16-29
      • Timothy M. McPhillips et al.: YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. CoRR abs/1502.02403 (2015)
      • Pinar Alper, Khalid Belhajjame, Carole A. Goble: Static analysis of Taverna workflows to predict provenance patterns. Future Gener. Comput. Syst. 75: 310-329 (2017)
      • Daniel Deutch, Amir Gilad, Yuval Moskovitch: Efficient provenance tracking for datalog using top-k queries. VLDB J. 27(2): 245-269 (2018)
      What is new, then? A single framework that caters for, and can be adapted to, different provenance usage scenarios.
  • 23. Provenance with a Purpose. Khalid Belhajjame, PSL, Université Paris-Dauphine, LAMSADE, kbelhajj@gmail.com. December 9th, 2022