iSOCO


Provenance: From eScience to the Web of Data

           José Manuel Gómez Pérez

                Invited Lecture
              CETINIA 17/11/2009
Agenda




Introduction to Provenance
Semantic Overlays for Provenance Analysis
The Web of Data
Provenance in the Web of Data




                                       2
Provenance is…




Records of
 - Origin or source from which something comes
 - History of subsequent owners (change of custody)



                        Adapted from James Cheney’s Principles of Provenance
                                                                       3
Provenance is…




Evidence of authenticity, integrity,
and quality
Certifies products of good process




                                       Adapted from James Cheney’s Principles of Provenance
                                                                                      4
Provenance is…




Valuable
Hard to collect and verify
Necessary to assign credit
…and blame

i.e. establish

Trust
                             Adapted from James Cheney’s Principles of Provenance
                                                                            5
Why provenance of electronic data is difficult


Paper data                              Electronic data

Creation process leaves a paper trail   Often, there is no bit trail
Easier to detect modification, copy,    Easy to forge, plagiarize, and modify data
forgery
Usually, one can judge a book by its    There is no cover to judge by
cover

Addressing this requires explicitly representing the provenance of data,
storing it, keeping it secure, and reasoning with it.

                               Adapted from James Cheney’s Principles of Provenance
                                                                              6
Provenance in eScience




One of the most active fields in Provenance development

Curated scientific biological databases
 - Ensure database quality
 - Need provenance for data quality control and accountability
 - Currently done manually by curators
Scientific workflows – grid computing
 - Abstract process execution complexity
 - Need provenance for process reproducibility, efficiency
 - Currently supported by ad-hoc systems



                                                                      7
Past approaches to provenance in eScience




                                       8
Agenda




Introduction to Provenance
Semantic Overlays for Provenance Analysis
The Web of Data
Provenance in the Web of Data




                                       9
Provenance analysis of process executions




?
                                           10
Semantic overlays for provenance analysis


Objective: To support domain experts in understanding process executions

What: PROVENANCE
Whom: SMEs
How: Semantic Overlays, based on Problem Solving Methods (PSMs) (McDermott 1988)
 - Provide reusable guidelines to formulate process knowledge
 - Support reasoning
 - Describe the main rationale behind a process




                                                                                         11
PSM perspectives

Task-method decomposition
 - Hierarchically defines how tasks decompose into simpler (sub)tasks
 - Describes tasks at several levels of detail
 - Provides alternative ways to achieve a task

Interaction
 - Black-box perspective
 - Knowledge transformation within the PSM

Knowledge flow
 - PSM establishes and controls the sequence of actions required to perform a task
 - Defines knowledge required at each task step

Diagram legend: Task, Method, Role


                                                                            12
Towards knowledge provenance


PSMs as semantic overlays on top of existing process documentation
 - Task: What is going to be achieved by executing a process
 - PSM: HOW

Provenance, from a knowledge perspective
 - How recorded provenance relates to the execution of a process
 - Simpler process analysis, proposing decompositions into simpler subprocesses
 - Visualize provenance at different levels of detail
Supporting domain experts in two main ways
 - Validation of process executions
 - Identification of reasoning patterns in process executions

Source: myGrid

                                                                              13
The twig join function


Based on XML pattern matching algorithms on Directed Acyclic Graphs (Bruno et al., 2002)
twig_join detects the occurrence of a pattern in an XML DAG
Given
 - P, a process
 - T, a task potentially describing P
 - M, a PSM providing a strategy on how to achieve T
 - i(T), the set of input roles of T
 - o(T), the set of output roles of T
 - D, the DAG resulting from documenting the execution of P
twig_join(D, i(T), o(T)) checks whether a twig exists for M that connects i(T) with o(T) in D

In this case, PSM M is the pattern to be identified in the process documentation DAG D
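
The check described on this slide can be approximated with plain graph reachability. What follows is a minimal sketch under stated assumptions, not the holistic twig join algorithm of Bruno et al.: it assumes D is available as an adjacency mapping, that the input and output roles have already been bridged to node names in D, and it only tests whether every output is reachable from some input. The function name mirrors the slide; the data layout and all node names are hypothetical.

```python
from collections import deque


def twig_join(dag, input_roles, output_roles):
    """Return True if every output role node is reachable from an input role node.

    Simplified stand-in for pattern matching: `dag` maps each node to the
    nodes it points to (an assumed layout, not taken from the slides).
    """
    reached = set(input_roles)
    queue = deque(input_roles)
    while queue:
        node = queue.popleft()
        for successor in dag.get(node, ()):
            if successor not in reached:
                reached.add(successor)
                queue.append(successor)
    return all(role in reached for role in output_roles)


# Hypothetical process documentation: two scans are aligned, then averaged.
D = {
    "scan_a": ["align"],
    "scan_b": ["align"],
    "align": ["aligned_scans"],
    "aligned_scans": ["average"],
    "average": ["atlas"],
}

print(twig_join(D, ["scan_a", "scan_b"], ["atlas"]))  # True
print(twig_join(D, ["atlas"], ["scan_a"]))            # False
```

A real twig join would also match the internal structure of M against D; reachability only confirms that the inputs and outputs are connected at all.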

                                                                           14
A twig join example in provenance analysis


Figure: Domain entities, Bridges (mapping), and PSM entities, with the detected twig join highlighted




                                                                          15
The matching algorithm

 • twig_join recursively applied at each decomposition level
 • Each task decomposed by one or several PSMs (task-method decomposition view)
 • Knowledge flow defines the sequence of evaluation
 • Backtracking possible at PSM and role levels

Diagram (task-method decomposition, knowledge flow, interaction):
twig_join(Ti, D)
decompose(Ti)
    twig_join(T11, D)
    twig_join(T12, D)
    twig_join(T13, D)
    twig_join(T14, D)
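
The recursion on this slide can be sketched as follows. This is an illustrative reading, not the KOPE implementation: a reachability test stands in for twig_join, the catalogue layout is an assumption, and a task counts as matched only when one of its alternative PSMs matches completely. All task and node names are hypothetical.

```python
def connects(dag, sources, targets):
    """Reachability check standing in for twig_join (an assumption of this sketch)."""
    seen, stack = set(sources), list(sources)
    while stack:
        for nxt in dag.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return all(t in seen for t in targets)


def match(task, dag, catalogue):
    """Match a task against the DAG, then recurse into its decomposition.

    `catalogue` maps a task name to (inputs, outputs, alternative_psms); each
    PSM is an ordered list of subtasks that follows the knowledge flow.
    Returns a nested dict describing the match, or None if there is none.
    """
    inputs, outputs, psms = catalogue[task]
    if not connects(dag, inputs, outputs):
        return None
    if not psms:                      # leaf task: nothing left to decompose
        return {task: None}
    for psm in psms:                  # backtrack over alternative PSMs
        parts = [match(sub, dag, catalogue) for sub in psm]
        if all(part is not None for part in parts):
            return {task: parts}
    return None


D = {"scans": ["align"], "align": ["aligned"], "aligned": ["average"], "average": ["atlas"]}
catalogue = {
    "build_atlas": (["scans"], ["atlas"], [["register"], ["align_step", "average_step"]]),
    "register": (["atlas"], ["scans"], []),        # wrong direction, so it fails
    "align_step": (["scans"], ["aligned"], []),
    "average_step": (["aligned"], ["atlas"], []),
}

result = match("build_atlas", D, catalogue)
print(result)
```

The first PSM for `build_atlas` fails, so the loop falls back to the second one, which is the PSM-level backtracking mentioned on the slide. Backtracking at role level is not modelled here.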


                                                                                 16
KOPE: A Knowledge-Oriented Provenance Environment



Components: PSM-Ontology bridges, Provenance query, Matching detection, Matching visualization




                                                 17
KOPE Evaluation
Brain Atlas Workflow
PSM Catalogue: Task-Method Decomposition
PSM Catalogue: Knowledge Flow




                             18
KOPE video




        19
KOPE evaluation (II)

Goal 1: To identify the main rationale behind process executions by detecting occurrences of semantic overlays in their logs

Goal 2: To exploit the structure of semantic overlays to describe process executions at different levels of detail

Focus on precision and recall metrics
Identified at three different layered contexts
 - Method
 - Task
 - Decomposition-level

Chart: Precision and Recall (0% to 120%) for Level1, Level2, Level3, Level4
Legend: Perfect match, Partial match, No match
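
For reference, the two metrics can be computed from the set of overlay occurrences a tool detects and the set a domain expert expects. The sketch below uses the standard definitions; the labels are made up for illustration and are not results from the KOPE evaluation.

```python
def precision_recall(detected, expected):
    """Compute precision and recall for a set of detected overlay occurrences."""
    detected, expected = set(detected), set(expected)
    true_positives = len(detected & expected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up labels, not data from the evaluation.
expected = {"align", "average", "slice", "convert"}
detected = {"align", "average", "render"}

p, r = precision_recall(detected, expected)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```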



                                                                                           20
Agenda




Introduction to Provenance
Semantic Overlays for Provenance Analysis
The Web of Data
Provenance in the Web of Data




                                       21
WAKE UP VIDEO!




                 22
While the economy contracts, the digital universe expands…




In 2006, the size of the digital universe was estimated at 161 exabytes
 - 3 million times the information in all books ever written
By 2010, it is expected to reach 988 exabytes
…and all this data is potentially exposed online

Source: IDC
                                                                            23
Web data




      24
The Linked Data paradigm



How can we exploit all the available data?
 - Data reuse and remix
 - Common flexible and usable APIs
 - Standard vocabularies to describe interlinked datasets
 - Tools
 - Realize the Semantic Web vision

Tim Berners-Lee, 2006 (Design Issues)
1. Use URIs to identify things
    - Anything, not just documents
2. Use HTTP URIs for people to look up such names
    - Globally unique names
    - Distributed ownership
3. Provide useful information in RDF upon URI resolution
4. Include RDF links to other URIs
    - Enable discovery of related information
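
The four principles can be illustrated with a few lines of standard-library Python. This is only a sketch: the URIs are placeholders on example.org and example.com, the request is built but never sent, and the link is written as a plain tuple rather than with an RDF library.

```python
from urllib.request import Request

# Principles 1 and 2: an HTTP URI names a thing (placeholder URI for illustration).
thing_uri = "http://example.org/id/madrid"

# Principle 3: ask for RDF when resolving the URI (the request is built, not sent).
request = Request(thing_uri, headers={"Accept": "application/rdf+xml"})

# Principle 4: RDF links to other URIs, as (subject, predicate, object) triples.
triples = [
    (thing_uri, "http://www.w3.org/2002/07/owl#sameAs", "http://example.com/place/madrid"),
]

print(request.full_url, request.get_header("Accept"))
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The `owl:sameAs` link is what lets a client that starts at one dataset discover the description of the same thing in another.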




                                                                          25
The Linked Data Cloud (May 2007)




                              26
The Linked Data Cloud (August 2007)




                                 27
The Linked Data Cloud (March 2008)




                                28
The Linked Data Cloud (September 2008)




                                    29
The Linked Data Cloud (March 2009)




                                30
The Web of Data


Apply the Linked Data principles to expose open datasets in
RDF
Define RDF links between data items from different datasets
Over 7.5 billion triples, 5 million links (as of November 2009)




                                                                  31
Linked Data going mainstream




                          32
Agenda




Introduction to Provenance
Semantic Overlays for Provenance Analysis
The Web of Data
Provenance in the Web of Data




                                       33
A real-life example


Linking and exploiting distributed datasets without the means to check their provenance can be harmful, especially in sensitive domains.

Two fake web sites
A fake Wikipedia entry
Fake California public safety phone numbers

The hoax caused a 1000-word tome in Frankfurter Allgemeine Zeitung… and public apologies from DPA

Trust in Wikipedia misled DPA
In a provenance-aware world, DPA would have had means, based on data provenance, to automatically check that
 - The town does not exist
 - The Berlin Boys do not exist
 - The reporting local TV station does not exist

                                                                                                           34
The Linked Data flow




Data flow, from bottom to top
 - Sources: Web documents, Multimedia, Legacy resources (e.g. DBs, XML repositories)
 - Publish Linked Data (RDF, HTTP, URIs)
 - Linked Data
 - Exploit Linked Data (SPARQL EPRs)
 - Linked Data applications

Provenance
 - Data lineage
 - Data quality
 - Data trustworthiness




                                                                         35
Provenance and Linked Data


Linked Data is largely about reuse. However, reusing data from third parties requires knowing its provenance!
 - Is the data reliable?
 - Is the quality of the data good?
Provenance shall provide the ability to
 - Trace the sources of data
 - Enable the exploration of relationships between datasets, their authors and affiliations
Provenance analysis shall provide insight into how data is produced and exploited
Provenance should create a notion of information quality
 - Is a certain dataset consistent and up to date?
 - Is the connection between two interlinked datasets meaningful?
 - Is a given dataset relevant for a particular domain?
Provenance to establish information trustworthiness
Provenance to provide data views following some criteria



                                                                                 36
Provenance challenges in the Web of Data




Provenance information needs to be

Represented
Captured and recorded
Stored and secured, queried, and reasoned about
Visualized and browsed




                                                              37
A Provenance architecture for the Web of Data



   Authoritative
agencies required
to certify and keep
 data provenance
     secure!!!




                                                                 38
Semantics in support of provenance in the Web of Data


Semantic Web                                    Provenance
   stack                                           stack
                           This, we still
                          need to define!



                                              Information quality
                                                   inference

                                              Trust inference

                                   Reasoning with provenance

                                   Provenance querying

                                 Provenance capture

                        Provenance access policy definition

                           Provenance encryption




                                                                    39
Towards a model of Web Data provenance

                                               Adapted from Olaf Hartig’s Provenance
                                               Information in the Web of data
Provenance represented as a graph
 - Nodes: provenance elements (pieces of provenance information)
 - Edges: relate provenance elements to each other
 - Subgraphs for related data items possible
Provenance models define
 - Types of provenance elements (roles)
 - Relationships between them


      Actor


     Execution


      Artifact
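
A graph of this kind could be represented along the following lines. This is a minimal sketch: the three element types come from the slide, while the class names, the relation labels (`producedBy`, `performedBy`, `used`) and the sample elements are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A provenance element; `kind` is one of the three types on the slide."""
    kind: str   # "Actor", "Execution" or "Artifact"
    name: str


@dataclass
class ProvenanceGraph:
    edges: list = field(default_factory=list)   # (source, relation, target)

    def relate(self, source, relation, target):
        self.edges.append((source, relation, target))

    def related(self, element, relation):
        """Elements reached from `element` through edges labelled `relation`."""
        return [t for s, r, t in self.edges if s == element and r == relation]


# Relation names below are hypothetical labels chosen for this sketch.
curator = Element("Actor", "curator")
cleaning = Element("Execution", "cleaning run")
raw = Element("Artifact", "raw dataset")
clean = Element("Artifact", "clean dataset")

graph = ProvenanceGraph()
graph.relate(clean, "producedBy", cleaning)
graph.relate(cleaning, "performedBy", curator)
graph.relate(cleaning, "used", raw)

print(graph.related(clean, "producedBy"))
print(graph.related(cleaning, "used"))
```

Following `producedBy` and then `used` from an artifact gives its lineage, and `performedBy` leads to the actor responsible for each step.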




                                                                                 40
Provenance-related vocabularies


DC – Dublin Core Metadata Terms
FOAF – Friend of a Friend
SIOC – Semantically-Interlinked Online Communities
SWP – Semantic Web Publishing vocabulary
WOT – Web Of Trust schema
VOiD – VOcabulary of Interlinked Datasets

However, general lack of provenance-related metadata on the Web of Data!
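
As an illustration of what such metadata might look like, the sketch below attaches a few Dublin Core and FOAF statements to a dataset and prints them in N-Triples syntax. The property URIs are the published `dcterms` and `foaf` terms; the dataset, author, source, date and name values are placeholders invented for this example.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

dataset = "http://example.org/dataset/brain-atlas"   # placeholder URI
author = "http://example.org/people/curator"         # placeholder URI

# Objects are pre-formatted: <...> for URIs, "..." for literals.
statements = [
    (dataset, DCTERMS + "creator", f"<{author}>"),
    (dataset, DCTERMS + "created", '"2009-01-01"'),
    (dataset, DCTERMS + "source", "<http://example.org/dataset/raw-scans>"),
    (author, FOAF + "name", '"A. Curator"'),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in rows)


print(to_ntriples(statements))
```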



                                                                   41
Action points




Provenance vocabularies
 - Represent and reason with trust and information quality
 - Extend emerging Linked Data vocabularies
 - VOiD
Awareness of data providers
 - W3C Provenance IG
 - Linked Data standards (VOiD again)
Tools for data providers
 - Generation of provenance metadata
 - Provenance authoritative agencies
 - Provenance visualization




                                                                     42
An example of provenance visualization




                                    43
Questions




       44
Thanks for your attention!

José Manuel Gómez-Pérez
R&D Director
T +34913349778
M +34609077103
jmgomez@isoco.com


                                     iSOCO
For more information on how iSOCO can help your company optimize its digital business and deliver an innovative solution, contact us at
                       www.            .com
Barcelona: Tel +34 93 5677200, Edificio Testa A, C/ Alcalde Barnils 64-68, St. Cugat del Vallès, 08174 Barcelona
Madrid: Tel +34 91 3349797, C/Pedro de Valdivia, 10, 28006 Madrid
Valencia: Tel +34 96 3467143, Oficina 107, C/ Prof. Beltrán Báguena 4, 46009 Valencia




                                                                                       45

Mais conteúdo relacionado

Mais procurados

BEKEE, Expert Knowledge Modeling with Bayesian Belief Networks
BEKEE, Expert Knowledge Modeling with Bayesian Belief NetworksBEKEE, Expert Knowledge Modeling with Bayesian Belief Networks
BEKEE, Expert Knowledge Modeling with Bayesian Belief Networksjouffe
 
Intuitive Technology
Intuitive TechnologyIntuitive Technology
Intuitive TechnologyMProcun
 
Blended Enterprise Investigations
Blended Enterprise InvestigationsBlended Enterprise Investigations
Blended Enterprise InvestigationsJohn Grand
 
Complexity Thinking for Scrum Teams
Complexity Thinking for Scrum TeamsComplexity Thinking for Scrum Teams
Complexity Thinking for Scrum Teamsantonrossouw
 
Economic Attention Networks
Economic Attention NetworksEconomic Attention Networks
Economic Attention NetworksMatthew Ikle
 
Sternberg Poster
Sternberg Poster Sternberg Poster
Sternberg Poster souf18
 
Andrew Brennan and Ruth Banner - DVD training package
Andrew Brennan and Ruth Banner - DVD training packageAndrew Brennan and Ruth Banner - DVD training package
Andrew Brennan and Ruth Banner - DVD training packageCOT SSNP
 

Mais procurados (11)

BEKEE, Expert Knowledge Modeling with Bayesian Belief Networks
BEKEE, Expert Knowledge Modeling with Bayesian Belief NetworksBEKEE, Expert Knowledge Modeling with Bayesian Belief Networks
BEKEE, Expert Knowledge Modeling with Bayesian Belief Networks
 
Heart & Mind
Heart & MindHeart & Mind
Heart & Mind
 
Intuitive Technology
Intuitive TechnologyIntuitive Technology
Intuitive Technology
 
Gifted futures
Gifted futures Gifted futures
Gifted futures
 
Wu wei coaching
Wu wei coachingWu wei coaching
Wu wei coaching
 
Thinking About Climate Change
Thinking About Climate ChangeThinking About Climate Change
Thinking About Climate Change
 
Blended Enterprise Investigations
Blended Enterprise InvestigationsBlended Enterprise Investigations
Blended Enterprise Investigations
 
Complexity Thinking for Scrum Teams
Complexity Thinking for Scrum TeamsComplexity Thinking for Scrum Teams
Complexity Thinking for Scrum Teams
 
Economic Attention Networks
Economic Attention NetworksEconomic Attention Networks
Economic Attention Networks
 
Sternberg Poster
Sternberg Poster Sternberg Poster
Sternberg Poster
 
Andrew Brennan and Ruth Banner - DVD training package
Andrew Brennan and Ruth Banner - DVD training packageAndrew Brennan and Ruth Banner - DVD training package
Andrew Brennan and Ruth Banner - DVD training package
 

Semelhante a Provenance: From e-Science to the Web Of Data

PROACtive Process Overview
PROACtive Process OverviewPROACtive Process Overview
PROACtive Process OverviewDavid Vine
 
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...Mark Matienzo
 
Improving Findability: The Role of Information Architecture in Effective Search
Improving Findability: The Role of Information Architecture in Effective SearchImproving Findability: The Role of Information Architecture in Effective Search
Improving Findability: The Role of Information Architecture in Effective SearchScott Abel
 
February 2010 8 Things You Cant Afford To Ignore About eDiscovery
February 2010 8 Things You Cant Afford To Ignore About eDiscoveryFebruary 2010 8 Things You Cant Afford To Ignore About eDiscovery
February 2010 8 Things You Cant Afford To Ignore About eDiscoveryJohn Wang
 
Semantic Enterprise Architecture
Semantic Enterprise ArchitectureSemantic Enterprise Architecture
Semantic Enterprise ArchitectureMichael zur Muehlen
 
Distributed_Database_System
Distributed_Database_SystemDistributed_Database_System
Distributed_Database_SystemPhilip Zhong
 
The Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectThe Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectDETER-Project
 
343 beyond piaget and ips1
343 beyond piaget and ips1343 beyond piaget and ips1
343 beyond piaget and ips1Anna Montes
 
Upping the Ante -- ECM Meets BPM
Upping the Ante -- ECM Meets BPMUpping the Ante -- ECM Meets BPM
Upping the Ante -- ECM Meets BPMDerek E. Weeks
 
12 1012 m3 bpp manchester km 1 ver 0102
12 1012 m3 bpp manchester   km 1 ver 010212 1012 m3 bpp manchester   km 1 ver 0102
12 1012 m3 bpp manchester km 1 ver 0102ma-design.com
 
12 1012 m3 bpp manchester km 1 ver 0102
12 1012 m3 bpp manchester   km 1 ver 010212 1012 m3 bpp manchester   km 1 ver 0102
12 1012 m3 bpp manchester km 1 ver 0102ma-design.com
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionWil van der Aalst
 
Process mining chapter_01_introduction
Process mining chapter_01_introductionProcess mining chapter_01_introduction
Process mining chapter_01_introductionMuhammad Ajmal
 
Bringing Together Content and Process
Bringing Together Content and ProcessBringing Together Content and Process
Bringing Together Content and ProcessOpenText Global 360
 
Knowledge management solutions for development sector InfoAxon approach
Knowledge management solutions for development sector   InfoAxon approachKnowledge management solutions for development sector   InfoAxon approach
Knowledge management solutions for development sector InfoAxon approachInfoAxon Technologies Limited
 
Organisational Problem Solving & Innovation
Organisational Problem Solving & InnovationOrganisational Problem Solving & Innovation
Organisational Problem Solving & InnovationCory Banks
 
Bush.stewart
Bush.stewartBush.stewart
Bush.stewartNASAPMC
 

Semelhante a Provenance: From e-Science to the Web Of Data (20)

PROACtive Process Overview
PROACtive Process OverviewPROACtive Process Overview
PROACtive Process Overview
 
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
 
Improving Findability: The Role of Information Architecture in Effective Search
Improving Findability: The Role of Information Architecture in Effective SearchImproving Findability: The Role of Information Architecture in Effective Search
Improving Findability: The Role of Information Architecture in Effective Search
 
February 2010 8 Things You Cant Afford To Ignore About eDiscovery
February 2010 8 Things You Cant Afford To Ignore About eDiscoveryFebruary 2010 8 Things You Cant Afford To Ignore About eDiscovery
February 2010 8 Things You Cant Afford To Ignore About eDiscovery
 
Semantic Enterprise Architecture
Semantic Enterprise ArchitectureSemantic Enterprise Architecture
Semantic Enterprise Architecture
 
Distributed_Database_System
Distributed_Database_SystemDistributed_Database_System
Distributed_Database_System
 
The Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER ProjectThe Science of Cyber Security Experimentation: The DETER Project
The Science of Cyber Security Experimentation: The DETER Project
 
343 beyond piaget and ips1
343 beyond piaget and ips1343 beyond piaget and ips1
343 beyond piaget and ips1
 
Datamining
DataminingDatamining
Datamining
 
Upping the Ante -- ECM Meets BPM
Upping the Ante -- ECM Meets BPMUpping the Ante -- ECM Meets BPM
Upping the Ante -- ECM Meets BPM
 
12 1012 m3 bpp manchester km 1 ver 0102
12 1012 m3 bpp manchester   km 1 ver 010212 1012 m3 bpp manchester   km 1 ver 0102
12 1012 m3 bpp manchester km 1 ver 0102
 
12 1012 m3 bpp manchester km 1 ver 0102
12 1012 m3 bpp manchester   km 1 ver 010212 1012 m3 bpp manchester   km 1 ver 0102
12 1012 m3 bpp manchester km 1 ver 0102
 
Process skills table 1.1
Process skills table 1.1Process skills table 1.1
Process skills table 1.1
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - Introduction
 
Process mining chapter_01_introduction
Process mining chapter_01_introductionProcess mining chapter_01_introduction
Process mining chapter_01_introduction
 
Bringing Together Content and Process
Bringing Together Content and ProcessBringing Together Content and Process
Bringing Together Content and Process
 
Knowledge management solutions for development sector InfoAxon approach
Knowledge management solutions for development sector   InfoAxon approachKnowledge management solutions for development sector   InfoAxon approach
Knowledge management solutions for development sector InfoAxon approach
 
Organisational Problem Solving & Innovation
Organisational Problem Solving & InnovationOrganisational Problem Solving & Innovation
Organisational Problem Solving & Innovation
 
Lean Innovation
Lean InnovationLean Innovation
Lean Innovation
 
Bush.stewart
Bush.stewartBush.stewart
Bush.stewart
 

Mais de Jose Manuel Gómez-Pérez

Mais de Jose Manuel Gómez-Pérez (9)

Science religion-dsmeetupv1.0
Science religion-dsmeetupv1.0Science religion-dsmeetupv1.0
Science religion-dsmeetupv1.0
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the web
 
Trust and linked data jmgomez-v1.1
Trust and linked data jmgomez-v1.1Trust and linked data jmgomez-v1.1
Trust and linked data jmgomez-v1.1
 

Provenance: From e-Science to the Web Of Data

  • 1. iSOCO Provenance: From eScience to the Web of Data José Manuel Gómez Pérez Invited Lecture CETINIA 17/11/2009
  • 2. Agenda: Introduction to Provenance; Semantic Overlays for Provenance Analysis; The Web of Data; Provenance in the Web of Data
  • 3. Provenance is… Records of: the origin or source from which something comes; the history of subsequent owners (change of custody). Adapted from James Cheney’s Principles of Provenance
  • 4. Provenance is… Evidence of authenticity, integrity, and quality. Certifies products of good process. Adapted from James Cheney’s Principles of Provenance
  • 5. Provenance is… Valuable. Hard to collect and verify. Necessary to assign credit… and blame, i.e. to establish trust. Adapted from James Cheney’s Principles of Provenance
  • 6. Why provenance of electronic data is difficult. Paper data: the creation process leaves a paper trail; it is easier to detect modification, copy, and forgery; usually, one can judge a book by the cover. Electronic data: often, there is no bit trail; it is easy to forge, plagiarize, and modify data; there is no cover to judge by. Addressing this requires explicitly representing the provenance of data, storing it, keeping it secure, and reasoning with it. Adapted from James Cheney’s Principles of Provenance
  • 7. Provenance in eScience. One of the most active fields in provenance development. Curated scientific biological databases: ensure database quality; need provenance for data quality control and accountability; currently done manually by curators. Scientific workflows (grid computing): abstract process execution complexity; need provenance for process reproducibility and efficiency; currently supported by ad-hoc systems
  • 8. Past approaches to provenance in eScience
  • 9. Agenda: Introduction to Provenance; Semantic Overlays for Provenance Analysis; The Web of Data; Provenance in the Web of Data
  • 10. Provenance analysis of process executions?
  • 11. Semantic overlays for provenance analysis. Objective: to support domain experts in understanding process executions. Problem Solving Methods (PSMs) (McDermott 1988): provide reusable guidelines to formulate process knowledge; support reasoning; describe the main rationale behind a process. [Figure labels: How, What, Whom; Semantic Overlays, Provenance, SMEs]
  • 12. PSM perspectives. Task-method decomposition: hierarchically defines how tasks decompose into simpler (sub)tasks; describes tasks at several levels of detail; provides alternative ways to achieve a task. Interaction: the PSM establishes and controls the sequence of actions required to perform a task; defines the knowledge required at each task step. Knowledge flow: black-box perspective; knowledge transformation within the PSM. [Figure legend: Task, Method, Role]
  • 13. Towards knowledge provenance. PSMs as semantic overlays on top of existing process documentation. Task: what is going to be achieved by executing a process. PSM: how. Provenance, from a knowledge perspective: how recorded provenance relates to the execution of a process; simpler process analysis, proposing decompositions into simpler subprocesses; visualizing provenance at different levels of detail. Supporting domain experts in two main ways: validation of process executions; identification of reasoning patterns in process executions. [Figure source: myGrid]
  • 14. The twig join function. Based on XML pattern matching algorithms on Directed Acyclic Graphs (Bruno et al., 2002). twig_join detects the occurrence of a pattern in an XML DAG. Given: P, a process; T, a task potentially describing P; M, a PSM providing a strategy on how to achieve T; i(T), the set of input roles of T; o(T), the set of output roles of T; D, the DAG resulting from documenting the execution of P. twig_join(D, i(T), o(T)) checks whether a twig exists for M that connects i(T) with o(T) in D. In this case, PSM M is the pattern to be identified in the process documentation DAG D
  • 15. A twig join example in provenance analysis. [Figure: domain entities, bridges (mapping), PSM entities; twig join]
  • 16. The matching algorithm. twig_join is recursively applied at each decomposition level. Each task is decomposed by one or several PSMs (task-method decomposition view). The knowledge flow defines the sequence of evaluation. Backtracking is possible at PSM and role levels. [Figure: twig_join(Ti, D); decompose(Ti); twig_join(T11, D), twig_join(T12, D), twig_join(T13, D), twig_join(T14, D); views: task-method decomposition, knowledge flow, interaction]
  • 17. KOPE: A Knowledge-Oriented Provenance Environment. [Figure: PSM-ontology bridges, matching visualization, provenance query, matching detection]
  • 18. KOPE evaluation. [Figure: Brain Atlas workflow; PSM catalogue, task-method decomposition; PSM catalogue, knowledge flow]
  • 20. KOPE evaluation (II). Focus on precision and recall metrics, identified at three different layered contexts: method, task, and decomposition level. Goal 1: to identify the main rationale behind process executions by detecting occurrences of semantic overlays in their logs. Goal 2: to exploit the structure of semantic overlays to describe process executions at different levels of detail. [Chart: precision and recall for Level1 to Level4; legend: perfect match, partial match, no match]
  • 21. Agenda: Introduction to Provenance; Semantic Overlays for Provenance Analysis; The Web of Data; Provenance in the Web of Data
  • 23. While the economy contracts, the digital universe expands… In 2006, the size of the digital universe was estimated at 161 exabytes, 3 million times the information in all books ever written. By 2010, it was expected to reach 988 exabytes… and all this data is potentially exposed online. Source: IDC
  • 24. Web data
  • 25. The Linked Data paradigm. Tim Berners-Lee, 2006 (Design Issues). 1. Use URIs to identify things (anything, not just documents). 2. Use HTTP URIs so that people can look up those names (globally unique names, distributed ownership). 3. Provide useful information in RDF upon URI resolution. 4. Include RDF links to other URIs (enable discovery of related information). How can we exploit all the available data? Data reuse and remix; common, flexible, and usable APIs; standard vocabularies to describe interlinked datasets; tools; realize the Semantic Web vision
  • 26. The Linked Data Cloud (May 2007)
  • 27. The Linked Data Cloud (August 2007)
  • 28. The Linked Data Cloud (March 2008)
  • 29. The Linked Data Cloud (September 2008)
  • 30. The Linked Data Cloud (March 2009)
  • 31. The Web of Data. Apply the Linked Data principles to expose open datasets in RDF. Define RDF links between data items from different datasets. Over 7.5 billion triples and 5 million links (as of November 2009)
  • 32. Linked Data going mainstream
  • 33. Agenda: Introduction to Provenance; Semantic Overlays for Provenance Analysis; The Web of Data; Provenance in the Web of Data
  • 34. A real-life example. Linking and exploiting distributed data sets without the means to verify their provenance can be harmful, especially in sensitive domains. Two fake web sites; a fake Wikipedia entry; fake California public safety phone numbers. The hoax caused a 1000-word tome in the Frankfurter Allgemeine Zeitung… and public apologies from DPA. Trust in Wikipedia misled DPA. In a provenance-aware world, DPA would have had the means, based on data provenance, to automatically check that: the town did not exist; the Berlin Boys do not exist; the reporting local TV station does not exist
  • 35. The Linked Data flow. [Figure: legacy resources (e.g. DBs, XML repositories), web documents, multimedia; publish Linked Data (RDF, HTTP, URIs); exploit Linked Data (SPARQL EPRs); Linked Data applications; provenance: data lineage, data quality, data trustworthiness]
  • 36. Provenance and Linked Data. Linked Data is largely about reuse. However, reusing data from third parties requires knowing its provenance! Is the data reliable? Is the quality of the data good? Provenance shall provide the ability to: trace the sources of data; enable the exploration of relationships between datasets, their authors, and affiliations. Provenance analysis shall provide insight into how data is produced and exploited. Provenance should create a notion of information quality: is a certain dataset consistent and up to date? Is the connection between two interlinked datasets meaningful? Is a given dataset relevant for a particular domain? Provenance to establish information trustworthiness. Provenance to provide data views following some criteria
  • 37. Provenance challenges in the Web of Data. Provenance information needs to be: represented; captured and recorded; stored and secured, queried, and reasoned about; visualized and browsed
  • 38. A provenance architecture for the Web of Data. Authoritative agencies are required to certify data provenance and keep it secure!
  • 39. Semantics in support of provenance in the Web of Data. Semantic Web stack. Provenance stack (this, we still need to define!): information quality inference; trust inference; reasoning with provenance; provenance querying; provenance capture; provenance access policy definition; provenance encryption
  • 40. Towards a model of Web Data provenance. Adapted from Olaf Hartig’s Provenance Information in the Web of Data. Provenance is represented as a graph: nodes are provenance elements (pieces of provenance information); edges relate provenance elements to each other; subgraphs for related data items are possible. Provenance models define: types of provenance elements (roles); relationships between them. [Figure labels: Actor, Execution, Artifact]
  • 41. Provenance-related vocabularies. DC: Dublin Core Metadata Terms; FOAF: Friend of a Friend; SIOC: Semantically-Interlinked Online Communities; SWP: Semantic Web Publishing vocabulary; WOT: Web Of Trust schema; VOiD: VOcabulary of Interlinked Datasets. However, there is a general lack of provenance-related metadata on the Web of Data!
  • 42. Action points. Provenance vocabularies: represent and reason with trust and information quality; extend emerging Linked Data vocabularies (VOiD). Awareness of data providers: W3C Provenance IG; provenance authoritative agencies; Linked Data standards (VOiD again). Tools for data providers: generation of provenance metadata; provenance visualization
  • 43. An example of provenance visualization
  • 44. Questions
  • 45. Thanks for your attention! José Manuel Gómez-Pérez, R&D Director. T +34913349778, M +34609077103, jmgomez@isoco.com. iSOCO. For more information on how iSOCO can help your company optimize its digital business and deliver an innovative solution, contact us at www. .com. Barcelona: Tel +34 93 5677200, Edificio Testa A, C/ Alcalde Barnils 64-68, St. Cugat del Vallès, 08174 Barcelona. Madrid: +34 91 3349797, C/Pedro de Valdivia, 10, 28006 Madrid. Valencia: +34 96 3467143, Oficina 107, C/ Prof. Beltrán Báguena 4, 46009 Valencia
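
The twig_join check described on slides 14 and 16 can be illustrated with a minimal Python sketch. This is not the holistic twig join of Bruno et al. nor the KOPE implementation; it is a simplified stand-in that assumes the PSM pattern is a single chain of roles (input role, intermediate roles, output role) and that each node of the documentation DAG has already been mapped to a role through the bridges. The function signature, the node names, and the role names are all hypothetical.

```python
from typing import Dict, List, Set

Dag = Dict[str, List[str]]  # node -> direct successors in the documentation DAG D


def descendants(dag: Dag, start: str) -> Set[str]:
    """Return every node reachable from `start` (excluding `start` itself)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for successor in dag.get(node, []):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def twig_join(dag: Dag, roles: Dict[str, str], pattern: List[str]) -> bool:
    """Simplified check: does the role chain `pattern` occur in `dag`?

    `roles` maps DAG nodes to PSM roles (the "bridges" of slide 15).
    `pattern` lists roles from an input role to an output role; consecutive
    roles must be matched by nodes in an ancestor-descendant relationship.
    """
    if not pattern:
        return False
    # Nodes that can match the first role of the pattern.
    frontier = {node for node, role in roles.items() if role == pattern[0]}
    for role in pattern[1:]:
        # Keep only descendants of the current matches that carry the next role.
        frontier = {
            candidate
            for node in frontier
            for candidate in descendants(dag, node)
            if roles.get(candidate) == role
        }
        if not frontier:
            return False
    return bool(frontier)


# Hypothetical process documentation: an image is aligned, then averaged.
example_dag: Dag = {
    "raw_image": ["align"],
    "align": ["warped_image"],
    "warped_image": ["average"],
    "average": ["atlas"],
}
example_roles = {
    "raw_image": "input",
    "warped_image": "intermediate",
    "atlas": "output",
}

if __name__ == "__main__":
    print(twig_join(example_dag, example_roles, ["input", "intermediate", "output"]))
```

A recursive matcher in the spirit of slide 16 would call such a check once per subtask of a decomposition, in the order given by the knowledge flow, and backtrack to another PSM or role assignment when a call fails.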