PROV-O-Viz
Interactive Provenance Visualization
Rinke Hoekstra and Paul Groth

VU University Amsterdam/University of Amsterdam
rinke.hoekstra@vu.nl
Data2Semantics: From Data to Semantics for Scientific Data Publishers
Many slides courtesy of Paul Groth
Provenance?
Provenance
by Jennifer Compton
http://stillcraic.blogspot.nl/2014/01/tuesday-poem-provenance-by-jennifer.html
Definition
(Oxford English Dictionary)
• The fact of coming from some particular source or quarter; origin, derivation;
• the history or pedigree of a work of art, manuscript, rare book, etc.;
• concretely, a record of the passage of an item through its various owners.
Provenance
Making trust judgements on the Web
Licensing and attribution of combined information
Liability, trust and privacy in open government data
Compliance and auditing of business processes
Safeguarding quality, reproducibility and integrity of the scientific process
“Web Design Issues”
“At the toolbar (menu, whatever) associated
with a document there is a button marked
“Oh, yeah?”. You press it when you lose that
feeling of trust. It says to the Web, “so how
do I know I can trust this information?”. The
software then goes directly or indirectly back
to metainformation about the document,
which suggests a number of reasons.”
Tim Berners-Lee, Web Design Issues, September 1997
Provenance in Web Documents
Standards for ethical aggregation?
Curator’s code for attributing discovery?
Provenance in Open Government
Need provenance for data integration and reuse
• diversity of data sources
• varying quality
• different scope
• different assumptions
“Provenance is the number one
issue that we face when publishing
government data in data.gov.uk”
John Sheridan, UK National Archives, data.gov.uk
Provenance in Science
“We need a paradigm that makes it simple […]
to perform and publish reproducible
computational research. […] a Reproducible
Research Environment (RRE) […] provides
computational tools together with the ability
to automatically track the provenance of data,
analysis, and results and to package them (or
pointers to persistent versions of them) for
redistribution.”
Jill Mesirov, Chief Informatics Officer of the MIT/Harvard Broad Institute, in Science, January 2010
Need provenance for reproducibility and verification of processes
W3C Working Group
Provenance is a record that describes the people,
institutions, entities, and activities, involved in
producing, influencing, or delivering a piece of data or
a thing.
http://www.w3.org/TR/prov-overview
Luc Moreau & Paul Groth
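As a rough illustration of this definition, the sketch below records a tiny provenance trace as plain Python triples. The property names (prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasAttributedTo) come from PROV-O; every ex: resource and both helper functions are hypothetical and only serve the example.

```python
from collections import defaultdict

# PROV-O class and property names are real; the ex: resources are invented.
TRIPLES = [
    ("ex:dataset", "rdf:type", "prov:Entity"),
    ("ex:chart", "rdf:type", "prov:Entity"),
    ("ex:plotting", "rdf:type", "prov:Activity"),
    ("ex:alice", "rdf:type", "prov:Agent"),
    ("ex:plotting", "prov:used", "ex:dataset"),
    ("ex:chart", "prov:wasGeneratedBy", "ex:plotting"),
    ("ex:plotting", "prov:wasAssociatedWith", "ex:alice"),
    ("ex:chart", "prov:wasAttributedTo", "ex:alice"),
]


def describe(subject, triples=TRIPLES):
    """Group all statements about one resource by property."""
    statements = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            statements[p].append(o)
    return dict(statements)


def lineage(entity, triples=TRIPLES):
    """For an entity, list each generating activity and what that activity used."""
    steps = []
    for s, p, o in triples:
        if s == entity and p == "prov:wasGeneratedBy":
            used = [o2 for s2, p2, o2 in triples if s2 == o and p2 == "prov:used"]
            steps.append((o, used))
    return steps
```

Calling `lineage("ex:chart")` walks one step back in history: from the chart to the activity that generated it, and from there to the data that activity used.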
Provenance?
• Provenance = Metadata?
Provenance can be seen as metadata, but not all metadata is provenance
• Provenance = Trust?
Provenance provides a substrate for deriving different trust metrics
• Provenance = Authentication?
Provenance records can be used to verify and authenticate amongst users
Three Dimensions
• Content
Capturing and representing provenance information
• Management
Storing, querying, and accessing provenance information
• Use
Interpreting and understanding provenance in practice
recording, annotating, workflow systems
scalability, interoperability
trust, accountability, compliance, explanation, debugging
Basic Idea
What you can do…
Warning: provenance is about history!
Visualization Anyone?
Naive Approaches
InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping

Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
Orbiter has several limitations. It does not have capabilities for query subgraph
highlighting, regular expression filters, process grouping, annotations, or programmable views[16].
Furthermore, the structure of each summary node, where child nodes are grouped within
parents and are hidden until the parent is expanded, benefits queries earlier in the
dependency chain. Initial overviews often correspond with system bootup, and appear very similar
across different traces (time slices of system activity).
Figure 10: In these screenshots of Orbiter, the presence of edges overwhelms the visibility of
nodes. By relying on a node-link graph layout and using spatial location to encode object
relationships, Orbiter’s graph layout algorithm must draw many long edges to
communicate node connections. Without edge bundling or opacity variation, the meanings of these
relationships are obscured.
Another one of Orbiter’s weaknesses is its node-link diagram layout. As a result, each
node’s position in the X-Y plane and the length and angle of connecting lines are wasted
attributes. The chosen graph layout algorithm (dot by default) arranges nodes to minimize
Figure 11: (Top): A screenshot of the portion of the graph generated by GraphViz for a
trace of the third provenance challenge. (Bottom): A zoomed-in view of the same graph.
The horizontal black bars across the images are dense collections of edges.
Effective large graph visualizations present the user with a summary view that can be
explored, filtered, and expanded interactively.
2.5 Tree Visualization
While trees are a subcategory of graphs, because of their hierarchical composition, tree
visualization forms its own subfield of research. A survey of over two-hundred tree visualizations
is given at Hans-Jörg Schulz’s treevis.net. Visitors can narrow down by dimensionality
(2D, 3D, or mixed), representation (explicit node-link diagram, implicit treemap, or
combination), alignment (XY plot, radial layout, or free diagram)[55]. These categories are shown
Figure 12: Left: Pajek uses various summary node-link and matrix-based representations
depending on the structure of the supplied data set. Pictured is a main core subgraph
extracted from routing data on the Internet. Right: TopoLayout optimizes the choice of
visualization display depending on the underlying graph structure. The right column is
TopoLayout’s output, while the left and middle columns are the outputs of the GRIP and
FM graph layout algorithms.
Figure 13: treevis.net defines different categories for tree maps. Tree maps can be
categorized by dimensionality (2D, 3D, or mixed), representation (explicit, implicit, or mixed),
or alignment (XY, radial, or spring).
Tree visualizations are either explicit or implicit. Explicit representations resemble
node-link diagrams. An example of an implicit representation is a tree map, a diagram where the
entire tree is inscribed in a rectangle representing the root node. This root is subdivided
hierarchically into more rectangles, which represent child nodes, and each child node is
subdivided into more child nodes. Treemaps are excellent for displaying hierarchical or
categorical data[57]. One famous example, shown in Figure 14, is the “Map of the Market”
from SmartMoney.com, which displays in red and green the changes in market value of
publicly-traded companies, grouped by market sector, with cell size proportional to market
capitalization[64].
TreePlus is an example of a tree-inspired graph visualization tool (Figure 15). It uses
the guiding metaphor of “plant a seed to watch it grow” to summarize navigation of its tree-
InProv
6 Final Design
Figure 30: A view of a cluster of system activity. This particular timeslice shows the activity
of the init.sh and mount processes.
This visualization was designed with the Visual Information-Seeking Mantra in mind -
“overview first, zoom and filter, then details-on-demand”[56].
D3.js
Visualize the magnitude of flow between nodes in a network
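To make "magnitude of flow" concrete, here is a minimal sketch of how a Sankey-style layout can size its nodes: each node takes the larger of its total incoming and total outgoing link value. This is a common convention for Sankey diagrams, not something the slides spell out; the function name and the (source, target, value) link format are assumptions for illustration.

```python
from collections import defaultdict


def node_values(links):
    """Return {node: size} for links given as (source, target, value) tuples.

    A node's size is max(total incoming value, total outgoing value),
    so pure sources and pure sinks are still drawn at full size.
    """
    incoming = defaultdict(float)
    outgoing = defaultdict(float)
    for source, target, value in links:
        outgoing[source] += value
        incoming[target] += value
    nodes = set(incoming) | set(outgoing)
    return {node: max(incoming[node], outgoing[node]) for node in nodes}


# Hypothetical flow: "a" feeds "b" and "c", which both feed "d".
example_links = [("a", "b", 2), ("a", "c", 1), ("b", "d", 2), ("c", "d", 1)]
example_sizes = node_values(example_links)
```

In a drawn diagram these sizes would be scaled to pixel heights, and each link would be drawn with a thickness proportional to its own value.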
PROV-O-Viz http://provoviz.org
Insert any PROV-O RDF
Or connect to a SPARQL endpoint
Width of activities and entities is based on information flow
Activities and entities are extracted from an ego graph
Move activities and entities around
Hover over interesting dependencies
Embed graph into your own webpage
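The ego graph mentioned above can be sketched generically: start from one activity or entity of interest and keep everything within a fixed number of hops, ignoring edge direction. The slides do not describe how PROV-O-Viz performs this step, so the function below, its `radius` parameter, and the sample edge list are assumptions for illustration only.

```python
from collections import deque


def ego_graph(edges, center, radius=1):
    """Return (nodes, edges) of the subgraph within `radius` hops of `center`.

    `edges` is a list of (a, b) pairs; direction is ignored when measuring
    distance, but the returned edges keep their original orientation.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search that stops expanding at the given radius.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    kept = set(distance)
    return kept, [(a, b) for a, b in edges if a in kept and b in kept]


# Hypothetical trace: a chart generated by plotting, which used a dataset, and so on.
trace = [
    ("chart", "plotting"),
    ("plotting", "dataset"),
    ("dataset", "download"),
    ("download", "website"),
]
```

Restricting the view to such a neighbourhood is one way to keep a large provenance trace readable: `ego_graph(trace, "plotting", 1)` keeps only the chart, the plotting activity, and the dataset.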
Tom de Nies (Ghent University)
Sara Magliacane (VU University Amsterdam)
Discussion
• Provenance is vital in many areas: government, science, industry, …
• PROV is the W3C standard for expressing provenance
• Provenance graphs can be overwhelming and complex
• PROV-O-Viz builds intuitive Sankey-style visualizations
• … for any provenance trace expressed using PROV
http://semweb.cs.vu.nl/provoviz
Thanks to: Paul Groth, Provenance XG, WG, Luc Moreau, James Cheney, Paolo Missier, Olaf Hartig, Satya Sahoo

2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Prov-O-Viz: Interactive Provenance Visualization

  • 1. PROV-O-Viz: Interactive Provenance Visualization. Rinke Hoekstra and Paul Groth
 VU University Amsterdam/University of Amsterdam, rinke.hoekstra@vu.nl. Data2Semantics: From Data to Semantics for Scientific Data Publishers. Many slides courtesy of Paul Groth
  • 4. Definition
 (Oxford English Dictionary) • The fact of coming from some particular source or quarter; origin, derivation; • the history or pedigree of a work of art, manuscript, rare book, etc.; • concretely, a record of the passage of an item through its various owners.
  • 15. Provenance • Making trust judgements on the Web • Licensing and attribution of combined information • Liability, trust and privacy in open government data • Compliance and auditing of business processes • Safeguarding quality, reproducibility and integrity of the scientific process
  • 16. “Web Design Issues” “At the toolbar (menu, whatever) associated with a document there is a button marked “Oh, yeah?”. You press it when you lose that feeling of trust. It says to the Web, “so how do I know I can trust this information?”. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.” Tim Berners-Lee, Web Design Issues, September 1997
  • 18. Provenance in Web Documents. Standards for ethical aggregation? Curator’s code for attributing discovery?
  • 19. Provenance in Open Government. Need provenance for data integration and reuse:
 diversity of data sources
 varying quality
 different scope
 different assumptions
 “Provenance is the number one issue that we face when publishing government data in data.gov.uk” John Sheridan, UK National Archives, data.gov.uk
  • 20. Provenance in Science. “We need a paradigm that makes it simple […] to perform and publish reproducible computational research. […] a Reproducible Research Environment (RRE) […] provides computational tools together with the ability to automatically track the provenance of data, analysis, and results and to package them (or pointers to persistent versions of them) for redistribution.” Jill Mesirov, Chief Informatics Officer of the MIT/Harvard Broad Institute, in Science, January 2010
 Need provenance for reproducibility and verification of processes
  • 22. W3C Working Group. Provenance is a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing. http://www.w3.org/TR/prov-overview Luc Moreau & Paul Groth
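 The W3C definition above can be made concrete with a small sketch. The Python below writes a provenance record as plain subject-predicate-object triples using PROV-O terms (prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasDerivedFrom). The http://example.org/ namespace and the names raw_data, plotting, chart and alice are hypothetical and chosen only for illustration; the helper functions are not part of any PROV library.

```python
# Minimal sketch of a provenance record as RDF-style triples (standard library only).
PROV = "http://www.w3.org/ns/prov#"            # PROV-O namespace
EX = "http://example.org/"                      # hypothetical example namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def prov(term):
    """Return the full IRI of a PROV-O term."""
    return PROV + term


def ex(name):
    """Return the full IRI of a (hypothetical) example resource."""
    return EX + name


# A chart was generated by a plotting activity that used some raw data;
# the activity was associated with an agent called alice.
triples = [
    (ex("raw_data"), RDF_TYPE, prov("Entity")),
    (ex("chart"), RDF_TYPE, prov("Entity")),
    (ex("plotting"), RDF_TYPE, prov("Activity")),
    (ex("alice"), RDF_TYPE, prov("Agent")),
    (ex("plotting"), prov("used"), ex("raw_data")),
    (ex("chart"), prov("wasGeneratedBy"), ex("plotting")),
    (ex("plotting"), prov("wasAssociatedWith"), ex("alice")),
    (ex("chart"), prov("wasDerivedFrom"), ex("raw_data")),
]


def to_ntriples(triples):
    """Serialise IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def sources_of(entity, triples):
    """Inputs used by the activities that generated `entity`."""
    activities = {o for s, p, o in triples
                  if s == entity and p == prov("wasGeneratedBy")}
    return sorted(o for s, p, o in triples
                  if s in activities and p == prov("used"))


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(sources_of(ex("chart"), triples))
```

 The N-Triples output is one way such a record could be written down as PROV-O RDF; a real application would more likely use an RDF library.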
  • 23. Provenance?
 • Provenance = Metadata? Provenance can be seen as metadata, but not all metadata is provenance
 • Provenance = Trust? Provenance provides a substrate for deriving different trust metrics
 • Provenance = Authentication? Provenance records can be used to verify and authenticate amongst users
  • 34. Three Dimensions
 • Content: capturing and representing provenance information
 • Management: storing, querying, and accessing provenance information
 • Use: interpreting and understanding provenance in practice
 Keywords on the slide: recording, annotating, workflow systems, scalability, interoperability, trust, accountability, compliance, explanation, debugging
  • 38. Warning: provenance is about history!
  • 40. Naive Approaches. InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
 Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
 Orbiter has several limitations. It does not have capabilities for query subgraph highlighting, regular expression filters, process grouping, annotations, or programmable views[16]. Furthermore, the structure of each summary node, where child nodes are grouped within parents and are hidden until the parent is expanded, benefits queries earlier in the dependency chain. Initial overviews often correspond with system bootup, and appear very similar across different traces (time slices of system activity).
 Figure 10: In these screenshots of Orbiter, the presence of edges overwhelms the visibility of nodes. By relying on a node-link graph layout and using spatial location to encode object relationships, Orbiter’s graph layout algorithm must draw many long edges to communicate node connections. Without edge bundling or opacity variation, the meanings of these relationships are obscured.
 Another one of Orbiter’s weaknesses is its node-link diagram layout. As a result, each node’s position in the X-Y plane and the length and angle of connecting lines are wasted attributes. The chosen graph layout algorithm (dot by default) arranges nodes to minimize
 Figure 11: (Top): A screenshot of the portion of the graph generated by GraphViz for a trace of the third provenance challenge. (Bottom): A zoomed-in view of the same graph. The horizontal black bars across the images are dense collections of edges.
 Effective large graph visualizations present the user with a summary view that can be explored, filtered, and expanded interactively.
 2.5 Tree Visualization
 While trees are a subcategory of graphs, because of their hierarchical composition, tree visualization forms its own subfield of research. A survey of over two-hundred tree visualizations is given at Hans-Jörg Schulz’s treevis.net. Visitors can narrow down by dimensionality (2D, 3D, or mixed), representation (explicit node-link diagram, implicit treemap, or combination), alignment (XY plot, radial layout, or free diagram)[55]. These categories are shown
 Figure 12: Left: Pajek uses various summary node-link and matrix-based representations depending on the structure of the supplied data set. Pictured is a main core subgraph extracted from routing data on the Internet. Right: TopoLayout optimizes the choice of visualization display depending on the underlying graph structure. The right column is TopoLayout’s output, while the left and middle columns are the outputs of the GRIP and FM graph layout algorithms.
 Figure 13: treevis.net defines different categories for tree maps. Tree maps can be categorized by dimensionality (2D, 3D, or mixed), representation (explicit, implicit, or mixed), or alignment (XY, radial, or spring).
 Tree visualizations are either explicit or implicit. Explicit representations resemble node-link diagrams. An example of an implicit representation is a tree map, a diagram where the entire tree is inscribed in a rectangle representing the root node. This root is subdivided hierarchically into more rectangles, which represent child nodes, and each child node is subdivided into more child nodes. Treemaps are excellent for displaying hierarchical or categorical data[57]. One famous example, shown in Figure 14, is the “Map of the Market” from SmartMoney.com, which displays in red and green the changes in market value of publicly-traded companies, grouped by market sector, with cell size proportional to market capitalization[64]. TreePlus is an example of a tree-inspired graph visualization tool (Figure 15). It uses the guiding metaphor of “plant a seed to watch it grow” to summarize navigation of its tree-
  • 41. InProv. InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
 Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
 6 Final Design
 Figure 30: A view of a cluster of system activity. This particular timeslice shows the activity of the init.sh and mount processes.
 This visualization was designed with the Visual Information-Seeking Mantra in mind - “overview first, zoom and filter, then details-on-demand”[56].
  • 42. D3.js: Visualize the magnitude of flow between nodes in a network
  • 44. PROV-O-Viz, http://provoviz.org. Insert any PROV-O RDF, or connect to a SPARQL endpoint
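 To illustrate what connecting to a SPARQL endpoint involves, here is a minimal Python sketch that prepares a SPARQL query for PROV-O generation and usage relations and wraps it in an HTTP GET request, as the SPARQL protocol allows. The query, the endpoint URL in the usage lines and the function name build_request are assumptions made for illustration; the slides do not show which queries PROV-O-Viz itself sends. The request is only constructed, not sent.

```python
# Sketch: build (but do not send) a SPARQL protocol GET request for PROV-O data.
from urllib.parse import urlencode
from urllib.request import Request

# Assumed example query: which activity generated each entity, and what it used.
PROV_QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?entity ?activity ?input WHERE {
  ?entity prov:wasGeneratedBy ?activity .
  OPTIONAL { ?activity prov:used ?input . }
}
LIMIT 100
"""


def build_request(endpoint, query=PROV_QUERY):
    """Return a GET request carrying the query in the `query` URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real SPARQL endpoint before sending.
    request = build_request("http://example.org/sparql")
    print(request.get_method(), request.full_url[:60] + "...")
```

 Passing the request to urllib.request.urlopen would execute it against a real endpoint.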
  • 47. Width of activities and entities is based on information flow. Activities and entities are extracted from an ego graph
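 The two ideas on this slide can be sketched in a few lines of Python. The sketch assumes that an ego graph means all nodes within a fixed number of hops of a focus node (ignoring edge direction), and that a node's width is the larger of its total incoming and total outgoing flow, as is common in Sankey layouts. The slides do not specify how PROV-O-Viz computes either, so both rules, the unit edge weights and the node names are assumptions for illustration.

```python
# Sketch: extract an ego graph and derive Sankey-style node widths from edge flow.
from collections import defaultdict, deque


def ego_graph(edges, focus, radius):
    """Keep the edges whose endpoints both lie within `radius` hops of `focus`.

    `edges` is a list of (source, target, weight); direction is ignored
    when measuring distance.
    """
    neighbours = defaultdict(set)
    for source, target, _ in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    # Breadth-first search outwards from the focus node.
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the radius
        for other in neighbours[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)

    return [(s, t, w) for s, t, w in edges if s in distance and t in distance]


def node_widths(edges):
    """Width of each node: max of summed incoming and summed outgoing weight."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, weight in edges:
        outflow[source] += weight
        inflow[target] += weight
    return {n: max(inflow[n], outflow[n]) for n in set(inflow) | set(outflow)}


# Hypothetical provenance trace; every edge carries one unit of flow.
edges = [
    ("raw_data", "clean", 1.0),
    ("clean", "clean_data", 1.0),
    ("clean_data", "plot", 1.0),
    ("clean_data", "stats", 1.0),
    ("plot", "chart", 1.0),
    ("stats", "table", 1.0),
    ("table", "report", 1.0),
]

if __name__ == "__main__":
    sub = ego_graph(edges, "clean_data", radius=1)
    print(sub)
    print(node_widths(sub))
```

 In this example the focus node clean_data feeds two activities, so it ends up twice as wide as its neighbours.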
  • 48. Move activities and entities around. Hover over interesting dependencies
  • 49. Embed graph into your own webpage
  • 55. Discussion
 • Provenance is vital in many areas: government, science, industry, …
 • PROV is the W3C standard for expressing provenance
 • Provenance graphs can be overwhelming and complex
 • PROV-O-Viz builds intuitive Sankey-style visualizations
 • … for any provenance trace expressed using PROV
 Data2Semantics: From Data to Semantics for Scientific Data Publishers. http://semweb.cs.vu.nl/provoviz
 Thanks to: Paul Groth, Provenance XG, WG, Luc Moreau, James Cheney, Paolo Missier, Olaf Hartig, Satya Sahoo