© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Semantic Representation of
Provenance in Wikipedia
Fabrizio Orlandi¹, Pierre-Antoine Champin², Alexandre Passant¹
SWPM 2010
Shanghai – 7th Nov 2010
¹ Digital Enterprise Research Institute – National University of Ireland, Galway
² LIRIS, Université de Lyon, CNRS, UMR5205, Lyon
Motivation
Wikipedia is one of the most widely known knowledge bases available on the Web.
Everyone can contribute: trust and quality concerns!
Use of provenance information to identify trust and quality values for pages.
Data provenance is the history, the origins and the evolution of data.
It gives the ability to answer the following questions about data:
• Who created/modified it? When?
• What is the content? Where is it located?
• How and why was it created?
• Which tools and processes were used?
Semantic provenance in Wikipedia
• By representing Wikipedia provenance information with Semantic Web technologies we enable:
– Transparency
– Reusability
– Integration with the Web of Data
• Our contribution:
– A semantic model to represent provenance information in wikis
– A software architecture to extract provenance from Wikipedia
– An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles
SIOC: Semantically-Interlinked Online Communities
Describes the content and structure of community sites.
The SIOC Core ontology: http://rdfs.org/sioc/spec
• Wiki and WikiArticle classes with the SIOC Types module.
Advantages of using SIOC:
• Widely used on the Web.
• Integration with existing SIOC data and other popular lightweight ontologies like FOAF, DC, etc.
• Same queries to find items on a Wiki or a Blog, Forum, etc.
The SIOC Actions module
• From a document-centric (SIOC) to an action-centric (SIOC Actions) view of online communities. [Champin, Passant – 2010]
• It represents the dynamics of online communities, how they evolve:
– A set of actions, performed by a user at some time, impacting one or more objects.
– In Wikipedia actions are edits made by users on the articles.
Relies on the Event Ontology [Raimond et al. - 2007]
http://motools.sourceforge.net/event/event.html
The W7 Model
• Ontological model created to describe the semantics of data provenance [Ram, Liu - 2007]
– Based on Bunge's ontology (1977).
– Tracks the history of the events affecting the status of things during their lifecycle.
– Extensible and generic, it can be used in different domains.
– 7 interrogative words: What, How, When, Where, Who, Which, Why.
– Not implemented in RDFS/OWL.
Our modelling solution: 1 – What
An event (i.e. change of state) that happens to data during its lifetime.
In Wikipedia every type of event (creation, modification, deletion) leads to the creation of a new article revision.
Just using SIOC Core we can model versioning and history of wiki articles.
<http://example.com/action?title=Linked_Data#38010613>
sioca:creates
<http://en.wikipedia.org/w/index.php?title=Linked_Data&oldid=38010613>;
sioca:modifies
<http://en.wikipedia.org/wiki/Linked_Data>;
a sioca:Action.
Our modelling solution: 2 – How
The action leading to an event.
• In Wikipedia the actions are the edits applied to the articles.
• By analyzing diffs between revisions we identify the type of action involved in the creation of the newer revision: ( Insertion | Update | Deletion ) ( Sentence | Reference )
• To model the differences between revisions we created a lightweight Diff ontology that aims at describing changes to plain text documents. (http://vocab.deri.ie/diff#)
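The classification of an edit into Insertion, Update or Deletion can be illustrated with a short Python sketch. This is not the authors' implementation: the slides do not say how the diffs are computed, so the sketch assumes that each revision has already been split into sentences and relies on the standard-library `difflib.SequenceMatcher`. The function name `classify_changes` and the sample sentences are hypothetical.

```python
import difflib

# Map difflib opcode tags onto the action types named on the slide.
LABELS = {"insert": "Insertion", "replace": "Update", "delete": "Deletion"}

def classify_changes(old_sentences, new_sentences):
    """Return (label, removed, added) for every differing region between two revisions."""
    matcher = difflib.SequenceMatcher(a=old_sentences, b=new_sentences, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged sentences carry no action
        changes.append((LABELS[tag], old_sentences[i1:i2], new_sentences[j1:j2]))
    return changes

# Hypothetical revisions, already split into sentences.
old_revision = ["s1", "s2", "s3", "s4"]
new_revision = ["s0", "s1", "s2 (edited)", "s3"]

for label, removed, added in classify_changes(old_revision, new_revision):
    print(label, removed, added)
```

Distinguishing sentences from references, as the slide does, would need an extra step that inspects the wiki markup of each changed region; that step is left out here.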
Our modelling solution: 3 – When
The time an event occurs.
• In Wikipedia every edit has a timestamp recorded, and edits are considered instantaneous.
• Use of dc:created or event:time
<http://example.com/action?title=Linked_Data#380106133>
    dc:created "2010-08-21T06:36:17Z";
    event:time [
        a time:Instant;
        time:inXSDDateTime "2010-08-21T06:36:17Z"
    ];
    a sioca:Action.
Our modelling solution: 4 – Where
The online space or the location associated with an event.
In Wikipedia the information about the location of the user editing the page is not provided.
This information cannot be modelled.
Our modelling solution: 5 – Who
An agent involved in an event.
In Wikipedia it is represented by the editor of a page.
We use the sioc:UserAccount class to identify the account of the agent.
<http://example.com/action?title=Linked_Data#36243686>
sioc:has_creator
<http://en.wikipedia.org/wiki/User:Timbl>;
a sioca:Action.
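The What, When and Who snippets can be combined into a single description of one edit. The Python sketch below assembles such a description as Turtle text, following the URI patterns shown on the slides. It is a minimal sketch under assumptions: the function name `action_to_turtle` is hypothetical, prefix declarations are omitted as on the slides, titles and user names are not escaped, and the values in the usage line are recombined from the slide examples purely for illustration (they are not claimed to describe a real edit).

```python
def action_to_turtle(title, rev_id, user, timestamp):
    """Build a Turtle description of one wiki edit (a sioca:Action)."""
    action = f"http://example.com/action?title={title}#{rev_id}"
    revision = f"http://en.wikipedia.org/w/index.php?title={title}&oldid={rev_id}"
    article = f"http://en.wikipedia.org/wiki/{title}"
    creator = f"http://en.wikipedia.org/wiki/User:{user}"
    return "\n".join([
        f"<{action}>",
        f"    sioca:creates <{revision}>;",       # What: the new revision
        f"    sioca:modifies <{article}>;",       # What: the article being edited
        f'    dc:created "{timestamp}";',         # When: the edit timestamp
        f"    sioc:has_creator <{creator}>;",     # Who: the editor's account
        "    a sioca:Action.",
    ])

# Illustrative values only: the pairing of revision, user and time is made up.
print(action_to_turtle("Linked_Data", 36243686, "Timbl", "2010-08-21T06:36:17Z"))
```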
Our modelling solution: 6 – Which
The programs or instruments used in the event.
• In Wikipedia it is represented by the MediaWiki software used to edit the articles.
• Different in case the editor is a “bot”.
Our modelling solution: 7 – Why
The reasons behind the event occurrence.
• In Wikipedia it is defined by the justifications for a change inserted by a user in the “comment” field.
• Property diff:comment with the diff:Diff class as domain.
[Figure: Our modelling solution]
Application using Wikipedia provenance data
The application is mainly composed of 3 parts:
• Data Collection
– Extracts and generates provenance data from Wikipedia using our model.
• Firefox plug-in
– From the provenance data collected, it computes and shows statistical information directly on Wikipedia pages.
• Exposing the data to the Web of data
– The statistical information and the provenance data are provided as Linked Open Data.
Data Collection
A PHP script has been developed to extract all the articles belonging to a category and all its subcategories, and for each article, its entire revision history.
Then the program extracts provenance information from the articles collected at the previous step: it calculates the diff function between versions and retrieves other information from the Wikipedia API.
We ran our experiment with the “Semantic Web” category and all its 166 Wikipedia articles. All the data has been loaded into an RDF store.
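The two collection steps (listing the articles of a category, then fetching each article's revision history) map onto two MediaWiki API queries. The sketch below only builds the request URLs and does not contact the network. It is written in Python for illustration, whereas the original tool is a PHP script whose code is not shown on the slides; the function names are hypothetical, and result continuation (paging) and subcategory traversal are left out.

```python
from urllib.parse import urlencode

# Standard MediaWiki API endpoint of the English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"

def category_members_url(category, limit=50):
    """URL listing the pages that belong to a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

def revision_history_url(title, limit=50):
    """URL returning id, timestamp, user and comment for an article's revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

print(category_members_url("Semantic Web"))
print(revision_history_url("Linked Data"))
```

The `rvprop` fields correspond to the provenance dimensions of the model: revision ids (What), timestamp (When), user (Who) and comment (Why).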
[Figure: Data Collection]
A Firefox plug-in
• This application displays a table directly on top of Wikipedia articles exposing information about the most active users and their edits.
• It is composed of:
– 1) The triplestore, exposing a SPARQL endpoint;
– 2) A PHP script, which queries the triplestore and sends the results to the Greasemonkey script;
– 3) A Greasemonkey script, which retrieves the URL of the loaded Wikipedia page, sends the request to the PHP script and then displays the returned HTML data on the Wikipedia page.
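The middle component turns query results into the HTML table that is injected into the page. As a rough illustration of that step, the Python sketch below ranks editors by number of edits and renders the ranking as HTML. It assumes the triplestore query has already returned one user name per edit; the real component is a PHP script querying a SPARQL endpoint, and the function names and sample user names here are hypothetical.

```python
from collections import Counter
from html import escape

def top_editors(edit_users, limit=5):
    """Rank users by number of edits; edit_users holds one user name per edit."""
    return Counter(edit_users).most_common(limit)

def render_table(rows):
    """Render (user, edit count) pairs as a small HTML table."""
    body = "".join(
        f"<tr><td>{escape(user)}</td><td>{count}</td></tr>" for user, count in rows
    )
    return f"<table><tr><th>User</th><th>Edits</th></tr>{body}</table>"

# Hypothetical query result: one entry per edit on the current article.
edits = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(render_table(top_editors(edits, limit=2)))
```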
[Figure: A Firefox plug-in]
To the Web of data
• The application is currently available at http://vmuss06.deri.ie/WikiProvenance/index.php.
• Using this web service it is possible to obtain RDF for the provenance data generated with our model.
• It is also possible to obtain, in RDF, the statistical information displayed by the Firefox plug-in.
• To represent the statistics we use SCOVO, the Statistical Core Vocabulary (http://vocab.deri.ie/scovo)
To the Web of data
• As an example, the following triples state that the user “KingsleyIdehen” made 11 edits on the SIOC page:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix WikiStats: <http://vmuss06.deri.ie/WikipediaStats.owl#>.
@prefix scovo: <http://purl.org/NET/scovo#>.
<WikiStats:title=SIOC&user=KingsleyIdehen&edits>
    a scovo:Item ;
    rdf:value 11 ;
    scovo:dimension WikiStats:Edits ;
    scovo:dimension <http://wikipedia.org/wiki/SIOC>;
    scovo:dimension <http://wikipedia.org/wiki/User:KingsleyIdehen>.
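A statistic of this shape can be generated mechanically from an (article, user, edit count) triple. The Python sketch below reproduces the pattern of the example above as Turtle text. It is only an illustration: the function name `scovo_item` is hypothetical, prefix declarations are assumed to be emitted elsewhere, and titles and user names are inserted without any escaping.

```python
def scovo_item(title, user, edits):
    """Serialise one edit-count statistic as a scovo:Item in Turtle."""
    return "\n".join([
        f"<WikiStats:title={title}&user={user}&edits>",
        "    a scovo:Item ;",
        f"    rdf:value {edits} ;",                                   # the measured value
        "    scovo:dimension WikiStats:Edits ;",                      # what is measured
        f"    scovo:dimension <http://wikipedia.org/wiki/{title}>;",  # which article
        f"    scovo:dimension <http://wikipedia.org/wiki/User:{user}>.",  # which user
    ])

# Values taken from the slide's example.
print(scovo_item("SIOC", "KingsleyIdehen", 11))
```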
Conclusions and Future Work
Our contribution:
• A specific lightweight ontology for provenance in wikis, based on the W7 model and SIOC.
• A framework for the extraction of provenance data from Wikipedia.
• An application to access the generated data in a meaningful way and to expose it to the Web of data.
Future work:
• A refinement of the proposed model and an alignment with other general-purpose ontologies for provenance representation.
• To improve the performance and extend the features of the application.
• To model statistics using the SDMX vocabulary (Statistical Data and Metadata eXchange)
Comment:
• Very large amount of data generated for the “Semantic Web” category and its 166 articles: almost 1.5 million triples for a total of 8,656 revisions.
Questions?
Applications and source code:
http://vmuss06.deri.ie/WikiProvenance/index.php
The Diff ontology:
http://vocab.deri.ie/diff#
Contacts:
fabrizio.orlandi@deri.org
@BadmotorF
http://www.slideshare.net/badmotorfinger

Mais conteúdo relacionado

Semelhante a Semantic Representation of Provenance in Wikipedia

Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
Jie Bao
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo
 

Semelhante a Semantic Representation of Provenance in Wikipedia (20)

ImageJ and the SciJava software stack
ImageJ and the SciJava software stackImageJ and the SciJava software stack
ImageJ and the SciJava software stack
 
Seminar and workshop on Embedded Systems
Seminar and workshop on Embedded SystemsSeminar and workshop on Embedded Systems
Seminar and workshop on Embedded Systems
 
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
 
UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
 
Knowledge Infrastructure for Global Systems Science
Knowledge Infrastructure for Global Systems ScienceKnowledge Infrastructure for Global Systems Science
Knowledge Infrastructure for Global Systems Science
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
 
Linked services for the Web of Data
Linked services for the Web of DataLinked services for the Web of Data
Linked services for the Web of Data
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Semantic web and Linked Data
Semantic web and Linked DataSemantic web and Linked Data
Semantic web and Linked Data
 
myExperiment and the Rise of Social Machines
myExperiment and the Rise of Social MachinesmyExperiment and the Rise of Social Machines
myExperiment and the Rise of Social Machines
 
Doing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers SeminarDoing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers Seminar
 
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
 
Simbios - Open Science in Biocomputational Research
Simbios - Open Science in Biocomputational ResearchSimbios - Open Science in Biocomputational Research
Simbios - Open Science in Biocomputational Research
 
Of Changes and Their History
Of Changes and Their HistoryOf Changes and Their History
Of Changes and Their History
 
Digital library presentation
Digital library presentationDigital library presentation
Digital library presentation
 
W3 C Intro And Beyond - Eyal Sela
W3 C Intro And Beyond - Eyal SelaW3 C Intro And Beyond - Eyal Sela
W3 C Intro And Beyond - Eyal Sela
 

Mais de Fabrizio Orlandi

Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
Fabrizio Orlandi
 

Mais de Fabrizio Orlandi (10)

Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphs
 
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
 
Semantic user profiling and Personalised filtering of the Twitter stream
Semantic user profiling and Personalised filtering of the Twitter streamSemantic user profiling and Personalised filtering of the Twitter stream
Semantic user profiling and Personalised filtering of the Twitter stream
 
Semantic search on heterogeneous wiki systems - Wikimania 2010
Semantic search on heterogeneous wiki systems - Wikimania 2010Semantic search on heterogeneous wiki systems - Wikimania 2010
Semantic search on heterogeneous wiki systems - Wikimania 2010
 
Semantic Search on Heterogeneous Wiki Systems - wikisym2010
Semantic Search on Heterogeneous Wiki Systems - wikisym2010Semantic Search on Heterogeneous Wiki Systems - wikisym2010
Semantic Search on Heterogeneous Wiki Systems - wikisym2010
 
Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
 
Semantic Search on Heterogeneous Wiki Systems - Short
Semantic Search on Heterogeneous Wiki Systems - ShortSemantic Search on Heterogeneous Wiki Systems - Short
Semantic Search on Heterogeneous Wiki Systems - Short
 
Enabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontologyEnabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontology
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Semantic Representation of Provenance in Wikipedia

  • 1. © Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Semantic Representation of Provenance in Wikipedia Fabrizio Orlandi¹, Pierre-Antoine Champin², Alexandre Passant¹ SWPM 2010 Shanghai – 7th Nov 2010 ¹ Digital Enterprise Research Institute – National University of Ireland, Galway ² LIRIS, Université de Lyon, CNRS, UMR5205, Lyon
  • 2. Digital Enterprise Research Institute www.deri.ie WikipediaWikipedia is one of the widest-known knowledge bases available on the Webis one of the widest-known knowledge bases available on the Web Everyone can contributeEveryone can contribute TrustTrust andand qualityquality concerns!concerns! Use ofUse of provenanceprovenance information to identify trust and quality values for pagesinformation to identify trust and quality values for pages MotivationMotivation 2 of 23 Data Provenance as theData Provenance as the historyhistory, the, the originsorigins and theand the evolutionevolution of data.of data. Ability to answer the following questions about data:Ability to answer the following questions about data: WhoWho created/modified it?created/modified it? WhenWhen?? WhatWhat is the content?is the content? WhereWhere is it located?is it located? HowHow andand WhyWhy was it created?was it created? WhichWhich tools and processes were used?tools and processes were used?
  • 3. Digital Enterprise Research Institute www.deri.ie • By representing Wikipedia provenance information with Semantic WebBy representing Wikipedia provenance information with Semantic Web technologies we enable:technologies we enable: – TransparencyTransparency – ReusabilityReusability – Integration with the Web of DataIntegration with the Web of Data • Our contribution:Our contribution: – A semantic model to represent provenance information in wikisA semantic model to represent provenance information in wikis – A software architecture to extract provenance from WikipediaA software architecture to extract provenance from Wikipedia – An application that uses and exposes provenance data to computeAn application that uses and exposes provenance data to compute measures and statistics on Wikipedia articlesmeasures and statistics on Wikipedia articles 3 of 23 Semantic provenance in WikipediaSemantic provenance in Wikipedia
  • 4. SIOC (Semantically-Interlinked Online Communities): describes the content and structure of community sites. The SIOC Core ontology: http://rdfs.org/sioc/spec • Wiki and WikiArticle classes with the SIOC Types module. Advantages of using SIOC: • Widely used on the Web. • Integration with existing SIOC data and other popular lightweight ontologies like FOAF, DC, etc. • Same queries to find items on a Wiki or a Blog, Forum, etc.
  • 5. The SIOC Actions module: • From a document-centric (SIOC) to an action-centric (SIOC Actions) view of online communities. [Champin, Passant – 2010] • It represents the dynamics of online communities, how they evolve: – A set of actions, performed by a user at some time, impacting one or more objects. – In Wikipedia, actions are edits made by users on the articles. Relies on the Event Ontology [Raimond et al. - 2007] http://motools.sourceforge.net/event/event.html
  • 6. The W7 Model: • Ontological model created to describe the semantics of data provenance [Ram, Liu - 2007] – Based on Bunge's ontology (1977). – Tracks the history of the events affecting the status of things during their life cycle. – Extensible and generic, it can be used in different domains. – 7 interrogative words: What, How, When, Where, Who, Which, Why. – Not implemented in RDFS/OWL.
  • 7. Our modelling solution: 1 – What: An event (i.e. change of state) that happens to data during its lifetime. In Wikipedia every type of event (creation, modification, deletion) leads to the creation of a new article revision. Just using SIOC Core we can model versioning and history of wiki articles. <http://example.com/action?title=Linked_Data#38010613> sioca:creates <http://en.wikipedia.org/w/index.php?title=Linked_Data&oldid=38010613>; sioca:modifies <http://en.wikipedia.org/wiki/Linked_Data>; a sioca:Action.
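The "What" pattern on this slide can be sketched as a small generator. This is a minimal illustration, not the authors' extraction code: the function name `action_turtle` is hypothetical, the `http://example.com/action` base is the placeholder used on the slide, and prefix declarations are omitted as they are on the slide.

```python
from urllib.parse import quote


def action_turtle(title: str, rev_id: int,
                  base: str = "http://example.com/action") -> str:
    """Return Turtle describing the edit that produced revision rev_id.

    Hypothetical helper: it mirrors the slide's pattern, where one
    sioca:Action creates a revision and modifies the article.
    """
    safe_title = quote(title, safe="_")  # percent-encode anything unusual
    action = f"<{base}?title={safe_title}#{rev_id}>"
    revision = (f"<http://en.wikipedia.org/w/index.php"
                f"?title={safe_title}&oldid={rev_id}>")
    article = f"<http://en.wikipedia.org/wiki/{safe_title}>"
    return (
        f"{action}\n"
        f"    a sioca:Action ;\n"
        f"    sioca:creates {revision} ;\n"
        f"    sioca:modifies {article} .\n"
    )


print(action_turtle("Linked_Data", 38010613))
```

Keeping the revision id in the fragment of the action URI, as the slide does, gives every edit its own identifier while the article URI stays stable across revisions.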
  • 8. Our modelling solution: 2 – How: The action leading to an event. • In Wikipedia the actions are the edits applied to the articles. • By analyzing diffs between revisions we identify the type of action involved in the creation of the newer revision: (Insertion | Update | Deletion) (Sentence | Reference) • To model the differences between revisions we created a lightweight Diff ontology that aims at describing changes to plain text documents. (http://vocab.deri.ie/diff#)
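One way to picture the diff step is the sketch below, which labels each change between two revisions as an Insertion, Update or Deletion of a Sentence or a Reference. It is an assumption-laden illustration: the slides do not say which diff algorithm was used, so Python's standard `difflib` is swapped in, the sentence splitter is deliberately naive, and treating any fragment containing `<ref` as a Reference is a guess at how references could be detected in wiki markup.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify_changes(old: str, new: str) -> list[tuple[str, str]]:
    """Label each change between two revisions as (action type, unit)."""
    old_s, new_s = split_sentences(old), split_sentences(new)
    matcher = difflib.SequenceMatcher(a=old_s, b=new_s, autojunk=False)
    labels = {"insert": "Insertion", "replace": "Update", "delete": "Deletion"}
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For deletions inspect the removed text, otherwise the new text.
        fragment = " ".join(old_s[i1:i2] if tag == "delete" else new_s[j1:j2])
        # Assumption: wiki markup references appear as <ref ...> tags.
        unit = "Reference" if "<ref" in fragment else "Sentence"
        changes.append((labels[tag], unit))
    return changes


print(classify_changes("A is B. C is D.", "A is B. C is E."))
```

Working at sentence granularity keeps the labels aligned with the (Sentence | Reference) units named on the slide; a character-level diff would need an extra grouping step.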
  • 9. Our modelling solution: 3 – When: The time an event occurs. • In Wikipedia every edit has a timestamp recorded, and edits are considered instantaneous. • Use of dc:created or event:time <http://example.com/action?title=Linked_Data#380106133> dc:created "2010-08-21T06:36:17Z"; event:time [ a time:Instant; time:inXSDDateTime "2010-08-21T06:36:17Z" ]; a sioca:Action.
  • 10. Our modelling solution: 4 – Where: The online space or the location associated with an event. In Wikipedia the information about the location of the user editing the page is not provided. This information cannot be modelled.
  • 11. Our modelling solution: 5 – Who: An agent involved in an event. In Wikipedia it is represented by the editor of a page. We use the sioc:UserAccount class to identify the account of the agent. <http://example.com/action?title=Linked_Data#36243686> sioc:has_creator <http://en.wikipedia.org/wiki/User:Timbl>; a sioca:Action.
  • 12. Our modelling solution: 6 – Which: The programs or instruments used in the event. • In Wikipedia it is represented by the MediaWiki software used to edit the articles. • Different in case the editor is a “bot”.
  • 13. Our modelling solution: 7 – Why: The reasons behind the event occurrence. • In Wikipedia it is defined by the justifications for a change inserted by a user in the “comment” field. • Property diff:comment with the diff:Diff class as domain.
  • 14. Our modelling solution
  • 15. Application using Wikipedia provenance data: The application consists of three main parts: • Data Collection – Extracts and generates provenance data from Wikipedia using our model. • Firefox plug-in – From the provenance data collected, it computes and shows statistical information directly on Wikipedia pages. • Exposing the data to the Web of data – The statistical information and the provenance data are provided as Linked Open Data.
  • 16. Data Collection: A PHP script has been developed to extract all the articles belonging to a category and all its subcategories and, for each article, its entire revision history. Then the program extracts provenance information from the articles collected in the previous step: it calculates the diff function between versions and retrieves other information from the Wikipedia API. We ran our experiment with the “Semantic Web” category and all its 166 Wikipedia articles. All the data has been loaded into an RDF store.
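The two kinds of Wikipedia API requests that such a collection step needs can be sketched as follows. This is not the authors' PHP script: it is a Python illustration that only builds request URLs for the public MediaWiki Action API (`list=categorymembers` and `prop=revisions`), and the function names and default limits are hypothetical. Fetching the responses, recursing into subcategories and following the API's continuation tokens are left out.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def category_members_url(category: str, limit: int = 500) -> str:
    """URL listing the pages and subcategories of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "page|subcat",  # articles plus subcategories to recurse into
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def revisions_url(title: str, limit: int = 500) -> str:
    """URL listing revision ids, timestamps, editors and comments of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


print(category_members_url("Semantic Web"))
print(revisions_url("Linked Data"))
```

The `rvprop` fields requested here line up with the W7 dimensions used in the model: `ids` for What, `timestamp` for When, `user` for Who and `comment` for Why.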
  • 17. Data Collection
  • 18. A Firefox plug-in: • This application displays a table directly on top of Wikipedia articles exposing information about the most active users and their edits. • It is composed of: – 1) The triplestore, exposing a SPARQL endpoint; – 2) A PHP script, which queries the triplestore and sends the results to the Greasemonkey script; – 3) A Greasemonkey script, which retrieves the URL of the loaded Wikipedia page, sends the request to the PHP script and then displays the returned HTML data on the Wikipedia page.
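The kind of query the middle component might send to the triplestore can be sketched as below. The slides do not show the actual query, so this is an assumption: it counts `sioca:Action` resources per `sioc:has_creator` for one article, using the properties from the modelling slides. The prefix IRIs are the SIOC namespaces as I understand them, the function name is hypothetical, and the aggregate syntax requires an endpoint that supports SPARQL 1.1.

```python
def top_editors_query(article_uri: str, limit: int = 10) -> str:
    """Build a SPARQL query ranking editors of one article by edit count."""
    return f"""PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX sioca: <http://rdfs.org/sioc/actions#>
SELECT ?user (COUNT(?action) AS ?edits)
WHERE {{
  ?action a sioca:Action ;
          sioca:modifies <{article_uri}> ;
          sioc:has_creator ?user .
}}
GROUP BY ?user
ORDER BY DESC(?edits)
LIMIT {int(limit)}"""


print(top_editors_query("http://en.wikipedia.org/wiki/Linked_Data", 5))
```

Keying the query on the article URI matches the plug-in flow described above, where the Greasemonkey script passes along the URL of the page currently loaded.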
  • 19. A Firefox plug-in
  • 20. To the Web of data: • The application is currently available at http://vmuss06.deri.ie/WikiProvenance/index.php. • Using this web service it is possible to obtain RDF for the provenance data generated with our model. • It is also possible to obtain the statistical information displayed by the Firefox plug-in represented in RDF. • To represent the statistics we use SCOVO, the Statistical Core Vocabulary (http://vocab.deri.ie/scovo)
  • 21. To the Web of data: • As an example, the following triples state that the user “KingsleyIdehen” made 11 edits on the SIOC page: @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. @prefix WikiStats: <http://vmuss06.deri.ie/WikipediaStats.owl#>. @prefix scovo: <http://purl.org/NET/scovo#>. <WikiStats:title=SIOC&user=KingsleyIdehen&edits> a scovo:Item ; rdf:value 11 ; scovo:dimension WikiStats:Edits ; scovo:dimension <http://wikipedia.org/wiki/SIOC>; scovo:dimension <http://wikipedia.org/wiki/User:KingsleyIdehen>.
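How per-user edit counts could be turned into SCOVO items of this shape is sketched below. It is illustrative only: the helper names are hypothetical, the input format (a list of dicts with a `user` key) is an assumption modelled on what a revision listing might contain, and the sample revisions are invented so that the first item reproduces the slide's "11 edits" example. Prefix declarations are omitted.

```python
from collections import Counter


def edits_per_user(revisions):
    """Count revisions per editor; each revision is a dict with a 'user' key."""
    return Counter(rev["user"] for rev in revisions)


def scovo_item(title, user, edits):
    """Render one edit-count statistic following the slide's SCOVO pattern."""
    return (
        f"<WikiStats:title={title}&user={user}&edits> a scovo:Item ;\n"
        f"    rdf:value {edits} ;\n"
        f"    scovo:dimension WikiStats:Edits ;\n"
        f"    scovo:dimension <http://wikipedia.org/wiki/{title}> ;\n"
        f"    scovo:dimension <http://wikipedia.org/wiki/User:{user}> .\n"
    )


# Invented sample input, not data from the experiment.
revisions = [{"user": "KingsleyIdehen"}] * 11 + [{"user": "Timbl"}] * 2
for user, edits in edits_per_user(revisions).most_common():
    print(scovo_item("SIOC", user, edits))
```

Each statistic carries three dimensions (the measure, the article and the user), so the same pattern extends to other measures by swapping `WikiStats:Edits` for another dimension resource.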
  • 22. Conclusions and Future Work: Our contribution: • A specific lightweight ontology for provenance in wikis, based on the W7 model and SIOC. • A framework for the extraction of provenance data from Wikipedia. • An application to access the generated data in a meaningful way and to expose it to the Web of data. Future work: • A refinement of the proposed model and an alignment with other general-purpose ontologies for provenance representation. • To improve the performance and extend the features of the application. • To model statistics using the SDMX vocabulary (Statistical Data and Metadata eXchange). Comment: • A very large amount of data was generated for the “Semantic Web” category and its 166 articles: almost 1.5 million triples for a total of 8,656 revisions.
  • 23. Questions? Applications and source code: http://vmuss06.deri.ie/WikiProvenance/index.php The Diff ontology: http://vocab.deri.ie/diff# Contacts: fabrizio.orlandi@deri.org @BadmotorF http://www.slideshare.net/badmotorfinger