DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
1. DSNotify - Detecting and Fixing
Broken Links in Linked Data Sets
WebS ’09 @ DEXA 2009
Linz, 02/09/2009
Bernhard Haslhofer and Niko Popitsch
7. ...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
<dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Green Day
is an American rock trio formed in 1987. The band has consisted of Billie Joe Armstrong
(vocals, guitar), Mike Dirnt, and Tré Cool for the majority of its existence...
</dbpprop:abstract>
</rdf:Description>
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
<dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="de">Green Day
[gɹiːn deɪ] ist eine US-amerikanische Punk-Rock-Band, mit der Anfang der 1990er das Punk-
Revival begann. Die Band wurde 1987 von Billie Joe Armstrong und Mike Dirnt zusammen
mit dem Schlagzeuger John Kiffmeyer alias Al Sobrante als The Sweet Children....
</dbpprop:abstract>
</rdf:Description>
...
9. Some numbers...
• Events between DBpedia 3.2 (10/2008) and 3.3
(05/2009)
• # resources created: 29449
• # resources removed: 4789
• # resources moved: 729
11. Link Integrity...
• is a qualitative property that is given when all links
within and between a set of data sources are valid and
deliver the result intended by the link creator.
• cf. referential integrity in RDBMS
• demands a solution that
• detects broken links between resources
• provides support for fixing broken links
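A minimal sketch of the detection half of this requirement, in the spirit of a foreign-key check in an RDBMS. It assumes an in-memory set of links and a set of URIs known to resolve; `find_broken_links` and the example URIs are hypothetical and not part of DSNotify.

```python
def find_broken_links(links, available):
    """Return every (source, target) link whose target is not available.

    links:     iterable of (source_uri, target_uri) pairs
    available: set of URIs that currently resolve (an assumption of this
               sketch; a real check would have to dereference the URIs)
    """
    return [(source, target) for source, target in links if target not in available]


links = [
    ("http://example.org/a", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/gone"),
]
available = {"http://example.org/a", "http://example.org/b"}
broken = find_broken_links(links, available)
```

This only detects removed or moved targets; it says nothing about whether a still-resolvable target delivers the result the link creator intended.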
12. Types of broken links
• Removed link targets
• e.g., resource deleted, server not available anymore, etc.
• Moved link targets
• available at another Web location
• e.g., reorganization of Web resources
• Modified link targets
13. The DSNotify Approach
• periodically monitor items (resources) in a specific
Linked Data source
• extract descriptive features vector for each item
• store item + feature vector in index
• use feature vectors to detect if items have been
removed or moved to another location
• if moved, add relationship between “old” and “new”
item
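The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the DSNotify implementation: the feature vector is assumed to be the set of lower-cased words in an item's description, similarity is assumed to be Jaccard overlap, and `build_index`, `monitor`, and the 0.8 threshold are hypothetical.

```python
def extract_features(description):
    """Hypothetical feature vector: the set of lower-cased words."""
    return frozenset(description.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_index(snapshot):
    """Store item URI -> feature vector for one crawl of the data source."""
    return {uri: extract_features(text) for uri, text in snapshot.items()}


def monitor(index, snapshot, threshold=0.8):
    """Compare a new snapshot with the index and report what happened.

    Returns (events, new_index); each event is (kind, old_uri, new_uri).
    """
    current = build_index(snapshot)
    missing = [uri for uri in index if uri not in current]
    created = [uri for uri in current if uri not in index]
    events = []
    for old in missing:
        # Look for the most similar newly appeared item.
        best = max(created, key=lambda new: jaccard(index[old], current[new]),
                   default=None)
        if best is not None and jaccard(index[old], current[best]) >= threshold:
            events.append(("moved", old, best))  # relate "old" and "new" item
            created.remove(best)
        else:
            events.append(("removed", old, None))
    events.extend(("created", None, uri) for uri in created)
    return events, current


index = build_index({
    "http://example.org/Green_Day": "Green Day is an American rock trio",
    "http://example.org/Old": "unrelated text about cheese",
})
events, index = monitor(index, {
    "http://example.org/band/Green_Day": "Green Day is an American rock trio",
    "http://example.org/New": "fresh page on astronomy",
})
```

Calling `monitor` once per monitoring period and keeping the returned index gives the periodic behaviour described on the slide.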
14. Architecture
[Figure: DSNotify architecture. Components shown: Monitor (feature extraction), LOD source indices (II, RII, AII), Move Detector (heuristic), Decider (decision making, with a user), event log, and updater. Outside DSNotify: LOD sources linked via owl:sameAs and a LOD "consuming" application. Arrow labels: monitor, update, querying, notifications.]
15. Index Interaction
[Figure: timeline (t1 to t4) showing how one item is tracked across the Item Index (II), the Archived Item Index (AII), and the Removed Item Index (RII). URIs shown over time:
http://dbpedia.org/resource/Green_Day (band)
http://dbpedia.org/resource/band/Green_Day
http://dbpedia.org/resource/band/Alternative/Green_Day]
16. Move Detection
• is a semi-automatic process
• calculate similarity between items based on their
feature vectors using domain-specific heuristics
• probability > given threshold: automatic decision
• probability < given threshold: ask expert user
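The threshold rule can be sketched as below. The function name, the score range, and the handling of the edge cases (a score exactly at the threshold, or no candidates at all) are assumptions of this sketch; the slides do not specify them.

```python
def decide_move(candidates, threshold):
    """Decide what to do with an item that disappeared from its old URI.

    candidates: dict mapping candidate new URI -> similarity score in [0, 1]
    Returns ("move", uri) for an automatic decision, otherwise
    ("ask_user", uri_or_None) so an expert can choose.
    """
    if not candidates:
        # Assumption: with no candidate at all, leave the choice to the user.
        return ("ask_user", None)
    best_uri, best_score = max(candidates.items(), key=lambda kv: kv[1])
    if best_score > threshold:
        return ("move", best_uri)
    # Assumption: a score equal to the threshold is not decided automatically.
    return ("ask_user", best_uri)


automatic = decide_move({"http://example.org/band/Green_Day": 0.9,
                         "http://example.org/Other": 0.4}, threshold=0.8)
pending = decide_move({"http://example.org/Other": 0.5}, threshold=0.8)
```

Raising the threshold shifts more decisions to the expert user; lowering it automates more decisions at the risk of wrong moves.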
17. DSNotify HTTP Interface
• GET http://<server>:<port>/<dsnotify>/item/<uri>
• find out what happened with an item
• GET http://<server>:<port>/<dsnotify>/eventChoice
• retrieve pending event choices (move / remove)
• ...
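A small sketch of how a client might build these request URLs. The host, port, and application path are placeholders, and percent-encoding the item URI into a single path segment is an assumption; the slides do not say how `<uri>` is encoded.

```python
from urllib.parse import quote


def item_url(server, port, app, item_uri):
    """URL for asking what happened with an item.

    Assumption: the item URI is percent-encoded as one path segment.
    """
    return f"http://{server}:{port}/{app}/item/{quote(item_uri, safe='')}"


def event_choice_url(server, port, app):
    """URL for retrieving pending event choices (move / remove)."""
    return f"http://{server}:{port}/{app}/eventChoice"


# Hypothetical deployment values, for illustration only.
url = item_url("localhost", 8080, "dsnotify",
               "http://dbpedia.org/resource/Green_Day")
choices = event_choice_url("localhost", 8080, "dsnotify")
```

Either URL would then be fetched with an ordinary HTTP GET; the response format is not described on the slide and is left out here.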
18. Evaluation Plan
[Figure: timeline t-n ... t-2, t-1, t0 over the releases DBpedia 2.0, DBpedia 3.0, DBpedia 3.1, and DBpedia 3.2. A diff is computed between releases, and each diff is manually classified into moves (mv) and removals (rm).]
19. Status / Future Work
• 1st prototype (infrastructure) ready
• annotated test-data set based on DBpedia available
• Currently working on:
• system for simulating past modifications in DBpedia
• the DSNotify evaluation
22. Evaluation Plan
• Monitor simulated DBpedia evolution (t-n to t0)
• Precision / recall of automatic move detection
• with different similarity thresholds
• with different heuristics and feature vectors
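As a sketch of the measurement, precision and recall can be computed by comparing detected moves with the manually classified ones. The pair representation and the example data are assumptions made for illustration.

```python
def precision_recall(detected, actual):
    """Precision and recall of detected moves against a manual classification.

    detected, actual: sets of (old_uri, new_uri) pairs
    """
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical data: three detected moves, four moves in the manual classification.
detected = {("a", "a2"), ("b", "b2"), ("c", "x")}
actual = {("a", "a2"), ("b", "b2"), ("d", "d2"), ("e", "e2")}
precision, recall = precision_recall(detected, actual)
```

Repeating the computation for each similarity threshold or heuristic yields the comparison the plan describes.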
23. Linked Data / Web of Data
• Data management paradigm on the basis of Web
technologies
• HTTP, URI, and RDF/S are the key technologies
• Applications (not Web browsers) are data consumers
• Links between resources play a major role