When many Web of Data (WoD) links are generated automatically, data quality becomes a pressing issue. This presentation, and the related paper, introduce LinkQA: a network-based, node-centric framework to analyse the impact of linkage on the network topology and assess the quality of these links.
Assessing Impact of Links in Linked Data Using Network Measures
1. Assessing Linked Data Mappings Using Network Measures
Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann
9th Extended Semantic Web Conference (ESWC)
May 29, 2012
http://latc-project.eu
http://aksw.org
http://www.vu.nl
ESWC - May 2012 Assessing Linked Data mappings
2. The next 25+5 minutes
The impact of links in the Web of Data
Main questions
What is the impact of link creation?
Can we detect “bad” links based on their impact?
Is adding links always a good thing?
Contributions
A framework to assess the impact of links
Results for 5 metrics
3. Is this a good or a bad link?
4. Measuring the Web of Data
Look at the topology using network analysis tools
Impossible to get the complete graph
Sampling of the graph focusing on specific nodes
See the bigger picture through aggregation
Build the local network around a resource
Repeat the process a sufficient number of times
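The sampling idea above can be sketched in Python. This is a minimal illustration, not LinkQA's implementation: it assumes the data is already available as an in-memory list of (subject, predicate, object) triples, treats a resource's local network as the triples it appears in, and uses hypothetical helper names (`local_network`, `sample_scores`).

```python
import random
from statistics import mean
from typing import Callable

# A triple is (subject, predicate, object); a toy list stands in for the Web of Data.
Triple = tuple[str, str, str]


def local_network(triples: list[Triple], node: str) -> list[Triple]:
    """Triples in which `node` is the subject or the object (its 1-hop neighbourhood)."""
    return [t for t in triples if t[0] == node or t[2] == node]


def sample_scores(
    triples: list[Triple],
    metric: Callable[[str, list[Triple]], float],
    n_samples: int,
    seed: int = 0,
) -> list[float]:
    """Score `n_samples` randomly chosen resources using only their local networks."""
    rng = random.Random(seed)
    nodes = sorted({t[0] for t in triples} | {t[2] for t in triples})
    return [
        metric(node, local_network(triples, node))
        for node in (rng.choice(nodes) for _ in range(n_samples))
    ]


# Usage: a 3-node cycle scored with a simple edge-count metric.
cycle = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "a")]
scores = sample_scores(cycle, lambda node, net: len(net), n_samples=20)
print(mean(scores))  # every node in the cycle touches exactly 2 edges
```

Repeating the draw many times and looking at the resulting list of scores is what lets a local measurement stand in for a property of the whole graph.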
5. Network sampling process
Use a SPARQL endpoint or dereference the resources to get their descriptions
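A sketch of the SPARQL route, assuming a SPARQL 1.1 endpoint that returns results in the standard SPARQL JSON results format. The query template and helper names are hypothetical, the IRIs are example.org placeholders, and actually sending the query over HTTP (or dereferencing the IRI instead) is left out.

```python
# TARGET is a placeholder replaced by the resource IRI (an assumption of this sketch).
QUERY_TEMPLATE = """SELECT ?s ?p ?o WHERE {
  { TARGET ?p ?o . BIND(TARGET AS ?s) }
  UNION
  { ?s ?p TARGET . BIND(TARGET AS ?o) }
}"""


def neighbourhood_query(iri: str) -> str:
    """SPARQL query for every triple that has `iri` as its subject or its object."""
    return QUERY_TEMPLATE.replace("TARGET", f"<{iri}>")


def triples_from_results(results: dict) -> list[tuple[str, str, str]]:
    """Extract (s, p, o) values from a SPARQL 1.1 JSON results document."""
    return [
        (b["s"]["value"], b["p"]["value"], b["o"]["value"])
        for b in results["results"]["bindings"]
    ]


# Usage with a hand-written response in the SPARQL JSON results format.
response = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/a"},
         "p": {"type": "uri", "value": "http://www.w3.org/2002/07/owl#sameAs"},
         "o": {"type": "uri", "value": "http://example.org/b"}},
    ]},
}
print(neighbourhood_query("http://example.org/a"))
print(triples_from_results(response))
```

The two UNION branches collect outgoing and incoming triples, so the result is the same 1-hop neighbourhood whichever direction the links point.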
6. Aggregation of local results
[Figure: aggregated local scores, observed vs. target]
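This slide compares the observed distribution of local scores with a target. One way to quantify that comparison, assuming the target is given as a discrete distribution over integer scores, is the total variation distance between the two, computed before and after the links are added. The distance measure, the example numbers, and the function names are assumptions for illustration, not necessarily what LinkQA uses.

```python
from collections import Counter


def distribution(scores: list[int]) -> dict[int, float]:
    """Normalised histogram of integer local scores (e.g. degrees)."""
    counts = Counter(scores)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Total variation distance between two discrete distributions (0 = identical)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def moves_towards_target(before: list[int], after: list[int], target: dict[int, float]) -> bool:
    """True if the sampled scores after linking are closer to the target than before."""
    return total_variation(distribution(after), target) < total_variation(
        distribution(before), target
    )


target = {1: 0.5, 2: 0.5}  # hypothetical target distribution
before = [1, 1, 1, 1]      # sampled scores before adding links
after = [1, 1, 2, 2]       # sampled scores after adding links
print(moves_towards_target(before, after, target))  # True
```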
7. Metrics
Compute local scores for a resource
Criteria
Use only the local network
Representative of a global property
Not sensitive to changes in observation scale
5 metrics currently available in LinkQA
8. What do we want to see?
Increased connectivity within topical groups
Increased chances of finding related information
More bridges between topical groups
Improved browsing capabilities
More connectivity around hubs
Decreased dependency on the hubs
9. Metric 1 – Degree
Metric: Number of edges around the target node
Target: Power-law distribution of values
Intuition: Presence of hubs
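For the degree metric, a sketch under the same triple-list assumption: `degree` counts the edges touching a node in its local network, and `power_law_alpha` applies the standard continuous maximum-likelihood estimator of a power-law exponent to a sample of degrees, as one hypothetical way of checking sampled degrees against the power-law target. For integer degrees the continuous estimator is only an approximation.

```python
import math


def degree(node: str, edges: list[tuple[str, str, str]]) -> int:
    """Edges in the local network touching `node` (in + out; a self-loop counts once)."""
    return sum(1 for s, _, o in edges if s == node or o == node)


def power_law_alpha(degrees: list[float], d_min: float = 1) -> float:
    """Continuous MLE of alpha in p(d) ~ d^-alpha, using only degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above d_min")
    return 1 + len(tail) / log_sum


edges = [("a", "p", "b"), ("c", "p", "a"), ("b", "p", "c")]
print(degree("a", edges))  # 2
print(power_law_alpha([1, 2, 4, 8, 16]))
```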
10. Metric 2 – Clustering coefficient
Metric: Density of links around the target node
Target: Increase clustering around nodes
Intuition: Topical clusters
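The clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves linked. The sketch below ignores edge direction and predicates (an assumption). Note that it only gives non-zero values if the local network also contains the edges between the node's neighbours, not just the edges touching the node.

```python
from itertools import combinations


def clustering_coefficient(node: str, edges: list[tuple[str, str, str]]) -> float:
    """Local clustering coefficient of `node`, treating edges as undirected."""
    # Collapse direction and predicates; drop self-loops.
    links = {frozenset((s, o)) for s, _, o in edges if s != o}
    neighbours = {n for link in links if node in link for n in link if n != node}
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0 by convention here
    connected = sum(
        1 for a, b in combinations(neighbours, 2) if frozenset((a, b)) in links
    )
    return 2 * connected / (k * (k - 1))


triangle = [("a", "p", "b"), ("b", "p", "c"), ("c", "q", "a")]
star = [("a", "p", "b"), ("a", "p", "c")]
print(clustering_coefficient("a", triangle))  # 1.0
print(clustering_coefficient("a", star))      # 0.0
```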
11. Metric 3 – Centrality
Metric: Ratio between outgoing and incoming links
Target: Lower the discrepancy between the values
Intuition: Hubs are sensitive
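A sketch of the centrality metric as a ratio of outgoing to incoming links. The add-one smoothing (so that nodes without incoming links still get a finite ratio) and the use of the absolute log-ratio as the "discrepancy" are assumptions made for illustration.

```python
import math


def in_out_degrees(node: str, edges: list[tuple[str, str, str]]) -> tuple[int, int]:
    """(outgoing, incoming) link counts of `node` in its local network."""
    out_deg = sum(1 for s, _, _ in edges if s == node)
    in_deg = sum(1 for _, _, o in edges if o == node)
    return out_deg, in_deg


def out_in_ratio(node: str, edges: list[tuple[str, str, str]]) -> float:
    out_deg, in_deg = in_out_degrees(node, edges)
    # Add-one smoothing (assumption) keeps the ratio finite when in_deg == 0.
    return (out_deg + 1) / (in_deg + 1)


def discrepancy(node: str, edges: list[tuple[str, str, str]]) -> float:
    """0 when outgoing and incoming links balance; grows as they diverge."""
    return abs(math.log(out_in_ratio(node, edges)))


hub = [("h", "p", x) for x in "abc"]               # 3 outgoing, 0 incoming
print(discrepancy("h", hub))                       # log(4)
print(discrepancy("h", hub + [("d", "p", "h")]))   # log(2): lower after a new incoming link
```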
12. Metric 4 – SameAs chains
Metric: Number of “open” sameAs chains
Target: No open sameAs chains
Intuition: Peer agreement
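Assuming that an "open" sameAs chain means two `owl:sameAs` links, a sameAs b and b sameAs c, with no direct link between a and c (so the transitive closure is not stated explicitly), open chains in a local network can be counted as below. This interpretation and the function name are assumptions; sameAs is treated as symmetric.

```python
from collections import defaultdict
from itertools import combinations

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def open_same_as_chains(edges: list[tuple[str, str, str]]) -> int:
    """Count sameAs paths a-b-c whose ends a and c are not directly linked."""
    # sameAs is symmetric: keep each link as an unordered pair, ignoring self-links.
    links = {frozenset((s, o)) for s, p, o in edges if p == SAME_AS and s != o}
    peers = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        peers[a].add(b)
        peers[b].add(a)
    open_chains = 0
    for ends in peers.values():  # each node acts as the middle of its chains
        for a, c in combinations(ends, 2):
            if frozenset((a, c)) not in links:
                open_chains += 1
    return open_chains


chain = [("a", SAME_AS, "b"), ("b", SAME_AS, "c")]
print(open_same_as_chains(chain))                          # 1: a and c are not linked
print(open_same_as_chains(chain + [("a", SAME_AS, "c")]))  # 0: the chain is closed
```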
13. Metric 5 – Description enrichment
Metric: Richness of the resource description
Target: Increase it as much as possible
Intuition: “SameAsed” resources are complementary
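One hypothetical way to measure description enrichment: count the distinct predicates describing a resource on its own, then count them again after merging in the descriptions of its sameAs peers; the difference is the gain. Using distinct predicates as the measure of "richness" is an assumption, and the example predicates are placeholders.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def description_gain(node: str, edges: list[tuple[str, str, str]]) -> int:
    """Distinct predicates gained by merging the descriptions of `node`'s sameAs peers."""
    peers = {o for s, p, o in edges if p == SAME_AS and s == node}
    peers |= {s for s, p, o in edges if p == SAME_AS and o == node}
    own = {p for s, p, _ in edges if s == node and p != SAME_AS}
    merged = {p for s, p, _ in edges if (s == node or s in peers) and p != SAME_AS}
    return len(merged) - len(own)


edges = [
    ("a", "label", "A"),
    ("b", "label", "B"),
    ("b", "area", "42"),
    ("a", SAME_AS, "b"),
]
print(description_gain("a", edges))  # 1: "area" comes from b
```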
14. Under the hood of LinkQA
http://www.flickr.com/photos/cradlehall/5747161514
15. Workflow of an analysis
16. Output of an analysis
Results at the node scale and at the aggregated scale
Per metric:
Indication of change with respect to the target
List of outlier nodes, sorted by their distance to the target
Plus, a global ranking of nodes
=> Input for manual inspection by an expert
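The per-metric outlier list and the global ranking can be sketched as follows, assuming each metric yields one numeric score per node and a single target value. Combining metrics by averaging per-metric ranks is an assumption, as are the `threshold` parameter and the function names.

```python
from collections import defaultdict


def outliers(scores: dict[str, float], target: float, threshold: float) -> list[str]:
    """Nodes whose score is more than `threshold` away from `target`, furthest first."""
    distance = {node: abs(score - target) for node, score in scores.items()}
    flagged = [node for node, d in distance.items() if d > threshold]
    return sorted(flagged, key=distance.get, reverse=True)


def global_ranking(per_metric: list[tuple[dict[str, float], float]]) -> list[str]:
    """Rank nodes by mean per-metric rank (rank 0 = furthest from that metric's target)."""
    ranks = defaultdict(list)
    for scores, target in per_metric:
        ordered = sorted(scores, key=lambda n: abs(scores[n] - target), reverse=True)
        for rank, node in enumerate(ordered):
            ranks[node].append(rank)
    return sorted(ranks, key=lambda n: sum(ranks[n]) / len(ranks[n]))


degree_scores = {"a": 1, "b": 5, "c": 3}
print(outliers(degree_scores, target=1, threshold=1))  # ['b', 'c']
print(global_ranking([(degree_scores, 1), ({"a": 0.0, "b": 0.9, "c": 0.1}, 0.0)]))  # ['b', 'c', 'a']
```

The nodes at the top of the global ranking are the ones an expert would inspect first.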
18. Global impact of links
Observe the distributions to detect bad links
19. First evaluation
160 linking specifications for Silk, developed in the context of LATC
6 linking specifications with manual verification of results
50 positive links
50 negative links
Execute LinkQA with 10 samples of 50 links
20. Results of the detection
“C” if a change was detected in more than 50% of runs
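The "C" marker is a majority rule over runs. A minimal sketch, assuming each run yields a boolean "change detected" flag (the function name is hypothetical):

```python
def change_flag(detections: list[bool]) -> str:
    """Return "C" if a change was detected in more than half of the runs, else ""."""
    return "C" if sum(detections) > len(detections) / 2 else ""


print(change_flag([True, True, False]))  # C
print(repr(change_flag([True, False])))  # '' (exactly half is not a majority)
```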
21. Some explanations
Low sensitivity of metrics:
Lack of data
Stable change
50/50 accuracy of detection:
Targets may not be the right ones
Sample may not be big enough
Semantics-agnostic measures perform worse
22. A closer look at the outliers
Check whether the outliers are necessarily bad links
23. Second evaluation
Linking specifications for Silk, developed in the context of LATC
All linking specifications sampled to have
45 positive links
5 negative links
Execute LinkQA five times, on five samples
24. Rank of positive and negative links
25. Take home message
LinkQA is a node-centric approach to measure the impact of links in the WoD network
Scalable, can be distributed
Current results show that
The 5 metrics defined need to be improved
Metrics that consider semantics perform better
The network sample seems too small
Outlier detection improves with the number of metrics