Finding the Achilles Heel of the Web of Data
using network analysis for link-recommendation
Christophe Guéret, Paul Groth, Frank van Harmelen, Stefan
Schlobach
{cgueret,pgroth,Frank.van.Harmelen,schlobac}@few.vu.nl
VU University Amsterdam
ISWC - November 11, 2010
http://latc-project.eu/
Christophe Guéret - @cgueret (VUA) Finding the Achilles Heel of the Web of Data ISWC - November 11, 2010 1 / 23
The next 25+5 minutes
The Web of Data, complex systems, robustness and road networks
Contributions from the paper
- Two complex-system views of the WoD
- Application of network metrics for robustness
- Increasing robustness as an optimisation problem
Questions to be answered
- What are these Achilles' heels, and where are they?
- What can we do about them?
Walking on the WoD roads
Credit http://www.flickr.com/photos/neuwieser/4828178404/
Resource chains and information harvesting
The Web of Data is a network of labelled "roads"
It is possible to walk on the WoD from resource to resource
Example: find a location by de-referencing chains
Freebase → DBpedia → Geonames
50% of the LOD cloud data sets provide at most 2 connections to
other data sets [1]
[1] http://lod-cloud.net/state/
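A minimal sketch of walking such a chain, assuming the links between data sets are held in a small in-memory dictionary. The `LINKS` data and the `find_chain` helper are illustrative and not part of the talk; real de-referencing would issue HTTP requests rather than dictionary look-ups.

```python
from collections import deque

# Hypothetical toy graph: each data set maps to the data sets it links to.
LINKS = {
    "Freebase": ["DBpedia"],
    "DBpedia": ["Geonames"],
    "Geonames": [],
}

def find_chain(links, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in links.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain: the goal cannot be reached from start

chain = find_chain(LINKS, "Freebase", "Geonames")

# Same graph with the middle data set taken away, to mimic a broken path.
BROKEN = {
    name: [n for n in targets if n != "DBpedia"]
    for name, targets in LINKS.items()
    if name != "DBpedia"
}
broken_chain = find_chain(BROKEN, "Freebase", "Geonames")
```

With the full graph the walk reaches Geonames through DBpedia; once DBpedia is gone no chain is left, which is the kind of isolation the talk is concerned with.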
What can go wrong
If a path is broken...
- some data sets become isolated
- information is lost
This can happen when...
- namespaces or concepts are changed
  e.g. sioc:User → sioc:UserAccount
- servers are offline for some reason
  e.g. data-center flooded, server overloaded, etc.
Two different types of failure (semantic / structural)
Use network analysis tools to identify the nodes at risk and monitor
the impact of changes in topology
Robustness of the network
Robustness is judged by the level of damage caused when a node is removed
Different measures:
- Diameter of a graph (low ⇒ highly connected)
- Degree distribution (scale-free ⇒ robust against random failure)
- Centrality (central nodes are weak spots)
- ...
Centrality enables per-node analysis
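One possible way to put a number on "damage when a node is removed" is sketched below. It assumes damage is the drop in size of the largest connected component; both this measure and the toy network `ADJ` are assumptions made for illustration, not something the slides specify.

```python
def largest_component(adj, removed=None):
    """Size of the largest connected component, ignoring the `removed` node."""
    seen, best = set(), 0
    for start in adj:
        if start == removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

# Hypothetical undirected network: a hub "H" with three leaves,
# plus one direct link between leaves "a" and "b".
ADJ = {
    "H": {"a", "b", "c"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
}

# Damage per node: how much the largest component shrinks when it is removed.
damage = {
    node: largest_component(ADJ) - largest_component(ADJ, removed=node)
    for node in ADJ
}
weakest_spot = max(damage, key=damage.get)
```

Removing any node costs at least that node itself; the nodes worth watching are those whose removal also cuts other nodes off.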
Centrality of nodes in a graph
[Figure: example graph with nodes numbered 1 to 10]
Different notions of centrality: high degree, close to other nodes, on
the way between other nodes.
- degree centrality → 4
- closeness centrality → 3 and 7
- betweenness centrality → 8
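The three notions can be computed with a few lines of standard-library Python. The sketch below uses the well-known Krackhardt kite graph as a stand-in, because the edges of the slide's own graph cannot be recovered from this transcript; the node names are therefore not the slide's node numbers. Betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque

# Krackhardt kite graph (assumed stand-in for the slide's example graph).
KITE = {
    "Andre": ["Beverly", "Carol", "Diane", "Fernando"],
    "Beverly": ["Andre", "Diane", "Ed", "Garth"],
    "Carol": ["Andre", "Diane", "Fernando"],
    "Diane": ["Andre", "Beverly", "Carol", "Ed", "Fernando", "Garth"],
    "Ed": ["Beverly", "Diane", "Garth"],
    "Fernando": ["Andre", "Carol", "Diane", "Garth", "Heather"],
    "Garth": ["Beverly", "Diane", "Ed", "Fernando", "Heather"],
    "Heather": ["Fernando", "Garth", "Ike"],
    "Ike": ["Heather", "Jane"],
    "Jane": ["Ike"],
}

def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}

def closeness(adj):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (len(adj) - 1) / sum(dist.values())
    return result

def betweenness(adj):
    """Brandes' algorithm for unweighted graphs; unnormalised scores."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:  # back-propagate dependencies, farthest nodes first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both ends.
    return {v: s / 2 for v, s in score.items()}

deg, clo, btw = degree(KITE), closeness(KITE), betweenness(KITE)
top_degree = max(deg, key=deg.get)
top_closeness = {v for v, c in clo.items() if abs(c - max(clo.values())) < 1e-12}
top_betweenness = max(btw, key=btw.get)
```

On this graph the three measures single out different nodes, which is the point the slide makes: the node with the most links is not the one that sits on the most shortest paths.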
So, where are the Achilles' heels?
Credit http://www.flickr.com/photos/robbie1/1725308/
The WoD as a complex system
The Web of Data is a multi-dimensional network with labelled edges
Need to abstract the WoD into simple networks to study it [2]
Networks are created using a representative subset of the WoD triples
Two networks to analyse the two types of risk
1. A structural network (nodes = hostnames)
2. A semantic network (nodes = namespaces)
[2] C. Guéret, S. Wang, S. Schlobach. The Web of Data is a Complex System - first insight into its multi-scale
network properties (ECCS 2010)
Data sets
Take all the resource-resource triples from the BTC 2010
Group them by hostnames and namespaces
[Diagram: BTC 2010 triples grouped by hostnames (structural network) and by namespaces (semantic network)]
Network name   Number of nodes   Number of edges
Hostnames      558k              656k
Namespaces     198               936
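A rough sketch of the grouping step follows. The slides do not say how a namespace is derived from a URI, so the `namespace` heuristic (cut after the last `#` or `/`), the choice to link the groups of subject and object, and the sample triples are all assumptions made for illustration.

```python
from urllib.parse import urlparse

def hostname(uri):
    """Host part of a URI, used as the node of the structural network."""
    return urlparse(uri).netloc

def namespace(uri):
    """Assumed heuristic: everything up to and including the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]

def build_network(triples, key):
    """Collapse resource-resource triples into directed edges between groups."""
    edges = set()
    for subject, _predicate, obj in triples:
        source, target = key(subject), key(obj)
        if source != target:  # assumption: links inside one group are ignored
            edges.add((source, target))
    return edges

# Hypothetical sample triples (not taken from the BTC 2010 data).
TRIPLES = [
    ("http://data.example.org/resource/Amsterdam",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://geo.example.net/id/123"),
    ("http://data.example.org/resource/Amsterdam",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://data.example.org/ontology/City"),
]

structural = build_network(TRIPLES, hostname)
semantic = build_network(TRIPLES, namespace)
```

The second triple stays within one host, so it adds nothing to the structural network, yet it still links two namespaces in the semantic one.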
Top 10 visited nodes - structural network
Hostname          B0(n)
xmlns.com         5 693 379 049
dbpedia.org       5 432 125 038
purl.org          2 163 504 423
www.kanzaki.com   532 149 372
www.w3.org        470 113 796
dbtune.org        323 796 691
identi.ca         318 896 524
www.twine.com     299 237 555
semanticweb.org   277 374 029
dblp.l3s.de       225 602 575
If you see your machine(s) here, invest in big servers ASAP!
Top 10 visited nodes - semantic network
Namespace                                             B0(n)
www.w3.org/1999/02/22-rdf-syntax-ns#                  8783
example.org/                                          7191
dbpedia.org/resource/                                 5428
xmlns.com/foaf/0.1/                                   5030
www.w3.org/2002/07/owl#                               3926
sw.opencyc.org/concept/                               1764
www.w3.org/2007/uwa/context/deliverycontext.owl#      1737
www.w3.org/2003/01/geo/wgs84_pos#                     1609
www.semanticdesktop.org/ontologies/2007/11/01/pimo#   1300
ontologies.ezweb.morfeo-project.org/eztag/ns#         1225
If you see your namespace(s) here, don’t change them - ever!
Yes, even if there is a version number in it! (sorry Dan...)
Improving the robustness
Credit http://www.flickr.com/photos/thundershead/3713965526/
Prevent node failure
First, basic answer: it’s easy!
Infrastructure (hostname) network
- The Web of Data is based on standard Web technologies (HTTP, etc.)
- It is known how to scale it: mirrors, round-robin, ...
Semantic (namespaces) network
- Just use cool URIs, they don’t change (thus, no more problem)
Second answer: find a way to decrease the importance of the nodes in
the top 10
How to decrease the betweenness centrality of the nodes?
Add alternate paths to divert the traffic when needed
[Diagram: Freebase, DBpedia, Geonames]
But adding new links...
may not be possible
- e.g. map Bio2RDF data to Geonames data
has a creation cost + a maintenance cost
- estimated as the inverse of the similarity between the vocabularies used by
  the nodes
Optimisation problem
- decrease the variance of the betweenness centrality
- minimise the total cost
- minimise the number of new links
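The three objectives can be written down as a small scoring function. This is only a sketch: the slides say the cost is the "inverse of similarity" without giving a formula, so `link_cost` assumes 1 minus the Jaccard similarity of the two vocabularies, and the centrality values are passed in ready-made rather than recomputed. The vocabularies and numbers below are made up.

```python
from statistics import pvariance

def link_cost(vocab_a, vocab_b):
    """Assumed cost of a new link: 1 - Jaccard similarity of the vocabularies."""
    union = vocab_a | vocab_b
    if not union:
        return 1.0
    return 1.0 - len(vocab_a & vocab_b) / len(union)

def objectives(centrality, new_links, vocab):
    """The three quantities to minimise for one candidate set of new links.

    `centrality` maps each node to its betweenness centrality measured
    after the candidate links have been added to the network.
    """
    total_cost = sum(link_cost(vocab[a], vocab[b]) for a, b in new_links)
    return pvariance(list(centrality.values())), total_cost, len(new_links)

# Hypothetical vocabularies and centrality values.
VOCAB = {
    "A": {"foaf:name", "foaf:knows"},
    "B": {"foaf:name", "dc:title"},
    "C": {"geo:lat"},
}
CENTRALITY = {"A": 4.0, "B": 0.0, "C": 2.0}

variance, cost, count = objectives(CENTRALITY, [("A", "B"), ("A", "C")], VOCAB)
```

Nodes that share no vocabulary get the maximum cost of 1 under this assumption, so links between unrelated data sets are the expensive ones.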
Optimisation algorithms for adding links
Different strategies geared towards particular goals
Greedy strategies (exhaustive)
1. Add all the possible edges, starting with the cheapest
   - Increase connectivity among topic-oriented clusters
2. Add all the possible edges, starting with the most expensive
   - Bridge topic-oriented clusters
Selective strategies (set based)
1. Add a random set of edges
   - Rapid and hopefully efficient way to create a set
2. Use a genetic algorithm to construct an optimal set of edges
   - Insert the best combination of edges
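The two greedy strategies differ only in the order in which candidate edges are taken. A minimal sketch, assuming each candidate is a `(from, to, cost)` tuple; the `greedy_steps` helper and the candidate list are hypothetical. The caller would measure the centrality of the network after each step.

```python
def greedy_steps(candidates, cheapest_first=True):
    """Yield (edges added so far, cumulative cost) after each greedy addition."""
    ordered = sorted(candidates, key=lambda edge: edge[2],
                     reverse=not cheapest_first)
    added, total = [], 0.0
    for source, target, cost in ordered:
        added.append((source, target))
        total += cost
        yield list(added), total

# Hypothetical candidate edges between namespaces, with made-up costs.
CANDIDATES = [
    ("ns:a", "ns:b", 0.25),
    ("ns:a", "ns:c", 1.0),
    ("ns:b", "ns:c", 0.5),
]

cheapest = list(greedy_steps(CANDIDATES, cheapest_first=True))
priciest = list(greedy_steps(CANDIDATES, cheapest_first=False))
```

Both orders end with the same set of edges and the same total cost; what changes is how quickly the cost accumulates and which clusters get connected first.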
Greedy strategies - namespaces network
[Plot: centrality ratio (0 to 2.5) against the number of edges added to the graph (1 to 25000); series: target, increasing cost, decreasing cost]
(the actual centrality value is not meaningful, we report it relative to the
initial one)
Optimal set construction with the genetic algorithm
Iterative trial and error
Several sets evaluated at the same time
Improvement of candidate solutions
1. Create several random sets
2. Evaluate and rank them all
3. Alter the best to get new sets
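The three steps can be sketched as a small evolutionary loop. This is not the authors' implementation: the mutation operator (swap one edge for an unused one), the population size, and the fitness function are all assumptions. In particular `total_cost` is a simple stand-in; in the talk's setting the fitness would be based on the betweenness centrality after adding the set.

```python
import random

def evolve(candidates, set_size, fitness, generations=30, population=20, seed=0):
    """Search for a set of `set_size` candidate edges with low `fitness`."""
    rng = random.Random(seed)
    # Create several random sets
    pop = [frozenset(rng.sample(candidates, set_size)) for _ in range(population)]
    for _ in range(generations):
        # Evaluate and rank them all (lower fitness is better)
        pop.sort(key=fitness)
        best = pop[: population // 2]
        # Alter the best to get new sets: swap one edge for an unused one
        children = []
        for parent in best:
            unused = [edge for edge in candidates if edge not in parent]
            child = set(parent)
            if unused:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(unused))
            children.append(frozenset(child))
        pop = best + children  # the best sets survive unchanged
    return min(pop, key=fitness)

def total_cost(edge_set):
    """Stand-in fitness: the summed cost of the edges in the set."""
    return sum(cost for _source, _target, cost in edge_set)

# Hypothetical candidate edges with made-up costs.
CANDIDATES = [
    ("ns:a", "ns:b", 0.25), ("ns:a", "ns:c", 0.5), ("ns:a", "ns:d", 0.75),
    ("ns:b", "ns:c", 1.0), ("ns:b", "ns:d", 0.125), ("ns:c", "ns:d", 0.375),
]

best_set = evolve(CANDIDATES, set_size=2, fitness=total_cost)
```

Because the better half of each generation is carried over unchanged, the best fitness found can only stay the same or improve from one generation to the next.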
Selective strategies - namespaces network
[Plot: centrality ratio (0 to 1.4) against the size of the set of edges added (1 to 25000); series: target, random choice, evolutionary algorithm]
If you want to add only a few edges, select them carefully
One possible solution
From namespace                                To namespace                                      Cost
http://purl.org/vocab/lifecycle/schema#       http://rdf.freebase.com/ns/                       0.99
http://annotation.semanticweb.org/2004/iswc#  http://www.w3.org/2007/uwa/context/location.owl#  0.89
http://openean.kaufkauf.net/id/               http://www.w3.org/2008/05/skos-xl#                1.00
http://purl.org/dc/dcmitype/                  http://sw.opencyc.org/concept/                    1.00
This set of 4 new edges brings the centrality down to 70% of its
original value
Conclusion
What’s next?
1. Extend and generalise this work
   - Analyse a different, and bigger, set of crawled data
     (suggestions are welcome!)
   - Investigate other network measures
2. Increase the application range of our analysis
   - Turn our batch processes into a stream-oriented analysis
   - Make a service for personalised linking recommendations
Data and software available at
http://linkeddata.few.vu.nl/wod_analysis
Take home message
Network analysis provides meaningful insights
By telling which nodes are central and thus weak
The Web of Data contains weak points
Which can be identified and ranked
The Web of Data can be optimized
By carefully choosing the new connections to create
Slides available on SlideShare
http://www.slideshare.net/cgueret/cgueret-iswc2010
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Finding the Achilles Heel of the Web of Data

  • 1. Finding the Achilles Heel of the Web of Data using network analysis for link-recommendation Christophe Guéret, Paul Groth, Frank van Harmelen, Stefan Schlobach {cgueret,pgroth,Frank.van.Harmelen,schlobac}@few.vu.nl VU University Amsterdam ISWC - November 11, 2010 http://latc-project.eu/ Christophe Guéret - @cgueret (VUA) Finding the Achilles Heel of the Web of Data ISWC - November 11, 2010 1 / 23
  • 2. The next 25+5 minutes
      The Web of Data, complex systems, robustness and road networks
      Contributions from the paper:
        • Two complex-system views of the Web of Data (WoD)
        • Application of network metrics for robustness
        • Increasing robustness as an optimisation problem
      Questions to be answered:
        • What are these Achilles heels and where are they?
        • What can we do about them?
  • 3. Walking on the WoD roads (photo credit: http://www.flickr.com/photos/neuwieser/4828178404/)
  • 4. Resource chains and information harvesting
      The Web of Data is a network of labelled "roads".
      It is possible to walk on the WoD from resource to resource.
      Example: find a location by de-referencing a chain of resources (figure: Freebase → DBPedia → Geonames).
      50% of the LOD cloud data sets provide at most 2 connections to other data sets (source: http://lod-cloud.net/state/).
  • 5. What can go wrong
      If a path is broken...
        • some data sets become isolated
        • information is lost
      This can happen when...
        • namespaces or concepts are changed (e.g. sioc:User → sioc:UserAccount)
        • servers are offline for some reason (data-center flooded, server overloaded, etc.)
      These are two different types of failure: semantic and structural.
      Use network analysis tools to identify the nodes at risk and monitor the impact of changes in topology.
  • 6. Robustness of the network
      Robustness ∝ level of damage when a node is removed
      Different measures:
        • Diameter of the graph (low ⇒ highly connected)
        • Degree distribution (scale-free ⇒ robust against random failure)
        • Centrality (central nodes are weak spots)
        • ...
      Centrality enables per-node analysis.
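One way to make "level of damage when a node is removed" concrete is to count how many of the remaining nodes fall out of the largest connected component. The sketch below does this on an invented toy graph; the graph, the function names and the choice of "nodes cut off" as the damage measure are all assumptions for illustration, not the measure used in the talk.

```python
from collections import deque

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

def removal_damage(adj):
    """For each node, count the surviving nodes cut off from the main component."""
    survivors = len(adj) - 1
    return {n: survivors - largest_component(adj, {n}) for n in adj}

# Hypothetical toy network: two triangles joined through the bridge node "d".
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

damage = removal_damage(adj)
weakest = max(damage, key=damage.get)
print(weakest, damage[weakest])
```

In this toy graph the bridge node is the weak spot, while removing a triangle corner such as "a" isolates nothing.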
  • 8. Centrality of nodes in a graph
      (Figure: example graph with nodes numbered 1 to 10)
      Different notions of centrality: high degree, close to other nodes, on the way between other nodes.
        • degree centrality → node 4
        • closeness centrality → nodes 3 and 7
        • betweenness centrality → node 8
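The three notions of centrality can be computed directly from shortest paths. The slide's own 10-node figure did not survive extraction, so the sketch below uses the Krackhardt kite graph, a standard 10-node teaching example, as a stand-in; its node numbers (0 to 9) do not correspond to those on the slide. Betweenness is left unnormalised and the graph is assumed to be connected and undirected.

```python
from collections import deque

def bfs(adj, source):
    """Distances and shortest-path counts from `source` to every reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                sigma[nxt] += sigma[node]
    return dist, sigma

def centralities(adj):
    """Degree, closeness and (unnormalised) betweenness for a connected graph."""
    nodes = sorted(adj)
    info = {n: bfs(adj, n) for n in nodes}
    degree = {n: len(adj[n]) for n in nodes}
    closeness = {n: (len(nodes) - 1) / sum(info[n][0].values()) for n in nodes}
    between = dict.fromkeys(nodes, 0.0)
    for i, s in enumerate(nodes):
        dist_s, sig_s = info[s]
        for t in nodes[i + 1:]:
            dist_t, sig_t = info[t]
            for v in nodes:
                # v lies on a shortest s-t path iff the two legs add up exactly
                if v not in (s, t) and dist_s[v] + dist_t[v] == dist_s[t]:
                    between[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return degree, closeness, between

def top(score):
    """Nodes holding the maximum value of a centrality score."""
    return [n for n in score if score[n] == max(score.values())]

# Krackhardt kite graph, used as a stand-in for the slide's lost 10-node figure.
kite = {0: [1, 2, 3, 5], 1: [0, 3, 4, 6], 2: [0, 3, 5], 3: [0, 1, 2, 4, 5, 6],
        4: [1, 3, 6], 5: [0, 2, 3, 6, 7], 6: [1, 3, 4, 5, 7], 7: [5, 6, 8],
        8: [7, 9], 9: [8]}
degree, closeness, between = centralities(kite)
print(top(degree), top(closeness), top(between))
```

As on the slide, the three measures single out different nodes: one hub for degree, a pair of nodes for closeness, and the node guarding the tail for betweenness.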
  • 9. So, where are the Achilles heels? (photo credit: http://www.flickr.com/photos/robbie1/1725308/)
  • 10. The WoD as a complex system
      The Web of Data is a multi-dimensional network with labelled edges.
      It needs to be abstracted into simple networks to be studied (see C. Guéret, S. Wang, S. Schlobach, "The Web of Data is a Complex System - first insight into its multi-scale network properties", ECCS2010).
      Networks are created using a representative subset of the WoD triples.
      Two networks are used to analyse the two types of risk:
        1. A structural network (nodes = hostnames)
        2. A semantic network (nodes = namespaces)
  • 11. Data sets
      Take all the resource-resource triples from the BTC2010 data set.
      Group them by hostnames (structural network) and by namespaces (semantic network).
      Network name   Number of nodes   Number of edges
      Hostnames      558k              656k
      Namespaces     198               936
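A minimal sketch of the grouping step, under stated assumptions: the slides do not say how namespaces were extracted, whether predicates were used, or how self-links were handled. Here a namespace is taken to be everything up to the last "#" or "/" of a URI, predicates are ignored, and links inside the same group are dropped. The triples and the helper names are invented for illustration.

```python
from urllib.parse import urlparse

def hostname(uri):
    """Host part of a URI, e.g. 'dbpedia.org'."""
    return urlparse(uri).netloc

def namespace(uri):
    """Everything up to and including the last '#' or '/' (a common heuristic)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1]

def build_network(triples, key):
    """Collapse resource-resource triples into edges between groups of resources."""
    edges = set()
    for subject, _predicate, obj in triples:
        a, b = key(subject), key(obj)
        if a != b:  # assumption: links inside one group are not kept
            edges.add((a, b))
    return edges

# Invented example triples (subject, predicate, object).
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
triples = [
    ("http://dbpedia.org/resource/Amsterdam", SAME_AS,
     "http://rdf.freebase.com/ns/example-1"),
    ("http://dbpedia.org/resource/Paris", SAME_AS,
     "http://rdf.freebase.com/ns/example-2"),
    ("http://dbpedia.org/resource/Amsterdam", "http://example.org/country",
     "http://dbpedia.org/resource/Netherlands"),
    ("http://xmlns.com/foaf/0.1/Person", "http://example.org/related",
     "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"),
]

structural = build_network(triples, hostname)   # nodes = hostnames
semantic = build_network(triples, namespace)    # nodes = namespaces
print(sorted(structural))
print(sorted(semantic))
```

Both networks end up far smaller than the triple set itself, which is the point of the abstraction: many triples collapse onto a single edge between two hostnames or two namespaces.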
  • 12. Top 10 visited nodes - structural network
      Hostname           B0(n)
      xmlns.com          5 693 379 049
      dbpedia.org        5 432 125 038
      purl.org           2 163 504 423
      www.kanzaki.com    532 149 372
      www.w3.org         470 113 796
      dbtune.org         323 796 691
      identi.ca          318 896 524
      www.twine.com      299 237 555
      semanticweb.org    277 374 029
      dblp.l3s.de        225 602 575
      If you see your machine(s) here, invest in big servers asap!
  • 14. Top 10 visited nodes - semantic network
      Namespace                                               B0(n)
      www.w3.org/1999/02/22-rdf-syntax-ns#                    8783
      example.org/                                            7191
      dbpedia.org/resource/                                   5428
      xmlns.com/foaf/0.1/                                     5030
      www.w3.org/2002/07/owl#                                 3926
      sw.opencyc.org/concept/                                 1764
      www.w3.org/2007/uwa/context/deliverycontext.owl#        1737
      www.w3.org/2003/01/geo/wgs84_pos#                       1609
      www.semanticdesktop.org/ontologies/2007/11/01/pimo#     1300
      ontologies.ezweb.morfeo-project.org/eztag/ns#           1225
      If you see your namespace(s) here, don't change them - ever!
      Yes, even if there is a version number in it! (sorry Dan...)
  • 15. Improving the robustness (photo credit: http://www.flickr.com/photos/thundershead/3713965526/)
  • 17. Prevent node failure
      First, basic answer: it's easy!
      Infrastructure (hostname) network:
        • The Web of Data is based on standard Web technologies (HTTP, etc.)
        • It is known how to scale it: mirrors, round-robin, ...
      Semantic (namespace) network:
        • Just use cool URIs, they don't change (thus, no more problem)
      Second answer: find a way to decrease the importance of the nodes in the top 10.
  • 18. How to decrease the betweenness centrality of the nodes?
      Add alternate paths to divert the traffic when needed (figure: Freebase, DBPedia and Geonames).
  • 20. But adding new links...
      may not be possible
        • e.g. mapping Bio2RDF data to Geonames data
      has a creation cost + a maintenance cost
        • estimated as the inverse of the similarity between the vocabularies used by the nodes
      Optimisation problem:
        • decrease the variance of the betweenness centrality
        • minimize the total cost
        • minimize the number of new links
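The slide only says that the cost is "the inverse of the similarity" between vocabularies, without giving a formula. As one possible reading, the sketch below takes cost = 1 - Jaccard similarity, which stays between 0 and 1 like the costs reported later in the deck. Both the formula and the vocabularies (including their prefixed terms) are assumptions made for illustration.

```python
def link_cost(vocab_a, vocab_b):
    """Hypothetical cost of linking two nodes: 1 - Jaccard similarity of their vocabularies."""
    union = vocab_a | vocab_b
    if not union:
        return 1.0  # nothing known about either node: assume the most expensive case
    return 1.0 - len(vocab_a & vocab_b) / len(union)

# Invented vocabularies (sets of terms each data set uses), for illustration only.
geo_a = {"geo:lat", "geo:long", "rdfs:label"}
geo_b = {"geo:lat", "geo:long", "foaf:name"}
bio = {"bio:gene", "bio:protein"}

print(link_cost(geo_a, geo_b))  # 0.5
print(link_cost(geo_a, bio))    # 1.0
```

Under this reading, two data sets sharing most of their terms are cheap to link, and two with nothing in common get the maximum cost of 1.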
  • 22. Optimisation algorithms for adding links
      Different strategies geared towards particular goals
      Greedy strategies (exhaustive):
        1. Add all the possible edges, starting with the cheapest
           • Increases connectivity among topic-oriented clusters
        2. Add all the possible edges, starting with the most expensive
           • Bridges topic-oriented clusters
      Selective strategies (set based):
        1. Add a random set of edges
           • Rapid and hopefully efficient way to create a set
        2. Use a genetic algorithm to construct an optimal set of edges
           • Inserts the best combination of edges
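The two greedy strategies differ only in the order in which edges are added. The sketch below sweeps through a handful of candidate edges in either cost order and records, after each addition, the variance of the betweenness centrality relative to its initial value. Treating that variance ratio as the "centrality ratio" is an assumption, as are the toy path graph, the candidate edges and their costs; the talk adds all possible edges, not a hand-picked few.

```python
from collections import deque
from statistics import pvariance

def betweenness(adj):
    """Unnormalised betweenness of every node in a connected, undirected graph."""
    def bfs(source):
        dist, sigma, queue = {source: 0}, {source: 1}, deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt], sigma[nxt] = dist[node] + 1, 0
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    sigma[nxt] += sigma[node]
        return dist, sigma
    nodes = sorted(adj)
    info = {n: bfs(n) for n in nodes}
    score = dict.fromkeys(nodes, 0.0)
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            for v in nodes:
                if v not in (s, t) and info[s][0][v] + info[t][0][v] == info[s][0][t]:
                    score[v] += info[s][1][v] * info[t][1][v] / info[s][1][t]
    return score

def greedy_sweep(adj, candidates, cheapest_first=True):
    """Add candidate (u, v, cost) edges in cost order; return the ratio after each."""
    graph = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    initial = pvariance(list(betweenness(graph).values()))
    ratios = []
    ordered = sorted(candidates, key=lambda e: e[2], reverse=not cheapest_first)
    for u, v, _cost in ordered:
        graph[u].add(v)
        graph[v].add(u)
        ratios.append(pvariance(list(betweenness(graph).values())) / initial)
    return ratios

# Hypothetical path network a-b-c-d-e with three candidate links and made-up costs.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
candidates = [("a", "e", 0.9), ("a", "c", 0.2), ("b", "d", 0.5)]
increasing = greedy_sweep(path, candidates, cheapest_first=True)
decreasing = greedy_sweep(path, candidates, cheapest_first=False)
print(increasing, decreasing)
```

Even on this tiny example the order matters: the cheap local link first pushes the ratio above 1, while the expensive long-range link closes the path into a ring and evens out the centrality at once. Both sweeps end on the same graph, so their final ratios agree.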
  • 23. Greedy strategies - namespaces network
      (Figure: centrality ratio, from 0 to 2.5, against the number of edges added to the graph, from 1 to 25000; curves: increasing cost, decreasing cost; a target level is marked.)
      The actual centrality value is not meaningful, so it is reported relative to the initial one.
  • 24. Optimal set construction with the genetic algorithm
      Iterative trial and error; several sets evaluated at the same time; improvement of candidate solutions
      Steps:
        1. Create several random sets
        2. Evaluate and rank them all
        3. Alter the best ones to get new sets
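The three steps on this slide map onto a very small evolutionary loop. The sketch below is a mutation-only variant (no crossover) that keeps the better half of the population each round; it is not the authors' implementation. The `score` function is pluggable: in the talk it would be the centrality-based objective, whereas here the summed cost of invented candidate links stands in for it so that the block stays short. All names, parameters and costs are hypothetical.

```python
import math
import random

def evolve(candidates, k, score, generations=30, pop_size=20, seed=0):
    """Search for a k-edge subset of `candidates` with a low `score` (lower is better)."""
    rng = random.Random(seed)
    # 1. Create several random sets
    population = [frozenset(rng.sample(candidates, k)) for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Evaluate and rank them all
        ranked = sorted(population, key=score)
        parents = ranked[:pop_size // 2]
        # 3. Alter the best ones: swap one edge of each parent for another candidate
        children = []
        for parent in parents:
            child = set(parent)
            child.remove(rng.choice(sorted(child)))
            child.add(rng.choice([c for c in candidates if c not in child]))
            children.append(frozenset(child))
        population = parents + children
    return min(population, key=score)

# Hypothetical candidate links with invented costs; the summed cost stands in
# for the talk's objective (betweenness variance combined with link cost).
cost = {("a", "e"): 0.9, ("a", "c"): 0.2, ("b", "d"): 0.5, ("b", "e"): 0.7,
        ("a", "d"): 0.4, ("c", "e"): 0.3}

def total_cost(edge_set):
    """Stand-in objective: exact sum of the costs of the chosen edges."""
    return math.fsum(cost[e] for e in edge_set)

best = evolve(list(cost), k=3, score=total_cost)
print(sorted(best), total_cost(best))
```

Because the parents survive unchanged into the next round, the best score never gets worse from one generation to the next; the fixed seed only makes the run repeatable.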
  • 25. Selective strategies - namespaces network
      (Figure: centrality ratio, from 0 to 1.4, against the size of the set of edges added, from 1 to 25000; curves: random choice, evolutionary algorithm; a target level is marked.)
      If you want to add only a few edges, select them carefully.
  • 26. One possible solution
      From namespace                                   To namespace                                         Cost
      http://purl.org/vocab/lifecycle/schema#          http://rdf.freebase.com/ns/                          0.99
      http://annotation.semanticweb.org/2004/iswc#     http://www.w3.org/2007/uwa/context/location.owl#     0.89
      http://openean.kaufkauf.net/id/                  http://www.w3.org/2008/05/skos-xl#                   1.00
      http://purl.org/dc/dcmitype/                     http://sw.opencyc.org/concept/                       1.00
      This set of 4 new edges brings the centrality down to 70% of its original value.
  • 27. Conclusion
      What's next?
        1. Extend and generalise this work
           • Analyse a different, and bigger, set of crawled data (suggestions are welcome!)
           • Investigate other network measures
        2. Increase the application range of our analysis
           • Turn our batch processes into a stream-oriented analysis
           • Make a service for personalised linking recommendations
      Data and software available at http://linkeddata.few.vu.nl/wod_analysis
  • 28. Take home message
      Network analysis provides meaningful insights
        • by telling which nodes are central and thus weak
      The Web of Data contains weak points
        • which can be identified and ranked
      The Web of Data can be optimized
        • by carefully choosing the new connections to create
      Slides available on SlideShare: http://www.slideshare.net/cgueret/cgueret-iswc2010