Presentation of the tool LODeX (http://www.dbgroup.unimore.it/lodex2/testCluster) at the 2015 IEEE/WIC/ACM International Conference on Web Intelligence, Singapore, December 6-8, 2015
Wi2015 - Clustering of Linked Open Data - the LODeX tool
1. DBGroup@UNIMO
Fabio Benedetti, Sonia Bergamaschi, Laura Po
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
The 2015 IEEE/WIC/ACM International Conference on Web Intelligence
2. DBGroup@UNIMO
Laura Po, “Exposing the underlying schema of LOD sources”
★ publish data on the Web under an open license
★★ make data available as structured data
★★★ make data available in a non-proprietary open format
★★★★ use URIs to denote things
★★★★★ link your data to other data to provide context
★★★★★ L document your data
in a top-down fashion
In 2006, Tim Berners-Lee coined the term "Linked Data"
3. DBGroup@UNIMO
The LOD Cloud
• more than one thousand interlinked datasets
• several billion RDF triples
Each LOD source
• widely varying size, from thousands to billions of triples
4. DBGroup@UNIMO
A tool for promoting the understanding, navigation
and querying of LOD sources
Requirements
• portable to the LOD Cloud
• provide a synthetic representation of the structure of the dataset (Schema Summary, Clustered Schema Summary)
• provide visual query building functionalities hiding the complexity of Semantic Web technologies
8. DBGroup@UNIMO
• A tool for exploring and querying LOD sources
+ navigation of large LOD sources
Try LODeX at: http://dbgroup.unimo.it/lodex2
http://www.dbgroup.unimo.it/lodex2/testCluster
Future work
• New filtering and clustering techniques
• An interactive exploration that starts from the highest level and can be detailed down to the lowest level
• Query functionalities on the Clustered Schema Summary (mapping functionalities to convert a visual query on the CSS to a SPARQL query on the LOD endpoint)
10. DBGroup@UNIMO
• F. Benedetti, S. Bergamaschi, L. Po. Exposing the underlying schema of LOD sources. WI 2015
• F. Benedetti, S. Bergamaschi, L. Po. LODeX: A tool for Visual Querying Linked Open Data. ISWC 2015 (Posters & Demonstrations Track)
• F. Benedetti, S. Bergamaschi, L. Po. Visual Querying LOD sources with LODeX. K-CAP 2015
• F. Benedetti, S. Bergamaschi, L. Po. A visual summary for linked open data sources. ISWC 2014 (Posters & Demonstrations Track)
• F. Benedetti, S. Bergamaschi, L. Po. Online index extraction from linked open data sources. Linked Data for Information Extraction (LD4IE) Workshop held at ISWC 2014
12. DBGroup@UNIMO
• Each RDF graph is composed of a set of vertices V and a set of labelled edges E. The vertices can be divided into 3 disjoint sets: the URIs U, the blank nodes B and the literals L.
• Two vertices connected by an edge represent a statement. Each statement is stored as a <subject, predicate, object> triple, where subject ∈ (U ∪ B), object ∈ V and predicate ∈ E.
• We can define the whole RDF graph as a set of triples RG.
RG ⊆ (U ∪ B) × E × V
• The rdf:type property is used to state that a certain resource is an instance of a class. We define the set of classes as Cs.
Cs = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• We call partial cluster of classes (PC) a set of classes that concur in the multiple instantiation of the same resource:
PC(i) = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• and each PC(i) ⊆ Cs
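The definitions of Cs and PC(i) can be illustrated with a minimal Python sketch. It assumes the graph RG is held in memory as a set of (subject, predicate, object) tuples; the toy triples, the class ex:Company and the helper names `classes` and `partial_cluster` are hypothetical and not part of LODeX.

```python
RDF_TYPE = "rdf:type"

# Hypothetical toy graph RG: a set of (subject, predicate, object) triples.
RG = {
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization1", RDF_TYPE, "ex:Company"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("organization1", "ex:ceo", "person1"),
}


def classes(rg):
    """Cs: every c such that <i, rdf:type, c> is a triple of RG."""
    return {o for (_, p, o) in rg if p == RDF_TYPE}


def partial_cluster(rg, i):
    """PC(i): the classes that resource i is declared an instance of."""
    return frozenset(o for (s, p, o) in rg if s == i and p == RDF_TYPE)


Cs = classes(RG)
pc_org = partial_cluster(RG, "organization1")
print(sorted(Cs))
print(sorted(pc_org))
```

Here organization1 is instantiated by two classes, so its partial cluster holds both of them, and every partial cluster is a subset of Cs.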
13. DBGroup@UNIMO
• The partial clusters of classes (PC) are sets of classes that concur in the multiple instantiation of the same resource:
PC(i) = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• By examining all the instances in an RG graph, we find different PCs.
• The collection of all the PCs that occur in an RG graph is called the family of PC, C:
C = {PC(i) : ∀i ∈ (U ∪ B)}
• C contains a particular family of sets able to generate all the other sets. We call this family the family of super sets (S), and we define it as follows:
S = {ST ∈ C : ∄PC ∈ C ∧ PC ⊃ ST}
• For each set st ∈ S, a class ca ∈ st must be elected to represent the entire set of classes. This class is called the candidate agent of the superset. For each superset, we choose as candidate agent the class with the highest number of instances.
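A small Python sketch of these three steps follows: building the family C, keeping only the super sets S, and electing a candidate agent. The toy triples and function names are hypothetical. Two choices are assumptions of the sketch, not statements from the slides: only resources with at least one rdf:type are considered, and ties between classes are broken alphabetically.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy graph: "a" has two classes, "b" and "c" have one each.
RG = {
    ("a", RDF_TYPE, "ex:Agent"),
    ("a", RDF_TYPE, "ex:Person"),
    ("b", RDF_TYPE, "ex:Agent"),
    ("c", RDF_TYPE, "ex:Place"),
}


def family_of_pc(rg):
    """C: the distinct partial clusters PC(i) over all typed resources i."""
    by_instance = {}
    for s, p, o in rg:
        if p == RDF_TYPE:
            by_instance.setdefault(s, set()).add(o)
    return {frozenset(cs) for cs in by_instance.values()}


def super_sets(family):
    """S: members of C that are not a proper subset of another member."""
    return {st for st in family if not any(st < pc for pc in family)}


def candidate_agent(st, instance_count):
    """Class of the superset with the most instances (ties: by name)."""
    return max(sorted(st), key=lambda c: instance_count[c])


C = family_of_pc(RG)
S = super_sets(C)
counts = Counter(o for (_, p, o) in RG if p == RDF_TYPE)
agents = {st: candidate_agent(st, counts) for st in S}
```

In this toy graph the partial cluster {ex:Agent} is absorbed by the superset {ex:Agent, ex:Person}, whose candidate agent is ex:Agent because it has two instances against one for ex:Person.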
14. DBGroup@UNIMO
The Schema Summary is a pseudograph composed of:
• C - Classes (nodes)
• P - Properties (edges)
And additional elements and functions:
• A - Attributes associated with each class
– Each attribute represents the existence of a datatype property from the instances of the class
• a set of labels
• l - labeling function
• count - count function
The Schema Summary is inferred from the distribution of the instances of a dataset
15. DBGroup@UNIMO
These indexes belong to the extensional group of the Statistical Indexes [2]:
• SC (Subject Class) contains the pairs (p,c) where p is an object property and c is its domain class.
• SCl (Subject Class to literal) contains the pairs (p,c) where p is a datatype property and c is its domain class.
• OC (Object Class) contains the pairs (p,c) where p is an object property and c is its range class.
[Figure: example RDF graph. Extensional Classes: ex:Sector, foaf:Organization, foaf:Person. Extensional Knowledge: the instances sector1, organization1, organization2 and person1, connected through rdf:type, ex:sector, ex:ceo, ex:activity, dbpedia:fax, dc:title, foaf:firstName and foaf:lastName, with the literals “Energy”, “Village electrification in the Pacific”, “+41331231”, “Paolo” and “Rossi”.]
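The three extensional indexes can be sketched in Python over an in-memory set of triples. The sketch rests on several assumptions. The triples are one plausible reading of the example graph (which organization carries ex:activity, dbpedia:fax and ex:ceo is a guess). Each index entry is keyed by (class, property). The count is taken to be the number of distinct instances of the class that occur as subject (SC, SCl) or object (OC) of the property. The `Lit` marker class and the function name are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


class Lit(str):
    """Marks a literal value so it can be told apart from a resource name."""


# Assumed reading of the example graph (property placement is a guess).
RG = {
    ("sector1", RDF_TYPE, "ex:Sector"),
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization2", RDF_TYPE, "foaf:Organization"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("organization1", "ex:sector", "sector1"),
    ("organization2", "ex:sector", "sector1"),
    ("organization1", "ex:ceo", "person1"),
    ("sector1", "dc:title", Lit("Energy")),
    ("organization1", "ex:activity", Lit("Village electrification in the Pacific")),
    ("organization1", "dbpedia:fax", Lit("+41331231")),
    ("person1", "foaf:firstName", Lit("Paolo")),
    ("person1", "foaf:lastName", Lit("Rossi")),
}


def extensional_indexes(rg):
    """Return (SC, SCl, OC) as dicts mapping (class, property) to a count."""
    types = defaultdict(set)
    for s, p, o in rg:
        if p == RDF_TYPE:
            types[s].add(o)

    sc, scl, oc = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in rg:
        if p == RDF_TYPE:
            continue
        # Datatype properties (literal object) go to SCl, the others to SC.
        subject_index = scl if isinstance(o, Lit) else sc
        for c in types.get(s, ()):
            subject_index[(c, p)].add(s)
        if not isinstance(o, Lit):
            for c in types.get(o, ()):
                oc[(c, p)].add(o)

    def counts(index):
        return {key: len(instances) for key, instances in index.items()}

    return counts(sc), counts(scl), counts(oc)


SC, SCl, OC = extensional_indexes(RG)
```

In a real deployment these values would be gathered with SPARQL queries against the endpoint rather than from an in-memory set; the sketch only shows what each index contains.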
18. DBGroup@UNIMO
We use an algorithm to combine these indexes and produce a Schema Summary.
Name   Values (class, property, count)
SC     (foaf:Organization,ex:ceo,1),
       (foaf:Organization,ex:sector,2)
SCl    (foaf:Person,foaf:firstName,1),
       (foaf:Person,foaf:lastName,1),
       (ex:Sector,dc:title,1),
       (foaf:Organization,ex:activity,1),
       (foaf:Organization,dbpedia:fax,1)
OC     (ex:Sector,ex:sector,1),
       (foaf:Person,ex:ceo,1)
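The slides do not spell out the combination algorithm, so the following Python sketch is only one simple way such a combination could work, not the LODeX procedure. It assumes that classes become nodes, SCl entries become attributes, and an edge is created whenever a property appears in both SC (domain side) and OC (range side). The function name is hypothetical, and the index values mirror the table above, with foaf:Person used as the class of person1 as in the example graph.

```python
SC = {("foaf:Organization", "ex:ceo"): 1, ("foaf:Organization", "ex:sector"): 2}
SCl = {
    ("foaf:Person", "foaf:firstName"): 1,
    ("foaf:Person", "foaf:lastName"): 1,
    ("ex:Sector", "dc:title"): 1,
    ("foaf:Organization", "ex:activity"): 1,
    ("foaf:Organization", "dbpedia:fax"): 1,
}
OC = {("ex:Sector", "ex:sector"): 1, ("foaf:Person", "ex:ceo"): 1}


def build_schema_summary(sc, scl, oc):
    """Naive combination: join SC and OC on the property name."""
    nodes = {c for (c, _) in list(sc) + list(scl) + list(oc)}

    # Attributes: datatype properties attached to their domain class.
    attributes = {c: {} for c in nodes}
    for (c, p), n in scl.items():
        attributes[c][p] = n

    # Edges: (domain class, property, range class) labelled with the SC count.
    edges = {}
    for (src, p), n in sc.items():
        for (dst, q) in oc:
            if p == q:
                edges[(src, p, dst)] = n
    return nodes, attributes, edges


nodes, attributes, edges = build_schema_summary(SC, SCl, OC)
```

Joining on the property name alone is enough for this example, but it would over-generate edges when the same property links several distinct pairs of classes; a faithful implementation would need the pairing information from the endpoint.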
19. DBGroup@UNIMO
[Figure: resulting Schema Summary. Nodes: foaf:Organization (2), ex:Sector (1), foaf:Person (1). Edges: ex:sector (2), ex:ceo (1). Attributes: dc:title (1), foaf:firstName (1), foaf:lastName (1), ex:activity (1), dbpedia:fax (1).]
20. DBGroup@UNIMO
Two main modules
• Extraction & Summarization
– Index Extraction (IE)
– Post Processing (PP)
• Visualization & Querying
– Schema Summary Visualization
– Query Orchestrator
[Figure: LODeX architecture. Labels shown: LOD Cloud, Endpoint URLs, SPARQL Queries, LODeX Indexes Extraction, Statistical Indexes, LODeX Post-processing, Schema Summary, NoSQL, Schema Summary Visualization, Query Orchestrator, Basic Query Results.]
21. DBGroup@UNIMO
Schema Summary Visualization
Front end of the Web Application, composed of three panels:
• List of datasets indexed in LODeX
• Schema Summary and query building panel
• Refinement panel
Query Orchestrator
• It manages the interaction between the user and the GUI
• It contains a SPARQL compiler able to compile the visual query into a SPARQL one
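To give an idea of what compiling a visual query into SPARQL can look like, here is a minimal Python sketch. It is not the LODeX compiler: the shape of the visual query (one selected class plus a list of selected attributes), the function name `compile_visual_query` and the default limit are all assumptions made for illustration. Only the FOAF IRIs are real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def compile_visual_query(root_class, attributes, limit=100):
    """Turn a selected class and its selected attributes into a SPARQL string.

    Hypothetical model of a visual query: the user picks one node of the
    Schema Summary (root_class, a full IRI) and some of its attributes.
    """
    variables = ["?s"] + [f"?a{i}" for i in range(len(attributes))]
    lines = ["SELECT " + " ".join(variables), "WHERE {"]
    lines.append(f"  ?s a <{root_class}> .")
    for i, prop in enumerate(attributes):
        # OPTIONAL keeps instances that lack one of the selected attributes.
        lines.append(f"  OPTIONAL {{ ?s <{prop}> ?a{i} . }}")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = compile_visual_query(FOAF + "Person", [FOAF + "firstName", FOAF + "lastName"])
print(query)
```

The generated string could then be sent to the SPARQL endpoint of the selected dataset; following object properties to other classes would add further triple patterns in the same way.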