1. Towards a unified framework for distributed data management
across the Semantic Web
Silvia Giannini
(Supervisor: Prof. Eugenio Di Sciascio)
Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI),
Politecnico di Bari, Bari, Italy
s.giannini@deemail.poliba.it
8th ICCL Summer School Workshop (ICCL 2013)
Semantic Web - Ontology Languages and Their Use
Dresden, Germany | 26 August, 2013
2. The scenario RDF clustering Proposal Preliminary Results Conclusions
Outline
1 The scenario
2 RDF clustering
Motivations
State of the Art
3 Proposal
4 Preliminary Results
5 Conclusions
Silvia Giannini RDF data clustering
4. The Linking Open Data (LOD) project
A global Uniform Resource Identifier for each entity on the web (URIs)
A standardized access mechanism (HTTP URIs)
A machine-readable, open and standardized data format (RDF)
A mechanism for linking different data sources (RDF-links)
Relationship Links
Identity Links
Vocabulary Links
5. The Linking Open Data (LOD) project
As of September 2011
[Figure: Linking Open Data cloud diagram; datasets are grouped by domain: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
6. RDF: the big picture
[Figure: DBpedia[1] extract, a graph in which dbpedia:Dresden is linked by dbpedia-owl:country to dbpedia:Germany and by dbpedia-owl:areaTotal to the value 328.8]
Graph-structured knowledge representation (data-model)
Resource: concrete or abstract entity of the real world, identified by a dereferenceable URI
Description: representation of properties or relationships among resources
Framework: combination of web based protocols and formal semantics
Facts in triple form: subject - predicate - object
<http://dbpedia.org/resource/Dresden> <http://dbpedia.org/property/country> <http://dbpedia.org/resource/Germany> .
[1] http://dbpedia.org
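A triple can be held as a plain 3-tuple. The sketch below parses one N-Triples statement into such a tuple, assuming IRI-only terms (real N-Triples also allows literals and blank nodes, which a full RDF parser such as rdflib handles). `parse_iri_triple` and `IRI_TRIPLE` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Matches "<s> <p> <o> ." with IRI terms only (no literals, no blank nodes).
IRI_TRIPLE = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def parse_iri_triple(line: str) -> Triple:
    """Split one IRI-only N-Triples statement into subject, predicate, object."""
    match = IRI_TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only N-Triples statement: {line!r}")
    return Triple(*match.groups())


fact = parse_iri_triple(
    "<http://dbpedia.org/resource/Dresden> "
    "<http://dbpedia.org/property/country> "
    "<http://dbpedia.org/resource/Germany> ."
)
print(fact.subject, fact.predicate, fact.object, sep="\n")
```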
7. RDF: the big picture
[Figure: DBpedia extract in two layers. RDF data model: dbpedia:Dresden linked by dbpedia-owl:country to dbpedia:Germany and by dbpedia-owl:areaTotal to 328.8. RDF Schema: rdf:type, rdfs:domain and rdfs:range links to dbpedia-owl:PopulatedPlace, dbpedia-owl:Country and owl:ObjectProperty]
RDF Schema: Explicit semantics of content and links
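One concrete effect of explicit schema semantics is that new facts can be derived. The sketch below applies the RDFS entailment rules rdfs2 (domain) and rdfs3 (range) in a single pass over string triples; no other RDFS rules are implemented. The schema triples in the example (domain dbpedia-owl:PopulatedPlace, range dbpedia-owl:Country) are assumed for illustration, following the figure, and `infer_types` is a hypothetical name.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """One pass of RDFS rules rdfs2 and rdfs3; returns only the new triples.

    rdfs2: (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)
    rdfs3: (p rdfs:range C)  and (s p o)  =>  (o rdf:type C)
    """
    domains, ranges = {}, {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred - set(triples)


data = [
    ("dbpedia:Dresden", "dbpedia-owl:country", "dbpedia:Germany"),
    # Assumed schema triples, for illustration only:
    ("dbpedia-owl:country", RDFS_DOMAIN, "dbpedia-owl:PopulatedPlace"),
    ("dbpedia-owl:country", RDFS_RANGE, "dbpedia-owl:Country"),
]
print(sorted(infer_types(data)))
```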
9. Motivations
RDF Data Management Challenges
LOD cloud statistics (October 2011): 31 billion facts, 500 million links
How to efficiently:
Develop services on top of the RDF data model for
browsing data;
query answering;
supporting expressive search (approximate matching);
Speed up data access and query response times over distributed machines
CLUSTERING
10. Motivations
Contributions
Clustering semantic web resources (RDF graphs)
Discovering homogeneous groups of resources
Summarizing the original graph content in a meaningful way
Revealing possible hierarchies of clusters
Identifying a concept description or discriminating features for each cluster
11. State of the Art
What is a cluster: data-based approach
A set of resources with large intra-cluster similarity
and large inter-cluster dissimilarity
Data clustering methods
pairwise distance metric
agglomerative
partitional (K-Means)
- Number or size of clusters must be set in advance
RDF data model not suited for applying traditional data-clustering techniques over real-life RDF datasets!
15. State of the Art
What is a cluster: graph-based approach
A set of resources with large intra-cluster similarity
and large inter-cluster dissimilarity
Graph clustering methods
vertex connectivity
neighborhood similarity
spectral analysis of the adjacency matrix
- Number or size of clusters must be set in advance
http://sydney.edu.au/engineering/it/~shhong/img/cluster1.png
16. State of the Art
RDF clustering: literature
Instance extraction
Subgraph relevant for representing a resource (SPARQL[2] DESCRIBE query)
1 Immediate Properties
+ simple, quick
- loss of information
2 Concise Bounded Description (CBD)
+ better body of knowledge
- domain dependent (use of blank nodes)
3 Depth Limited Crawling
+ stable over input data, with a well-delimited subgraph
- need to find a trade-off between size and information content (data dependent)
G.A. Grimnes, P. Edwards, and A. Preece. Instance based clustering of semantic web resources. The Semantic Web: Research and Applications. Springer Berlin Heidelberg, 2008. 303-317.
[2] http://www.w3.org/TR/rdf-sparql-query/
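The three extraction strategies can be sketched over an in-memory list of (subject, predicate, object) string triples. This is a minimal illustration, not the implementation of Grimnes et al.: blank nodes are assumed to be strings starting with `_:`, reification (part of the W3C CBD definition) is omitted, and all function names are hypothetical.

```python
from collections import defaultdict, deque


def index_by_subject(triples):
    out = defaultdict(list)
    for t in triples:
        out[t[0]].append(t)
    return out


def is_blank(term):
    return term.startswith("_:")  # assumed blank-node convention


def immediate_properties(resource, triples):
    """1. Only the triples whose subject is the resource."""
    return [t for t in triples if t[0] == resource]


def concise_bounded_description(resource, triples):
    """2. Subject triples plus, recursively, those of blank-node objects."""
    by_subject = index_by_subject(triples)
    result, seen, stack = [], {resource}, [resource]
    while stack:
        node = stack.pop()
        for s, p, o in by_subject.get(node, []):
            result.append((s, p, o))
            if is_blank(o) and o not in seen:
                seen.add(o)
                stack.append(o)
    return result


def depth_limited_crawl(resource, triples, depth):
    """3. Every outgoing triple reachable within `depth` hops (breadth-first)."""
    by_subject = index_by_subject(triples)
    result, seen = [], {resource}
    frontier = deque([(resource, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for s, p, o in by_subject.get(node, []):
            result.append((s, p, o))
            if o not in seen:
                seen.add(o)
                frontier.append((o, d + 1))
    return result


triples = [
    ("ex:Article1", "dc:creator", "_:x1"),
    ("_:x1", "foaf:name", "Adamanta Schlitt"),
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("ex:Journal1", "dc:title", "Journal 1"),
]
```

On this toy graph the immediate properties miss the creator's name, the CBD recovers it through the blank node but stops at the journal, and a crawl of depth 2 returns all four triples, which mirrors the trade-offs listed on the slide.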
19. State of the Art
RDF clustering: literature
Instances distance computation
Comparing two RDF graphs with the resources as root nodes
1 feature-vector based
mappings: (feature → shortest path; value → set of reachable nodes)
similarity measure: e.g., Dice coefficient
2 graph based
conceptual similarity: overlapping of nodes
relational similarity: overlapping of edges
3 ontology based[3] (well-defined ontology and conforming instance data)
taxonomy similarity: semantic distance between metadata in a concept hierarchy
relation similarity: similarity of the instances related to the two considered resources
attribute similarity: similarity of attribute values (numeric, literal, etc.)
Determine the appropriate number of clusters
[3] A. Maedche and V. Zacharias. Clustering ontology-based metadata in the semantic web. Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 2002. 348-360.
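A minimal sketch of the feature-vector approach, under one possible reading of the mapping: each feature is the predicate path by which a node is first (shortest-path) reached from the root within `max_depth` hops, its value is the set of nodes reached that way, and per-feature Dice scores are averaged over the union of features. The averaging step and all names are assumptions for illustration, not the method of the cited works.

```python
from collections import defaultdict, deque


def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 0 for two empty sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def feature_vector(root, triples, max_depth=2):
    """Map each predicate path from `root` to the set of nodes it reaches.

    Nodes are visited breadth-first, so each node is recorded only under
    the (shortest) path by which it is first reached.
    """
    out = defaultdict(list)
    for s, p, o in triples:
        out[s].append((p, o))
    features = defaultdict(set)
    seen = {root}
    queue = deque([(root, ())])
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_depth:
            continue
        for p, o in out.get(node, []):
            if o not in seen:
                seen.add(o)
                new_path = path + (p,)
                features[new_path].add(o)
                queue.append((o, new_path))
    return features


def instance_similarity(r1, r2, triples, max_depth=2):
    """Average Dice score over the union of both resources' features (assumed)."""
    f1 = feature_vector(r1, triples, max_depth)
    f2 = feature_vector(r2, triples, max_depth)
    keys = set(f1) | set(f2)
    if not keys:
        return 0.0
    return sum(dice(f1.get(k, set()), f2.get(k, set())) for k in keys) / len(keys)


triples = [
    ("a1", "dc:creator", "Paul"), ("a1", "swrc:journal", "J1"),
    ("a2", "dc:creator", "Paul"), ("a2", "swrc:journal", "J2"),
]
print(instance_similarity("a1", "a2", triples))
```

Here the two articles agree on the dc:creator feature and disagree on swrc:journal, so the averaged score is 0.5.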
21. Requirements
Ideal clustering of graph-structured data:
cohesive intra-cluster structure
homogeneous intra-cluster properties
Parameter-free algorithm:
number and size of partitions derived from the data
22. How do community detection algorithms behave over RDF(S) graphs?
Community Discovery Algorithms
Graph mining techniques for extracting knowledge from large graphs
Exploit native graph features (topology) of the RDF model
Why:
If two sets of entities are strongly related, they exhibit more connections with each other than with other sets of entities
Benefits:
+ Automatically discover the number and size of modules
+ Can handle uncertainty in clustering (overlapping communities)
+ Faster than data-clustering inspired techniques (no instance extraction step)
23. What is a community
A subgraph of a network whose nodes are more tightly connected with each
other than with nodes outside the subgraph.
Similarity: degree of cohesion of subsets of vertices
- No overlapping capabilities
$\mathcal{C} = \{C_1, \dots, C_n\},\ C_i \cap C_j = \emptyset \ \forall i, j \in \{1, \dots, n\},\ i \neq j$
In labeled graphs (like RDF graphs), each link models only one specific relation
Overlapping Communities Analysis
24. From Node to Link Perspective
Community: a set of nodes with more internal than external connections, i.e., a set of closely interrelated links.
Benefits:
+ Captures multiple memberships between nodes
+ Unifies hierarchical and overlapping clustering
It is always possible to move from a link partition $\mathcal{P} = \{P_1, \dots, P_m\}$, with $P_i \cap P_j = \emptyset \ \forall i, j \in \{1, \dots, m\},\ i \neq j$, to m node clusters, which may overlap.
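The move from a link partition to node clusters only requires taking the endpoints of each link community. A minimal sketch with hypothetical function names, assuming links are given as (u, v) node pairs:

```python
from collections import defaultdict


def link_to_node_clusters(link_partition):
    """Map each link community P_i to the set of its endpoint nodes."""
    return [{node for link in links for node in link} for links in link_partition]


def memberships(node_clusters):
    """For every node, the indices of all node clusters that contain it."""
    member = defaultdict(set)
    for idx, cluster in enumerate(node_clusters):
        for node in cluster:
            member[node].add(idx)
    return dict(member)


partition = [[(0, 1), (1, 2), (0, 2)], [(3, 4), (4, 5), (3, 5)], [(2, 3)]]
clusters = link_to_node_clusters(partition)
print(clusters)
print(memberships(clusters))
```

The link communities are disjoint, yet nodes 2 and 3 each end up in two node clusters: a node whose links fall into several communities receives several memberships, which is how the link perspective captures overlap.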
25. Datasets
SP2Bench[4]: A SPARQL Performance Benchmark
data generator for creating arbitrarily large DBLP-like RDF documents
mirrors key characteristics and social-world distributions of the original DBLP dataset
publicly available
[4] M. Schmidt, et al. SP2Bench: SPARQL performance benchmark. Semantic Web Information Management. Springer Berlin Heidelberg, 2010. 371-393.
26. Node communities
SP2Bench: 720 triples
[Figure: node communities detected in the SP2Bench graph, with labelled nodes Paul_Erdoes, Article and Person]
V.D. Blondel, et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008.
Tool: Gephi (https://gephi.org)
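The method of Blondel et al. (the Louvain method) greedily moves nodes between communities, then aggregates communities into super-nodes, in order to maximize modularity. The sketch below only computes the modularity score Q of a given partition, assuming an undirected, unweighted graph without self-loops and disjoint communities that cover every node; `modularity` here is a hypothetical stand-alone function.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    L_c: number of edges with both endpoints in community c
    d_c: sum of the degrees of the nodes in community c
    m:   total number of edges
    """
    m = len(edges)
    community_of = {node: idx for idx, nodes in enumerate(communities) for node in nodes}
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))
```

Splitting the two triangles gives Q = 5/14 (about 0.357), while a single community covering all nodes always gives Q = 0. For real graphs, NetworkX (2.8 or later) provides `louvain_communities` and `modularity` in `networkx.algorithms.community`.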
27. Link Communities
Given an undirected graph $G = (V, E)$, the set of neighbors of node i is $N_i = \{ j \in V \mid e_{ij} \in E \}$.
Similarity[5] between two links $e_{ik}$ and $e_{jk}$ sharing node k: $S(e_{ik}, e_{jk}) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$
Link Dendrogram: hierarchical agglomerative algorithm
Optimization of partition density: the dendrogram is cut at the level that maximizes link density inside communities
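$D_P = \frac{2}{M} \sum_c m_c \frac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$, where M is the total number of links, and $m_c$ and $n_c$ are the numbers of links and nodes in community c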
[5] Y.Y. Ahn, J.P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature 466.7307 (2010): 761-764.
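The three steps above (link similarity, agglomerative link dendrogram, cut at maximum partition density) can be sketched in a naive form that is quadratic in the number of links and meant for small graphs only. By default, `inclusive=True` adds node i to its own neighborhood, as Ahn et al. do; `inclusive=False` uses $N_i$ exactly as defined on the slide. Single-linkage merging is done with a union-find structure, all links with the same similarity are merged as one dendrogram level, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def neighbour_sets(edges, inclusive=True):
    """N_i for every node; with inclusive=True, i itself is added."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    if inclusive:
        for node in nbrs:
            nbrs[node].add(node)
    return nbrs


def link_similarity(e1, e2, nbrs):
    """S(e_ik, e_jk); None unless the two links share exactly one node k."""
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return None
    (k,) = shared
    i = e1[0] if e1[1] == k else e1[1]
    j = e2[0] if e2[1] == k else e2[1]
    return len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])


def partition_density(communities, total_links):
    """D_P = 2/M * sum_c m_c (m_c - (n_c - 1)) / ((n_c - 2)(n_c - 1))."""
    total = 0.0
    for links in communities:
        m_c = len(links)
        n_c = len({node for link in links for node in link})
        if n_c > 2:  # communities spanning only two nodes contribute 0
            total += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * total / total_links


def link_communities(edges, inclusive=True):
    edges = [tuple(e) for e in edges]
    nbrs = neighbour_sets(edges, inclusive)
    pairs = []
    for a, b in combinations(range(len(edges)), 2):
        s = link_similarity(edges[a], edges[b], nbrs)
        if s is not None:
            pairs.append((s, a, b))
    pairs.sort(reverse=True)  # most similar link pairs are merged first
    parent = list(range(len(edges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def groups():
        g = defaultdict(list)
        for idx, edge in enumerate(edges):
            g[find(idx)].append(edge)
        return list(g.values())

    best = groups()  # start of the dendrogram: every link on its own
    best_d = partition_density(best, len(edges))
    idx = 0
    while idx < len(pairs):
        level = pairs[idx][0]
        while idx < len(pairs) and pairs[idx][0] == level:
            _, a, b = pairs[idx]
            parent[find(a)] = find(b)
            idx += 1
        current = groups()
        d = partition_density(current, len(edges))
        if d > best_d:  # keep the cut with the highest partition density
            best, best_d = current, d
    return best, best_d


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities, density = link_communities(edges)
print(communities, density)
```

On two triangles joined by a bridge, the best cut keeps each triangle as one link community and the bridge as a third, with $D_P = 6/7$.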
29. RDF clustering[6]
[Figure: clusters found on an SP2Bench extract. Nodes include Article1, Article2, Article3, Article13, Article20, Journal1, Paul_Erdoes, bench:Article, foaf:Person and blank nodes _:x1, _:x2, _:x3; links include dc:creator, dc:title, foaf:name, swrc:pages, swrc:journal and rdf:type.
TYPE 1. CLUSTER (a), SIGNATURE: subject
TYPE 2. CLUSTER (b), SIGNATURE: (predicate, object)
TYPE 3. CLUSTER (c), SIGNATURE: {(predicate_1, object_1), ..., (predicate_n, object_n)}
Different background colours reveal the hierarchy of clusters; replicated nodes reveal overlapping clusters; some links belong to other clusters.]
[6] S. Giannini. RDF Data Clustering. BIS 2013 Workshops, LNBIP 160. Springer Berlin Heidelberg, 2013. 220-231.
30. RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
Cluster of type 2.
Aggregation of resources (fixed predicate - fixed object)
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
31. RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
ex:Article15 swrc:pages 139
ex:Article15 dc:title equalled bewitchment cheaters
ex:Article15 dc:creator ex:node17r3ptqpmx16
ex:Article15 rdfs:seeAlso http://www.skeins.tld/sandwiching/bewitchment.html
ex:Article15 foaf:homepage http://www.sandwiching.tld/cheaters/fried.html
Cluster of type 2.
Aggregation of resources (predicate - object)
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
32. RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
Cluster of type 2.
Aggregation of resources (fixed predicate - fixed object)
ex:Article9 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article7 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article10 swrc:journal http://localhost/publications/journals/Journal1/1945
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
33. RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
Cluster of type 2.
Aggregation of resources (fixed predicate - fixed object)
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
ex:Article8 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article8 rdf:type http://localhost/vocabulary/bench/Article
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article5 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article5 rdf:type http://localhost/vocabulary/bench/Article
ex:Article5 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article4 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article4 rdf:type http://localhost/vocabulary/bench/Article
ex:Article4 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article3 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article3 rdf:type http://localhost/vocabulary/bench/Article
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article2 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article2 rdf:type http://localhost/vocabulary/bench/Article
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article1 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article1 rdf:type http://localhost/vocabulary/bench/Article
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1942
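The three signatures can be illustrated by grouping triples directly. This is a hypothetical, simplified sketch: it shows what each cluster type contains, not the community-detection procedure that actually produces the clusters, and the mixed-type grouping (subjects with an identical set of (predicate, object) pairs) is a simplification assumed for illustration.

```python
from collections import defaultdict


def type1_clusters(triples):
    """Type 1 (signature: subject): all triples sharing a subject."""
    clusters = defaultdict(list)
    for s, p, o in triples:
        clusters[s].append((s, p, o))
    return dict(clusters)


def type2_clusters(triples):
    """Type 2 (signature: (predicate, object)): triples sharing a predicate-object pair."""
    clusters = defaultdict(list)
    for s, p, o in triples:
        clusters[(p, o)].append((s, p, o))
    return dict(clusters)


def mixed_clusters(triples, min_size=2):
    """Mixed type (signature: {(p_1, o_1), ..., (p_n, o_n)}): subjects whose
    whole set of (predicate, object) pairs is identical."""
    pairs_by_subject = defaultdict(set)
    for s, p, o in triples:
        pairs_by_subject[s].add((p, o))
    groups = defaultdict(set)
    for s, pairs in pairs_by_subject.items():
        groups[frozenset(pairs)].add(s)
    return {sig: subs for sig, subs in groups.items() if len(subs) >= min_size}


triples = [
    ("ex:Article1", "dc:creator", "Paul_Erdoes"),
    ("ex:Article1", "rdf:type", "bench:Article"),
    ("ex:Article2", "dc:creator", "Paul_Erdoes"),
    ("ex:Article2", "rdf:type", "bench:Article"),
    ("ex:Article15", "swrc:pages", "139"),
]
print(mixed_clusters(triples))
```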
34. Advantages and emerging issues
Tests over datasets of 266, 720, and 5362 triples
Number of clusters obtained: 53, 277, and 3437, respectively
+ Good behaviour in the presence of blank nodes
http://localhost/vocabulary/bench/PhDThesis rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Www rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Book rdfs:subClassOf foaf:Document
_:node17rocfnblx296 rdf:_3 misc:UnknownDocument_c
_:node17rocfnblx296 rdf:_2 misc:UnknownDocument_b
_:node17rocfnblx296 rdf:_1 misc:UnknownDocument_a
misc:UnknownDocument_c rdf:type foaf:Document
misc:UnknownDocument_b rdf:type foaf:Document
misc:UnknownDocument_a rdf:type foaf:Document
http://localhost/vocabulary/bench/MastersThesis rdfs:subClassOf foaf:Document
- A post-processing phase is needed (link replication)
If Paul_Erdoes is a Person included in a type 2 cluster with signature (rdf:type - prefix:Person), this property will not appear in the type 1 cluster describing the resource Paul_Erdoes
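The post-processing step can be pictured as copying back, into each type 1 cluster, the triples of its subject that the link partition assigned elsewhere. The sketch below is a hypothetical illustration of that idea, not the procedure used in this work: the cluster layout (a list of triple lists plus a subject-to-cluster index) and the `ex:` names are assumptions.

```python
def replicate_links(partition, instance_clusters):
    """Hypothetical link-replication step.

    partition: list of clusters, each a list of (s, p, o) triples; every
        triple belongs to exactly one cluster (a link partition).
    instance_clusters: maps a subject to the index of its type 1 cluster.
    Returns, for each such subject, its type 1 cluster completed with copies
    of the subject's triples found in other clusters. The partition itself
    is not modified: triples are replicated, not moved.
    """
    completed = {}
    for subject, idx in instance_clusters.items():
        replicas = [
            triple
            for j, cluster in enumerate(partition)
            if j != idx
            for triple in cluster
            if triple[0] == subject
        ]
        completed[subject] = list(partition[idx]) + replicas
    return completed


partition = [
    [("ex:Paul_Erdoes", "foaf:name", "Paul Erdoes")],
    [("ex:Paul_Erdoes", "rdf:type", "foaf:Person"),
     ("ex:Alice", "rdf:type", "foaf:Person")],
]
done = replicate_links(partition, {"ex:Paul_Erdoes": 0})
print(done["ex:Paul_Erdoes"])
```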
36. Conclusions and Future Work
Community detection algorithms are a promising candidate for:
clustering of Semantic Web resources
instance extraction from RDF graphs
Ongoing and future work:
A more comprehensive experimental evaluation on different datasets
Analysis of the cut threshold
Better definition of the post-processing phase
Comparison with existing approaches
Combination of (1) graph clustering techniques, and (2) reasoning services
1 Identify communities of closely related resources
2 Extract a semantic description of them
Experimentation with property-driven clustering
Dynamics and evolution of clusters