This thesis helps analyze the characteristics of the heterogeneous social networks that emerge from the use of web-based social applications. Its original contribution combines Social Network Analysis with Semantic Web frameworks. Social Network Analysis (SNA) provides graph algorithms to characterize the structure of a social network and its strategic positions. Semantic Web frameworks represent and exchange knowledge across web applications with a richly typed graph model (RDF), a query language (SPARQL) and schema definition frameworks (RDFS and OWL). In this thesis, we merge both models in order to go beyond mining the flat link structure of social graphs, by integrating semantic processing of the network typing and of the knowledge that emerges from online activities. In particular, we investigate how (1) to bring online social data to ontology-based representations, (2) to conduct a social network analysis that takes advantage of the rich semantics of such representations, and (3) to semantically detect and label communities of online social networks and social tagging activities.
1. Guillaume Erétéo
SEMANTIC SOCIAL
NETWORK ANALYSIS
Ph.D. Thesis defense
supervisors:
Michel Buffa, Kewi/I3S, UNSA/CNRS
Fabien Gandon, Edelweiss, INRIA Sophia Antipolis
Patrick Grohan, Orange Labs
2. OUTLINE
1. Context and Scientific Objectives
2. State of the Art on Social Network Analysis & Semantic Social Networks
3. SemSNA: Analysing Social Networks with Semantic Web Frameworks
4. Community Detection: SemTagP, Semantic Tag Propagation
3. CONTEXT
ISICIL: Information Semantic Integration through Communities of Intelligence onLine
enterprise 2.0
semantic web
business intelligence
multidisciplinary: ergonomists, sociologists, mathematicians, ontologists, computer scientists
ANR-08-CORD-011
4. SEMANTIC INTRANET OF PEOPLE
"the use of emergent social software platforms within companies, or between companies and their partners or customers" [McAfee 2006]
represent, exchange and analyse data across applications to deliver information in a way that matters to people and to their communities [Berners-Lee et al., 2001]
5. SCIENTIFIC OBJECTIVES
extend social network analysis with semantic formalisms to reveal and exploit the rich social structures embedded in the emerging social data of web 2.0 applications:
how to represent, link and access online social networks across applications?
how to enable classical operators of social network analysis to consider the semantics of these networks?
how could these semantics be exploited to create new algorithms?
7. SOCIAL NETWORK ANALYSIS
graph algorithms to characterize the structure of a social network,
strategic positions/actors, and the distribution of networking
activities.
applications:
monitor information flow
foster communication
focus notifications in information systems
create project teams
identify experts
8. SOCIAL NETWORKS AND GRAPHS
actors are represented by nodes and relations by edges
G=(V, E), n=|V|, m=|E|
[Figure: example graph of numbered actors linked by typed relations: collaborate, sameInterest, follows, colleague, manages]
9. NETWORK STRUCTURE
e.g. density and diameter highlight cohesion of the network
[Scott 2000]
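$\mathrm{diam}(G) = \mathrm{length}(g(e_1, e_2))$ such that $\forall e_3, e_4 \in E_G,\ \mathrm{length}(g(e_3, e_4)) \le \mathrm{length}(g(e_1, e_2))$
where $g(x, y)$ is a geodesic (shortest path) between $x$ and $y$
[Zachary 1977]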
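As an illustration of these two cohesion measures, here is a minimal Python sketch for a simple undirected graph. The toy network and the function names (`density`, `geodesic_lengths`, `diameter`) are assumptions made for this example; they are not taken from the thesis or from the network in the figure.

```python
from collections import deque

def density(nodes, edges):
    """Fraction of the possible undirected edges that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)

def geodesic_lengths(nodes, edges, source):
    """Breadth-first search: geodesic length from source to each reachable node."""
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(nodes, edges):
    """Length of the longest geodesic among connected pairs of nodes."""
    return max(max(geodesic_lengths(nodes, edges, v).values()) for v in nodes)

# Hypothetical toy network: a path a-b-c-d plus a chord a-c.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}
density_value = density(nodes, edges)    # 4 edges out of 6 possible pairs
diameter_value = diameter(nodes, edges)  # longest shortest path, here between b and d
print(density_value, diameter_value)
```

A dense network with a small diameter is cohesive: most actors are directly linked and any actor can reach any other in a few steps.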
15. ONLINE SOCIAL DATA ARE MORE COMPLEX TO REPRESENT
multiple and scattered roles, contexts, profiles, etc., distributed across applications
16. LINK STRUCTURE IS NOT ENOUGH
who has the best betweenness centrality?
[Figure: network whose edges are labelled has met, knows in passing, works with, has supervisor]
17. SEMANTICS MATTER!
how can we consider different types of relations?
[Figure: network whose edges are labelled has met, knows in passing, works with, has supervisor]
18. RESOURCE DESCRIPTION FRAMEWORK
make assertions and describe resources with triples (subject, predicate, object),
like "the subject, verb and object of an elementary sentence" [Berners-Lee 2001]
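To make the triple model concrete, here is a small Python sketch that stores triples as tuples and matches them against a pattern. The `ex:` resources and the `match` helper are hypothetical; only the `rdf:` and `foaf:` terms are real vocabulary names.

```python
# Hypothetical example data: each assertion is one (subject, predicate, object) triple.
triples = {
    ("ex:guillaume", "rdf:type", "foaf:Person"),
    ("ex:guillaume", "foaf:knows", "ex:fabien"),
    ("ex:fabien", "foaf:firstName", "Fabien"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Whom does ex:guillaume know?
known = match(triples, s="ex:guillaume", p="foaf:knows")
print(known)
```

Because every statement has the same three-part shape, a set of triples forms a directed graph whose edges are labelled by predicates, which is what makes typed social relations easy to represent.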
19. ONTOLOGY
"a set of representational primitives with which to model a
domain of knowledge or discourse.
The representational primitives are typically classes (or sets),
attributes (or properties), and relationships (or relations
among class members).
The definitions of the representational primitives include
information about their meaning and constraints on their
logically consistent application"
[Gruber 1993] [Gruber 2009]
20. RESOURCE DESCRIPTION FRAMEWORK SCHEMA
a set of primitives to define the classes of a knowledge domain, their taxonomic relations, and the classes of resources to which properties apply
21. SPARQL PROTOCOL AND RDF QUERY LANGUAGE
query language, protocol and format to send queries and exchange results across the web
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# select every agent together with its first name
SELECT ?person ?name WHERE {
  ?person rdf:type foaf:Agent .
  ?person foaf:firstName ?name .
}
23. CLASSIC SNA ON SEMANTIC WEB
rich graph representations reduced to simple un-typed graphs for analysis
[Paolillo & Wright 2006]
[San Martin & Gutierrez 2009]
foaf:knows
foaf:interest
24. [Figure: Guillaume's network, showing Fabien, Gérard, Mylène, Michel and Yvonne; one relation is labelled coworker]
d(guillaume) = 5
25. [Figure: Guillaume's network, showing Fabien, Gérard, Mylène, Michel and Yvonne; one relation is labelled coworker]
d<family>(guillaume) = ?
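26. [Figure: Guillaume's network, showing Fabien, Gérard, Mylène, Michel and Yvonne, next to a hierarchy of relation types: knows, coworker, colleague, with sibling and parent placed above sister, brother, father and mother]
d<family>(guillaume) = 3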
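The idea of a degree restricted to one relation type and its subtypes can be sketched in Python as follows. The property hierarchy and the typed edges are hypothetical (the exact graph of the figure cannot be recovered from the transcript), and `SUB_PROPERTY_OF`, `is_subtype` and `degree` are illustrative names.

```python
# Hypothetical property hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {
    "father": "parent", "mother": "parent",
    "sister": "sibling", "brother": "sibling",
    "parent": "family", "sibling": "family",
    "colleague": "knows", "family": "knows",
}

def is_subtype(prop, rel):
    """True if prop equals rel or specialises it through the hierarchy."""
    while prop is not None:
        if prop == rel:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False

def degree(actor, edges, rel):
    """Number of edges touching actor whose type is rel or a subtype of rel."""
    return sum(1 for s, p, o in edges
               if actor in (s, o) and is_subtype(p, rel))

# Hypothetical typed edges: (subject, property, object).
edges = [
    ("guillaume", "colleague", "fabien"),
    ("guillaume", "colleague", "michel"),
    ("gerard", "father", "guillaume"),
    ("yvonne", "mother", "guillaume"),
    ("mylene", "sister", "guillaume"),
]

print(degree("guillaume", edges, "knows"))   # every relation counts
print(degree("guillaume", edges, "family"))  # only family relations and their subtypes
print(degree("guillaume", edges, "parent"))  # only father and mother
```

The same edge list yields a different degree for each relation type, which an untyped graph cannot express.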
29. SEMANTIC SNA FRAMEWORK
exploit the semantics of social networks to parameterize SNA operators
parameterized SNA operators
SPARQL formalization of operators
SemSNA ontology: annotate social data with the results of analyses
30. PARAMETERIZED DENSITY
proportion of the maximum possible number of properties of type <rel> (or subtype) that are actually present, computed from:
number of actors of a given type (or subtype)
number of pairs of resources linked by a property of type <rel> (or subtype)
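A hedged Python sketch of a density parameterized by relation type and actor type. The denominator used here, the number of ordered pairs of distinct actors of the given type, is an assumption of this example, as are the data and the names `subsumes` and `parameterized_density`; the exact formula from the slide is not preserved in the transcript.

```python
def subsumes(general, specific, parent_of):
    """True if specific equals general or reaches it by climbing parent_of."""
    while specific is not None:
        if specific == general:
            return True
        specific = parent_of.get(specific)
    return False

def parameterized_density(edges, type_of, rel, actor_type,
                          sub_property_of, sub_class_of):
    """Linked pairs of actors divided by the possible ordered pairs (assumed denominator)."""
    actors = {a for a, t in type_of.items()
              if subsumes(actor_type, t, sub_class_of)}
    linked = {(s, o) for s, p, o in edges
              if s in actors and o in actors and s != o
              and subsumes(rel, p, sub_property_of)}
    possible = len(actors) * (len(actors) - 1)
    return len(linked) / possible if possible else 0.0

# Hypothetical schema and data.
sub_property_of = {"colleague": "worksWith", "supervisor": "worksWith"}
sub_class_of = {"Student": "Person"}
type_of = {"guillaume": "Student", "fabien": "Person",
           "michel": "Person", "acme": "Organization"}
edges = [
    ("guillaume", "colleague", "fabien"),
    ("fabien", "colleague", "guillaume"),
    ("michel", "supervisor", "guillaume"),
    ("guillaume", "memberOf", "acme"),
]

density_works_with = parameterized_density(
    edges, type_of, "worksWith", "Person", sub_property_of, sub_class_of)
density_colleague = parameterized_density(
    edges, type_of, "colleague", "Person", sub_property_of, sub_class_of)
print(density_works_with, density_colleague)
```

Querying the more general relation `worksWith` picks up both `colleague` and `supervisor` links, so its density is at least that of either subtype.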
31. PARAMETERIZED N-DEGREE
number of paths of properties of type <rel> (or subtype) having y at one end and a length smaller than or equal to dist
parameterized path: a list of nodes of a graph G, each linked to the next by a relation of type <rel> (or subtype)
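The definition above can be sketched in Python by counting the simple paths of length at most dist that start at a given actor and only use relations of the requested type. Treating relations as traversable in both directions and counting only simple paths are assumptions of this sketch; the data and the names `is_subtype` and `n_degree` are hypothetical.

```python
def is_subtype(prop, rel, sub_property_of):
    """True if prop equals rel or specialises it through the hierarchy."""
    while prop is not None:
        if prop == rel:
            return True
        prop = sub_property_of.get(prop)
    return False

def n_degree(actor, edges, rel, dist, sub_property_of):
    """Count simple paths of length 1..dist starting at actor over typed edges."""
    neighbours = {}
    for s, p, o in edges:
        if is_subtype(p, rel, sub_property_of):
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)

    def count_paths(node, visited, remaining):
        if remaining == 0:
            return 0
        total = 0
        for nxt in neighbours.get(node, ()):
            if nxt not in visited:
                # one path ends at nxt, the others continue through it
                total += 1 + count_paths(nxt, visited | {nxt}, remaining - 1)
        return total

    return count_paths(actor, {actor}, dist)

# Hypothetical chain: a -colleague- b -colleague- c -friend- d
sub_property_of = {"colleague": "knows", "friend": "knows"}
edges = [("a", "colleague", "b"), ("b", "colleague", "c"), ("c", "friend", "d")]

print(n_degree("a", edges, "colleague", 1, sub_property_of))
print(n_degree("a", edges, "colleague", 2, sub_property_of))
print(n_degree("a", edges, "knows", 3, sub_property_of))
```

With dist = 1 this reduces to the ordinary typed degree; larger values of dist widen the neighbourhood that is taken into account.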
32. PARAMETERIZED DIAMETER
length of the longest geodesic in the network for a property of type <rel> (or subtype)
geodesic: a shortest path between two resources for a given relation of type <rel> (or subtype)
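A compact Python sketch of this operator: keep only the relations of the requested type (or subtype), then take the longest geodesic found by breadth-first search. Ignoring edge direction and the data below are assumptions of this example; `is_subtype` and `parameterized_diameter` are illustrative names.

```python
from collections import deque

def is_subtype(prop, rel, sub_property_of):
    """True if prop equals rel or specialises it through the hierarchy."""
    while prop is not None:
        if prop == rel:
            return True
        prop = sub_property_of.get(prop)
    return False

def parameterized_diameter(edges, rel, sub_property_of):
    """Longest geodesic over the subgraph made of relations of type rel (or subtype)."""
    neighbours = {}
    for s, p, o in edges:
        if is_subtype(p, rel, sub_property_of):
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    longest = 0
    for source in neighbours:
        depth = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
        longest = max(longest, max(depth.values()))
    return longest

# Hypothetical chain: a -colleague- b -colleague- c -friend- d
sub_property_of = {"colleague": "knows", "friend": "knows"}
edges = [("a", "colleague", "b"), ("b", "colleague", "c"), ("c", "friend", "d")]

for relation in ("colleague", "friend", "knows"):
    print(relation, parameterized_diameter(edges, relation, sub_property_of))
```

The more general the relation, the more edges are kept, so the diameter is computed over a larger connected structure.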
33. SPARQL FORMALIZATION OF PARAMETERIZED OPERATORS
SPARQL is designed to query RDF data
CORESE: semantic search engine implementing semantic web languages using graph-based representations
Automatic processing of semantic inference (e.g. subsumption)
Graph querying extension (e.g. paths)
[Corby et al 2004] [Corby 2008]
34. SPARQL FORMALIZATION
parameterized density
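# number of pairs of resources linked by a property of type param[rel] (or subtype)
SELECT cardinality(?p) as ?card WHERE {
  { ?p rdf:type rdf:Property
    filter(?p ^ param[rel]) }
  UNION
  { ?p rdfs:subPropertyOf ?parent
    filter(?parent ^ param[rel]) }
}
# number of actors of type param[type]
SELECT merge count(?x) as ?nbactor WHERE {
  ?x rdf:type param[type]
}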
35. SPARQL FORMALIZATION
parameterized n-degree
SELECT ?y count(?x) as ?degree WHERE {
  { ?x (param[rel])*::$path ?y
    filter(pathLength($path) <= param[dist]) }
  UNION
  { ?y (param[rel])*::$path ?x
    filter(pathLength($path) <= param[dist]) }
}
GROUP BY ?y
40. INTERPRETATIONS OF RESULTS
validated with managers of ipernity.com
friendOf, favorite, message, comment: small diameter, high density
family, as expected: large diameter, low density
favorite: highly centralized around the Ipernity animator
friendOf, family, message, comment: power law of degrees and betweenness centralities, different strategic actors
knows: analyze all relations using subsumption
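45. [Figure: example network of Gérard, Mylène, Yvonne, Guillaume, Michel, Fabien, Ivan, Philippe and Peter, linked by father, mother, supervisor and colleague relations and annotated with the SemSNA terms hasCentrality, Degree, Distance and isDefinedForProperty (values shown: 4 and 2)]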
47. SEMSNA: CONCLUSION
• directed typed graph structure of RDF/S: well suited to represent social knowledge & socially produced metadata across applications and networks
• parameterized SNA operators & SPARQL formalization: enable us to exploit the diversity and the semantic structure of social data
• SemSNA Ontology: organize and structure social data
50. COMMUNITY DETECTION
helps understand the distribution of actors and activities in a social network
state-of-the-art algorithms mine the link structure in order to detect densely connected groups of actors
51. HIERARCHICAL ALGORITHMS
output a dendrogram: a hierarchical tree of denser and denser communities from top to bottom.
• agglomerative algorithms start from the leaves and group nodes into larger and larger communities: [Donetti & Munoz 2004] [Zhou & Lipowsky 2004] [Xu et al 2007] [Newman 2004]
• divisive algorithms start from the root of the tree and split the network into denser and denser communities: [Girvan & Newman 2002] [Radicchi et al 2004]
52. HEURISTIC BASED ALGORITHMS
heuristics related to the community structure of networks
and to community characteristics:
• similarity with electrical networks [Wu 2004]
• random walk [Dongen 2000] [Pons et al 2005]
• label propagation [Raghavan et al 2007]
53. MODULARITY MEASURES COMMUNITY PARTITION QUALITY [Newman 2004]
fraction of the edges that fall within communities minus the expected value of that fraction if edges were distributed at random
$Q = \frac{1}{m} \sum_{i,j \in V,\ c_i = c_j} \left[ A_{ij} - \frac{d_{\langle i \rangle}\, d_{\langle j \rangle}}{m} \right]$
with:
• $m$ the number of edges of the network
• $d_{\langle i \rangle}$ the degree of vertex $i$
• $A_{ij}$ the number of edges between $i$ and $j$
• $c_i$ the community of $i$
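A minimal Python sketch of the modularity formula, read here for a directed graph: A_ij counts the edges from i to j, and the degree term is taken as the out-degree of i times the in-degree of j. That directed reading is an assumption of this example (with it, a single all-encompassing community scores exactly 0); the toy graph and the name `modularity` are hypothetical.

```python
from collections import Counter

def modularity(edges, community):
    """Q = (1/m) * sum over same-community pairs (i, j) of [A_ij - d_out(i) * d_in(j) / m]."""
    m = len(edges)
    a = Counter(edges)                     # A_ij: number of edges from i to j
    d_out = Counter(i for i, _ in edges)   # out-degree of each vertex
    d_in = Counter(j for _, j in edges)    # in-degree of each vertex
    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:
                q += a[(i, j)] - d_out[i] * d_in[j] / m
    return q / m

# Hypothetical graph: two directed triangles joined by the single edge c -> d.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

The partition that follows the two triangles scores higher than the trivial one-community partition, which is the behaviour the measure is meant to reward.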
54. LABEL PROPAGATION / RAK [Raghavan et al 2007]
(1) assign a unique random label to each node.
(2) each node n replaces its label with the label most used by its neighbours.
(3) if at least one node changed its label, go to step 2.
(4) otherwise, nodes that share the same label form a community.
opportunity: replace random labels with tags in order to exploit not only the link structure but also the semantics of the actors' vocabulary!
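Steps (1) to (4) can be sketched in Python as follows. To keep the example deterministic, this sketch deviates from the original algorithm in labelled ways: node names serve as the unique initial labels, nodes are updated in sorted order, a node keeps its label when it is already among the most used, and remaining ties go to the largest label (the original breaks ties at random). The graph and the name `label_propagation` are hypothetical.

```python
from collections import Counter

def label_propagation(neighbours, max_rounds=100):
    """Group nodes by repeatedly adopting the label most used among neighbours."""
    labels = {node: node for node in neighbours}       # step 1: one unique label per node
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):                # step 2: update each node in turn
            counts = Counter(labels[n] for n in neighbours[node])
            if not counts:
                continue
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:             # assumption: keep the current label on ties
                continue
            labels[node] = max(candidates)             # assumption: deterministic tie-break
            changed = True
        if not changed:                                # step 3: stop when nothing changed
            break
    communities = {}
    for node, label in labels.items():                 # step 4: same label, same community
        communities.setdefault(label, set()).add(node)
    return communities

# Hypothetical graph: two 4-cliques {a,b,c,d} and {e,f,g,h} joined by the edge d-e.
edge_list = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
             ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
             ("d", "e")]
neighbours = {}
for u, v in edge_list:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)

communities = label_propagation(neighbours)
print(communities)
```

The result depends on update order and tie-breaking, which is why the original algorithm can return different partitions on different runs.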
55. FOLKSONOMIES
each tag may represent a community of interest
[Figure: from social tagging to a flat folksonomy to a thesaurus; in the example, pollution is related to polluant (pollutant) and énergie (energy) and has narrower pollutions du sol (soil pollution)] [Limpens 2010]
56. TAG PROPAGATION
exploit folksonomy for label assignment
"interaction creates similarity, while similarity creates interaction" [Mika 2005]
[Figure: network of nodes a to g tagged with wiki, isicil, mediawiki, inria and sweetwiki]
58. SEMANTIC TAG PROPAGATION
wiki: 3, sweetwiki: 1, mediawiki: 1
wiki skos:narrower sweetwiki, mediawiki
[Figure: network of nodes a to g tagged with wiki, isicil, mediawiki, inria and sweetwiki]
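The count shown on this slide can be reproduced with a short Python sketch: each neighbour's tag counts once for itself and once for every broader tag that is actually in use. The neighbourhood below (one neighbour per tag) is an assumption chosen to match the slide's totals, only direct skos:narrower links are followed, and `semantic_tag_counts` is an illustrative name.

```python
from collections import Counter

# skos:narrower links as drawn on the slide: wiki has narrower sweetwiki and mediawiki.
NARROWER = {"wiki": {"sweetwiki", "mediawiki"}}

def semantic_tag_counts(neighbour_tags, narrower, used_tags):
    """Count neighbours' tags, crediting each broader tag that someone uses."""
    counts = Counter()
    for tag in neighbour_tags:
        counts[tag] += 1
        for broader, narrow in narrower.items():
            if tag in narrow and broader in used_tags:
                counts[broader] += 1
    return counts

# Hypothetical neighbourhood: three neighbours tagged wiki, sweetwiki and mediawiki.
neighbour_tags = ["wiki", "sweetwiki", "mediawiki"]
used_tags = {"wiki", "sweetwiki", "mediawiki", "isicil", "inria"}

counts = semantic_tag_counts(neighbour_tags, NARROWER, used_tags)
print(counts.most_common())
```

Without the skos:narrower links the three tags would tie at one vote each; with them, wiki gathers the votes of its narrower tags and becomes the label to propagate.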
59. SEMANTIC TAG PROPAGATION
2 communities labelled with wiki & isicil
wiki skos:narrower sweetwiki, mediawiki
[Figure: network of nodes a to g, now tagged only with wiki and isicil]
60. ALGORITHM SEMTAGP
Algorithm SemTagP(RDFGraph network, Type relationType)
DO
    old_network = copy(network)
    // propagate tags (i.e. compute new partitions)
    FOREACH user IN network.users
        user.tag = mostUsedNeighborTag(user, relationType)
    END FOREACH
WHILE modularity(network) > modularity(old_network)
RETURN old_network
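The loop above can be sketched in runnable Python under several assumptions: the network is a plain undirected edge list over a single relation type, each user carries one tag, ties keep the current tag or else take the alphabetically first candidate, and modularity is computed in the common undirected per-community form. All names and data are hypothetical; this is not the CORESE-based implementation.

```python
from collections import Counter

def most_used_neighbour_tag(user, tags, edges, narrower):
    """Tag most used by the user's neighbours, crediting broader tags too."""
    counts = Counter()
    for a, b in edges:
        if user in (a, b):
            tag = tags[b if a == user else a]
            counts[tag] += 1
            for broader, narrow in narrower.items():
                if tag in narrow:
                    counts[broader] += 1
    if not counts:
        return tags[user]
    best = max(counts.values())
    candidates = sorted(t for t, c in counts.items() if c == best)
    return tags[user] if tags[user] in candidates else candidates[0]

def modularity(edges, tags):
    """Undirected modularity of the partition induced by the tags."""
    m = len(edges)
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    q = 0.0
    for tag in sorted(set(tags.values())):
        members = {u for u, t in tags.items() if t == tag}
        inside = sum(1 for a, b in edges if a in members and b in members)
        q += inside / m - (sum(degree[u] for u in members) / (2 * m)) ** 2
    return q

def sem_tag_p(initial_tags, edges, narrower):
    """Propagate tags until modularity stops improving; return the last better state."""
    tags = dict(initial_tags)
    while True:
        old = dict(tags)
        for user in sorted(tags):
            tags[user] = most_used_neighbour_tag(user, tags, edges, narrower)
        if modularity(edges, tags) <= modularity(edges, old):
            return old

# Hypothetical network: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
initial_tags = {"a": "wiki", "b": "sweetwiki", "c": "mediawiki",
                "d": "isicil", "e": "isicil", "f": "inria"}
narrower = {"wiki": {"sweetwiki", "mediawiki"}}

result = sem_tag_p(initial_tags, edges, narrower)
print(result)
```

On this toy input the procedure ends with two communities, one labelled wiki and one labelled isicil.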
61. PARAMETERIZED SPARQL QUERY
delegate all the semantic processing to a semantic graph engine to exploit semantic relations between tags and to parameterize the analyzed relation
SELECT ?user ?tag ?neighbour WHERE {
  ?user param[rel] ?neighbour
  {
    # tags used directly by the neighbour
    { ?neighbour scot:hasTag ?tag }
    UNION
    # broader tags of the neighbour's tags, kept only if somebody uses them
    { ?neighbour scot:hasTag ?tag2 .
      ?tag skos:narrower ?tag2
      filter(exists { ?x scot:hasTag ?tag }) }
  }
}
ORDER BY ?user ?tag
62. PROBLEM
"bad" generalizations
• ubiquitous tags
• too broad tags
• semantic errors
[Figure: example with the tag environment]
63. SOLUTION
user control to disable semantic relations with given tags, which strengthens other, narrower tags
[Figure: example with the tag nanotechnology]
66. MODULARITY LIMITS
• “the ‘optimal partition’, imposed by mathematics, does not necessarily capture the actual community structure of the network”
confirmed by experiments
• modularity optimization might miss important substructures when:
• modules are very fuzzy
• modules have more than 2m edges (which is the case for half of ADEME’s detected communities)
• perspectives: measuring the average quality of each community
[Fortunato & Barthélemy 2007]
67. RESULT
1. pollution
2. sustainable development
3. energy
4. chemistry
5. air pollution
6. metals
7. biomass
8. wastes
• engineer
• supervisor
• community
node size = degree
69. SEMTAGP: CONCLUSION
• SemTagP: semantic community detection and controlled labelling
• applied to reveal the distribution of ADEME Ph.D. funding
• many perspectives to integrate more semantics:
• investigate other semantics, e.g. skos:related, skos:closeMatch
• propagate tags through different types of relations
• propagate multiple tags and detect overlapping communities
71. CONTRIBUTIONS
• bringing online social networks to ontology-based representations
• extending social network analysis to ontology-based representations
• semantic community detection and labelling
72. PERSPECTIVES
scaling to large networks
sampling, parallel, iterative algorithms
considering temporal data in the analysis
representing and analysing temporal data
enriching social activities with SemSNA results
better management of resources and relationships
73. International conference
Erétéo G., Gandon F., Corby O., Buffa M., “Analysis of a Real Online
Social Network Using Semantic Web Frameworks”. ISWC2009,
Washington D.C., USA.
Erétéo G., Gandon F., Corby O., Buffa M., “Semantic Social Network
Analysis”. Web Science 2009, Athens, Greece.
Book chapter
Erétéo, G., Buffa, M., Gandon, F., Leitzelman, M., Limpens, F., Sander,
P., “Semantic Social Network Analysis, a concrete case”. Handbook of
Research on Methods and Techniques for Studying Virtual
Communities: Paradigms and Phenomena. A book edited by Ben Kei
Daniel, IGI Global 2011.
National conference
Leitzelman, M., Erétéo, G., Grohan, P., Herledan, F., Buffa, M., Gandon,
F., “De l'utilité d'un outil de veille d'entreprise de seconde génération”.
poster in IC2009, Hammamet, Tunisia.
Workshop
Erétéo, G., Buffa, M., Gandon, F., Leitzelman, M., Limpens,
F., "Leveraging Social data with Semantics", W3C Workshop on the
Future of Social Networking, Barcelona, Spain.
Erétéo, G., Buffa, M., Gandon, F., Grohan, P., Leitzelman, M., Sander,
P., "A State of the Art on Social Network Analysis and its Applications on a
Semantic Web", SDoW2008, Karlsruhe, Germany.
QUESTIONS