We describe a domain ontology development approach that extracts domain terms from folksonomies and uses them to drive the search for classes and relationships in the Linked Open Data cloud. As a result, we obtain lightweight domain ontologies that combine the emergent knowledge of social tagging systems with formal knowledge from ontologies. To illustrate the feasibility of our approach, we have produced an ontology in the financial domain from tags available in Delicious, using DBpedia, OpenCyc and UMBEL as additional knowledge sources.
Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain
Andrés García-Silva†, Leyla Jael García-Castro±,
Alexander García*, Oscar Corcho†
†{hgarcia, ocorcho}@fi.upm.es
Ontology Engineering Group
Universidad Politécnica de Madrid, Spain
± leylajael@gmail.com
Universitat Jaume I, Castellón
de la Plana, Spain
*alexgarciac@gmail.com
State University, Florida, USA
June 2014
3. Introduction
Folksonomies as a source of knowledge
• Vocabulary emerges around resources and users
Golder and Huberman (2006), Marlow et al. (2006)
• Maintained by a large user community
• Flexible (not restricted)
• Up-to-date
• Emergent semantics from the aggregation of individual classifications
Gruber (2007), Mika (2007), Specia and Motta (2007)
4. Folksonomies
State of the art
Statistical-based (two tags are related if...)
• Tag similarity measures: Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
• Ontology generation: Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
Ontology-based (folksonomy related to an ontology)
• Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
Hybrid approaches
• Giannakidou et al. (2008), Specia and Motta (2007)
5. Folksonomies
State of the art
Approach | Type | Auto | Data Src. | Select. & Cleaning | Context Ident. | Disambiguation | Sem. Ident. | Output | Evaluation | Domain Knowledge
Mika, 2007 | Stat | Yes | Del,Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
Jäschke et al., 2008 | Stat | Yes | Del,Bib | Yes | Yes | No | No | Hier | Desc Study | No
Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Pres/Rec | No
Begelman et al., 2006 | Stat | Yes | Del,Raw | Yes | Yes | No | No | Clus | Desc Study | No
Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Pres/Rec | No
Heymann & Garcia-Molina, 2006 | Stat | Yes | Del,Cit | No | Yes | No | No | Hier | Task-based | No
Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Pres/Rec | No
Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
Specia & Motta, 2007 | Hyb | Semi | Del,Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Cantador et al., 2008 | Ont | Yes | Fli,Del | Yes | Yes | No | Yes | Inst | Pres/Rec | No
Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No
Type: Stat = statistical-based, Hyb = hybrid, Ont = ontology-based
Limitations
Statistical-based
• Most of the approaches do not distinguish between classes and instances
• Relation semantics is limited to a few types and is not precisely defined
• No domain knowledge
Ontology-based
• All the approaches produce either enrichments or instances (no classes)
• Relations are not identified
• No domain knowledge
Hybrid
• Semi-automatic ontology generation
• No domain knowledge
6. Proposal
Goal: Generate a domain baseline ontology, containing classes and
relationships, out of folksonomy information.
Process overview (diagram):
• Inputs: the folksonomy and domain-relevant resources (URLs) provided by domain experts
• Terminology Extraction: produces a list of domain terms
• Semantic Elicitation: the domain terms drive the extraction of domain classes and relationships from Linked Open Data (LOD)*
*"Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"
7. Benefits
We propose a process to extract domain knowledge from large and generic knowledge bases, driven by the domain terminology in the folksonomy.
• It may save time in the ontology development process.
• It allows ontology engineers to understand the domain with limited participation of domain experts.
• It yields smaller and more focused ontologies, which are potentially easier to understand and maintain.
• Complex queries and reasoning tasks may execute faster on smaller data sets.
• In observance of methodological practice, our technique harvests community knowledge and reuses existing ontologies.
• The ontology has links to external classes and relationships available in the Linked Open Data cloud.
8. Challenges
Problem: Tags lack semantics
• Ambiguity
• Synonyms
• Acronyms
• Morphological variations (plurals, singulars)
• Verb conjugations
• Misspellings
9. Terminology Extraction
Approach
Goal: To extract domain terminology from the folksonomy
Folksonomy: A ⊆ U × T × R, represented as a graph G = (V, E) where V = U ∪ T ∪ R and E = {(u, t, r) | (u, t, r) ∈ A}
Resource graph: G' = (V', E') where V' = R and E' = {(ri, rj) | ∃ (u, tm, ri) ∈ A ∧ (u, tn, rj) ∈ A ∧ tm = tn}
Spreading activation
• Seeds: domain-relevant resources provided by domain experts.
• Nodes are weighted with an activation value, which is used to start the search.
• The activation value spreads to adjacent nodes through an activation function.
• Activation function: proportional to the tags shared between the visited node and the source node, and to the activation value of the source node.
• If the activation function exceeds a threshold, the node is marked as activated and the spreading continues to its adjacent nodes.
• Tags of activated nodes are collected as domain terms.
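The terminology extraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides say the activation function depends on the shared tags and on the source activation value but do not give its formula, so the sketch assumes the source activation multiplied by the Jaccard overlap of the two tag sets. The function names, the sample tag assignments and the threshold used in the example are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_resource_graph(assignments):
    """Link two resources when the same user applied the same tag to both."""
    by_user_tag = defaultdict(set)
    for user, tag, resource in assignments:
        by_user_tag[(user, tag)].add(resource)
    graph = defaultdict(set)
    for resources in by_user_tag.values():
        for a, b in combinations(sorted(resources), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def spread_activation(assignments, seeds, threshold, seed_activation=1.0):
    """Return (domain terms, activation values) reached from the seed resources."""
    graph = build_resource_graph(assignments)
    tags = defaultdict(set)
    for _user, tag, resource in assignments:
        tags[resource].add(tag)

    activation = {seed: seed_activation for seed in seeds}
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for neighbour in graph[source]:
            if neighbour in activation:
                continue
            shared = tags[source] & tags[neighbour]
            union = tags[source] | tags[neighbour]
            # Assumed activation function: source activation x Jaccard overlap.
            value = activation[source] * len(shared) / len(union)
            if value > threshold:
                activation[neighbour] = value
                queue.append(neighbour)

    # Tags of every activated resource are collected as domain terms.
    terms = set()
    for resource in activation:
        terms |= tags[resource]
    return terms, activation


# Hypothetical (user, tag, resource) assignments.
assignments = [
    ("u1", "finance", "r1"), ("u1", "stocks", "r1"),
    ("u1", "finance", "r2"), ("u1", "stocks", "r2"),
    ("u2", "finance", "r2"), ("u2", "finance", "r3"),
    ("u2", "recipes", "r3"), ("u2", "cooking", "r3"),
]
terms, activation = spread_activation(assignments, seeds=["r1"], threshold=0.5)
```

In this toy data r2 shares all of its tags with the seed r1 and is activated, while r3 shares only one tag out of four with r2 and stays below the threshold, so its unrelated tags are not collected.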
10. Semantic Elicitation
Approach
Goal: To relate domain terms (tags) to DBpedia resources
• Normalize the tag to the standard notation of DBpedia resource titles
• Search for a resource with a label equal to the normalized tag using SPARQL
• If no such resource exists: use a spelling suggestion service and search again
• If it exists: check whether it is related to a disambiguation resource
  • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
    • Vector space model
    • Candidate resources are represented by their textual descriptions
    • The tag is represented by its context (i.e., co-occurring tags)
    • The most similar candidate is selected using cosine similarity
  • If false: select the resource (default sense in Wikipedia)
Reference: A. García-Silva, I. Cantador, Ó. Corcho (2012). Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8 (3), 24-41.
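The normalization and disambiguation steps can be illustrated with a small Python sketch. It is a simplification under assumptions: `normalize_tag` only approximates the DBpedia title convention (underscores, capitalised first letter), the vectors are plain term frequencies rather than whatever weighting the authors used, and the candidate names and descriptions are made-up examples, not data retrieved from DBpedia.

```python
import math
import re
from collections import Counter


def normalize_tag(tag):
    """Approximate DBpedia title notation: 'stock market' -> 'Stock_market'."""
    title = "_".join(part for part in re.split(r"[\s_]+", tag.strip()) if part)
    return title[:1].upper() + title[1:]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_candidate(context_tags, candidates):
    """Select the candidate whose description is closest to the tag context.

    candidates maps a resource name to its textual description.
    """
    context = Counter(t for tag in context_tags for t in tokenize(tag))
    scores = {
        name: cosine(context, Counter(tokenize(description)))
        for name, description in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical disambiguation candidates for the tag "bond".
candidates = {
    "Bond_(finance)": "In finance a bond is a debt instrument where the "
                      "issuer pays interest to the holder of the investment",
    "Chemical_bond": "A chemical bond is an attraction between atoms that "
                     "enables the formation of molecules",
}
best, scores = pick_candidate(["finance", "investment", "interest"], candidates)
```

Here the co-occurring tags overlap only with the financial description, so that candidate is selected; a tag with no disambiguation resource would skip this step entirely.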
11. Semantic Elicitation
Approach
Goal: To identify classes from resources
• Use a SPARQL ASK query to verify whether the entity is a class
• If it is not:
  • Create queries that traverse all the possible paths of equivalence relations between the entity and a class in the RDF graph
# <resource> is a placeholder for the IRI of the resource being checked.
# Each query is sent separately, preceded by these prefix declarations.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Query 1: is the resource itself a class?
ASK { <resource> rdf:type rdfs:Class }
# Query 2: classes one equivalence link away
SELECT ?class
WHERE { <resource> ?rel1 ?class .
        ?class rdf:type rdfs:Class .
        FILTER (?rel1 = owl:sameAs) }
# Query 3: classes two equivalence links away
SELECT ?class
WHERE { <resource> ?rel1 ?node .
        ?node ?rel2 ?class .
        ?class rdf:type rdfs:Class .
        FILTER ((?rel1 = owl:sameAs) &&
                (?rel2 = owl:sameAs)) }
Reference: Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
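Queries 2 and 3 follow a regular pattern, so longer paths can be generated programmatically. The following Python sketch builds the query text for a path of any length; `class_path_query`, the `?nodeN` variable naming and the example IRI are illustrative choices rather than the authors' implementation, and the sketch only produces strings without contacting any endpoint.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def class_path_query(resource_iri, hops, equivalence="owl:sameAs"):
    """Build a SELECT query for classes reachable in `hops` equivalence links."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Path nodes: the resource, hops - 1 intermediate variables, then ?class.
    nodes = [f"<{resource_iri}>"]
    nodes += [f"?node{i}" for i in range(1, hops)]
    nodes.append("?class")
    triples = [
        f"  {nodes[i]} ?rel{i + 1} {nodes[i + 1]} ." for i in range(hops)
    ]
    conditions = " && ".join(
        f"(?rel{i} = {equivalence})" for i in range(1, hops + 1)
    )
    body = "\n".join(
        triples
        + ["  ?class rdf:type rdfs:Class .", f"  FILTER({conditions})"]
    )
    return f"{PREFIXES}SELECT DISTINCT ?class\nWHERE {{\n{body}\n}}"


# Example: the two-link query for a sample DBpedia resource IRI.
query = class_path_query("http://dbpedia.org/resource/Bank", hops=2)
```

Calling the function with `hops=1` and `hops=2` reproduces the shape of Query 2 and Query 3; the `equivalence` parameter is there because other equivalence properties could be substituted for `owl:sameAs`.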
12. Semantic Elicitation
Approach
Goal: To identify relations between classes
• For each pair of classes:
  • Create queries that traverse all the possible paths between the two classes in the RDF graph, and retrieve the relationships.
Caveats
• This may add information to the ontology that is not relevant to the domain when:
  • The path is long
  • The path passes through abstract concepts or relationships, e.g. cyc:ObjectType, umbel:RefConcept
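To make the path traversal concrete, here is a minimal Python sketch that enumerates paths between two classes over an in-memory list of triples instead of a SPARQL endpoint. Treating edges as traversable in both directions is an assumption of this sketch, and the `ex:` class and property names are invented for the example.

```python
from collections import defaultdict


def find_paths(triples, start, goal, max_length):
    """Enumerate simple paths of at most max_length edges from start to goal.

    Each step is (node, predicate, direction, next_node); direction "<-"
    means the triple was traversed from object to subject.
    """
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj, "->"))
        adjacency[obj].append((predicate, subject, "<-"))

    paths = []

    def walk(node, visited, steps):
        if node == goal and steps:
            paths.append(list(steps))
            return
        if len(steps) == max_length:
            return
        for predicate, neighbour, direction in adjacency[node]:
            if neighbour not in visited:
                steps.append((node, predicate, direction, neighbour))
                walk(neighbour, visited | {neighbour}, steps)
                steps.pop()

    walk(start, {start}, [])
    return paths


# Hypothetical triples standing in for a fragment of the RDF graph.
triples = [
    ("ex:StockExchange", "rdfs:subClassOf", "ex:Organization"),
    ("ex:Bank", "rdfs:subClassOf", "ex:Organization"),
    ("ex:Bank", "ex:regulatedBy", "ex:CentralBank"),
    ("ex:Organization", "rdfs:subClassOf", "ex:Thing"),
    ("ex:CentralBank", "rdfs:subClassOf", "ex:Thing"),
]
short_paths = find_paths(triples, "ex:StockExchange", "ex:Bank", max_length=2)
long_paths = find_paths(triples, "ex:StockExchange", "ex:Bank", max_length=4)
```

With a limit of two edges only the path through ex:Organization is found; raising the limit to four also returns a path through the generic ex:Thing, which is the kind of detour the caveats warn about.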
13. Semantic Elicitation
Approach
Minimizing the risk of adding non-relevant information to the ontology
• Keep the path length short
  • Our experiments show satisfactory results with short path lengths, which allow us to enrich the initial set of classes while preserving the precision of the ontology
• Avoid high-level concepts
  • Create lists of high-level concepts collected from the knowledge base vocabularies, and filter out the paths containing those concepts
  • Knowledge base core vocabularies are usually well documented
    • http://umbel.org/specifications/vocabulary
    • http://mappings.dbpedia.org/server/ontology/classes/
    • http://www.cyc.com/kb/thing
• Use semantic similarity distances
  • Wu and Palmer, 1994: depth of the classes and of their common subsumer in the taxonomy
  • Jiang and Conrath, 1997: subclasses per class, class depth, information content, etc.
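The filtering ideas above can be sketched as follows. The Wu and Palmer measure is computed as 2 × depth(common subsumer) / (depth(a) + depth(b)), with the root at depth 1. The sketch assumes a single-parent, acyclic taxonomy stored as a child-to-parent dictionary; the class names, the maximum length and the helper names are illustrative, and only the two stop concepts come from the slides.

```python
def ancestors(taxonomy, cls):
    """Return [cls, parent, ..., root] following a child -> parent mapping."""
    chain = [cls]
    while chain[-1] in taxonomy:
        chain.append(taxonomy[chain[-1]])
    return chain


def depth(taxonomy, cls):
    """Depth of a class in the taxonomy; the root has depth 1."""
    return len(ancestors(taxonomy, cls))


def wu_palmer(taxonomy, a, b):
    """Wu and Palmer (1994) similarity based on the deepest common subsumer."""
    chain_b = set(ancestors(taxonomy, b))
    common = next((c for c in ancestors(taxonomy, a) if c in chain_b), None)
    if common is None:
        return 0.0
    return 2 * depth(taxonomy, common) / (depth(taxonomy, a) + depth(taxonomy, b))


def keep_path(path_nodes, stop_concepts, max_length):
    """Keep a path only if it is short and avoids high-level concepts."""
    is_short = len(path_nodes) - 1 <= max_length
    return is_short and not any(node in stop_concepts for node in path_nodes)


# Hypothetical taxonomy fragment (child -> parent).
taxonomy = {
    "Bank": "Organization",
    "StockExchange": "Organization",
    "Organization": "Agent",
    "Person": "Agent",
    "Agent": "Thing",
}
STOP_CONCEPTS = {"cyc:ObjectType", "umbel:RefConcept"}

similarity = wu_palmer(taxonomy, "Bank", "StockExchange")
kept = keep_path(["Bank", "Organization", "StockExchange"], STOP_CONCEPTS, 2)
dropped = keep_path(["Bank", "umbel:RefConcept", "Person"], STOP_CONCEPTS, 2)
```

A similarity score like this could be compared against a cut-off to decide whether a relationship between two classes is worth keeping, although the slides do not say which cut-off, if any, was used.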
14. Evaluation
Experiment in the financial domain
[Figure. Labels: Input, Finance vocabulary]
15. Evaluation
Experiment in the financial domain
[Figure. Labels: Terminology Extraction, Finance vocabulary, Finance Ontology]
16. Evaluation
Inspecting a financial ontology
• We ran the process with an activation threshold of 0.8.
• The resulting ontology consists of 187 classes, 378 relations of 8 different types, and 12 modules.
17. Evaluation
Inspecting a financial ontology
Class precision = 80.67%, relation precision = 96.4%
Ontology modules
Module | Precision (Class)
Organization | 77.80%
Company | 88.50%
Person | 55.60%
Union | 3.74%
Banker | 100%
Human | 100%
Stock Exchange | 84.60%
Money Transactions | 100%
Country | 100%
Research | 100%
Driver | 0%
Member | 100%
18. Conclusions
• We have developed a method for automatically generating domain ontologies
  • Limited user participation
• We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary
• In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data)
• We tap into existing links between data sets to collect related semantic information
  • We avoid, to some extent, semantic mismatches
  • We avoid heterogeneous representations
• In practice, we expect the method to be used by ontology engineers to generate baseline ontologies that can be refined later according to the ontology requirements.
19. Future Work
• Develop a method to automatically assess the validity of the relationships found in the linked data cloud:
  • OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
  • However:
    • Stock Exchange is an organization
    • Exchange of User Rights is an event
• Use semantic similarity measures to decide whether to include the relationships found along a path between two classes.
• Discover and use datasets in the linked data cloud that cover the domain of interest.
Editor's Notes
Tags and list names lack semantics
- Polysemy
- Synonyms
- Morphological variations (plurals, singular)
- Verb Conjugations
Need to identify tag semantics and the relations between tags.