1. Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites
Marcirio Silveira Chaves (1) - marcirioc@uatlantica.pt
Cássia Trojahn (2) - cassia.trojahn@inrialpes.fr
(1) Universidade Atlântica, Oeiras, Portugal
(2) INRIA & LIG, Grenoble, France
Workshop on Cross-Cultural and Cross-Lingual Aspects of the Semantic Web
Shanghai, China, November 7th, 2010
In conjunction with the 9th International Semantic Web Conference (ISWC2010)
2. Motivation
• Social Semantic Web is highly dependent on the development of
multilingual ontologies.
• Only 2.5% of the ontologies in the OntoSelect library are multilingual.
• (Multilingual) Hotel domain ontologies are rare.
• Multilingual comments need to be processed.
• Ontology-driven mining of comments from Social Web sites.
November 7th 2C3LSW2010
3. Context
• Customer Knowledge Management (CKM)
– Customer Relationship Management (CRM) and
– Knowledge Management (KM).
• Multilingual comments to support CKM
4. Outline
• Multilingual Ontology Application
• Hontology
• Related Ontologies
• Extending Hontology
• Conclusion
• Ongoing Work
5. Multilingual Ontology Application
[Figure: architecture of the application. Components: social web data; Extraction, Transformation, Loading; comment annotator; multilingual ontology; ontology augmenter; knowledge base; user interface. Stages: data pre-processing, comments annotation, ontology enrichment, searching. Actors: expert, CKM manager.]
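One way to picture the comment annotator component is as a lookup of comment text against the multilingual labels of each concept. The following is a minimal sketch under stated assumptions: the `LABELS` table, the `annotate` function, and the translations are illustrative and are not taken from Hontology's actual files, and the matching is a plain word-boundary search rather than real linguistic processing.

```python
import re

# Hypothetical excerpt of a multilingual ontology: concept -> language -> labels.
# Concept names and translations are illustrative assumptions.
LABELS = {
    "Room": {"en": ["room"], "fr": ["chambre"], "pt": ["quarto"]},
    "Sauna": {"en": ["sauna"], "fr": ["sauna"], "pt": ["sauna"]},
    "Gym": {"en": ["gym"], "fr": ["salle de sport"], "pt": ["ginásio"]},
}


def annotate(comment, lang):
    """Return the sorted concepts whose labels in `lang` occur in the comment."""
    text = comment.lower()
    found = set()
    for concept, by_lang in LABELS.items():
        for label in by_lang.get(lang, []):
            # Word-boundary match; a trailing plural "s" is tolerated.
            if re.search(r"\b" + re.escape(label) + r"s?\b", text):
                found.add(concept)
    return sorted(found)
```

Calling `annotate("The rooms were clean and the sauna was great", "en")` tags the comment with the `Room` and `Sauna` concepts, and the same concepts can be reached from French or Portuguese comments through their own labels.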
6. Hontology
• Development Methodology
– Identify existing ontologies on related domains
– Select the main concepts and properties
– Organize concepts and properties hierarchically into categories
– Translate the ontology (manual)
– Expand concepts and properties based on comments
– Translate the new concepts and properties (manual)
– Generate the ontology in several formats
8. Hontology
• Category: contains all the types of categories into which a hotel
can be classified, e.g., tourist, comfort, and luxury.
• Facility: includes the utility options offered by each hotel, e.g.,
beauty salon, kids club, and pool bar.
• Hospitality: contains the existing kinds of hotels, e.g., hostel,
pension, and motel.
9. Hontology
• Hotel: details the kinds of hotels, e.g., bunker, cave, and capsule.
• Leisure: lists the leisure options, e.g., gym, jacuzzi, and sauna.
• Points of interest: places often mentioned in comments about the
hotels, e.g., stadium, museum, and monument.
• Room: splits into Hostel Room and Hotel Room, which have
different kinds of rooms and different nomenclature for them.
10. Hontology
• Hontology supports three languages
– English, French and Portuguese
• 97 concepts
• 9 object properties
• 25 data properties
11. Related Ontologies
Ontology          Mondeca          HarmoNET                  Travel Itinerary  Hontology
# concepts        1000             54                        8                 97
# properties      n.a.             166                       24                34
# instances       Zero             Zero                      Zero              Zero
Domain            Tourism          Tourism                   Travel            Hotel
Multilingual      No               No                        No                Yes
Use               Mondeca Project  Accommodation and events  n.a.              Hotel sector decision support
Freely available  No               Yes                       Yes               Yes
12. Extending Hontology
• Ontology augmenter
• Multilingual ontology matching
• Machine-learning methods
• (Semi-)automatic multilingual extension
• Hontology can be used as a multilingual resource for cross-language
information retrieval.
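As an illustration of the cross-language retrieval use mentioned above, a query term in one language can be mapped to its concept and then to that concept's labels in the other languages. This is only a sketch: the `LABELS` table and the `expand_query` helper are hypothetical, and exact string equality stands in for whatever matching a real system would use.

```python
# Hypothetical concept -> language -> label table (illustrative values only).
LABELS = {
    "Room": {"en": "room", "fr": "chambre", "pt": "quarto"},
    "Pool": {"en": "swimming pool", "fr": "piscine", "pt": "piscina"},
}


def expand_query(term, source_lang):
    """Return {language: label} for the concept labelled `term` in `source_lang`.

    An empty dict is returned when no concept carries that label.
    """
    for by_lang in LABELS.values():
        if by_lang.get(source_lang, "").lower() == term.lower():
            return dict(by_lang)
    return {}
```

A Portuguese query for "quarto" would then also retrieve comments containing "room" or "chambre".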
13. Extending Hontology
• Ontology augmenter
– Term correlation: considers candidate terms mentioned in the comments that
co-occur with terms already present in Hontology.
– In the comment "Rooms are comfortable, but pillows are very hard", the terms
"pillow" (not in the ontology) and "room" (in the ontology) should probably be
related through a property linking them in Hontology.
– Once the ontology is enriched with the term "pillow", a comment
containing, for instance, only the sentence "Pillows are very hard" can be
found under the concept "room".
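The term-correlation idea can be sketched as counting how often an unknown term appears in the same comment as a term the ontology already knows. Everything named here is an assumption made for illustration: the `ONTOLOGY_TERMS`, `STOPWORDS` and `ADJECTIVES` sets are hand-made, and `singular` is a naive plural stripper, not a real lemmatizer.

```python
import re
from collections import Counter

# Illustrative assumptions: a one-term "ontology" and tiny filter lists.
ONTOLOGY_TERMS = {"room"}
STOPWORDS = {"are", "is", "but", "very", "the", "a", "and"}
ADJECTIVES = {"comfortable", "hard"}


def singular(token):
    """Naively strip a final 's' from longer tokens ('pillows' -> 'pillow')."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def correlate(comments):
    """Count (known term, candidate term) pairs that share a comment."""
    pairs = Counter()
    for comment in comments:
        tokens = {singular(t) for t in re.findall(r"[a-z]+", comment.lower())}
        known = tokens & ONTOLOGY_TERMS
        candidates = tokens - ONTOLOGY_TERMS - STOPWORDS - ADJECTIVES
        for k in known:
            for c in candidates:
                pairs[(k, c)] += 1
    return pairs
```

Pairs with a high count, such as ("room", "pillow"), would be candidates to show to an expert before a new term and property are added to the ontology.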
14. Extending Hontology
• Ontology augmenter
– Rules (or lexical patterns): comments usually contain a set of common adjectives,
e.g., good, cheap, and soft.
– Lexical patterns can extract the relevant terms that precede or
follow the adjective,
– e.g., "Air-conditioned is loud", "Small bathroom".
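The two example comments suggest two simple patterns: a term followed by "is/are" and an adjective, and an adjective directly followed by a term. The sketch below encodes them as regular expressions. The adjective list, the pattern names `BEFORE` and `AFTER`, and the `extract_terms` function are assumptions for illustration; a real rule set would likely rely on part-of-speech tagging.

```python
import re

# Illustrative adjective list; a real system would use a larger lexicon.
ADJECTIVES = ["good", "cheap", "soft", "loud", "small"]
ADJ = "|".join(ADJECTIVES)

# Pattern 1: "<term> is/are [very] <adjective>" (term precedes the adjective).
BEFORE = re.compile(rf"\b([\w-]+) (?:is|are) (?:very )?(?:{ADJ})\b", re.IGNORECASE)
# Pattern 2: "<adjective> <term>" (term follows the adjective).
AFTER = re.compile(rf"\b(?:{ADJ}) ([\w-]+)", re.IGNORECASE)


def extract_terms(comment):
    """Return lower-cased terms found next to a known adjective."""
    terms = [m.lower() for m in BEFORE.findall(comment)]
    terms += [m.lower() for m in AFTER.findall(comment)]
    return terms
```

Applied to the slide's examples, `extract_terms("Air-conditioned is loud")` yields the term "air-conditioned" and `extract_terms("Small bathroom")` yields "bathroom".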
15. Extending Hontology
• Ontology augmenter
– Synonyms are elements that must be considered in the improvement of Hontology.
– They have already been considered in the process of adding labels to the
concepts.
– This task can be extended with the help of dictionaries and lexical resources
within an automatic process.
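Extending concept labels with synonyms from a dictionary can be sketched as a merge that appends unseen synonyms to each concept's label list. The `SYNONYMS` dictionary and the `add_synonym_labels` function are hypothetical stand-ins; no particular lexical resource is implied.

```python
# Hypothetical synonym dictionary standing in for a lexical resource.
SYNONYMS = {
    "gym": ["fitness center", "fitness room"],
    "pool": ["swimming pool"],
}


def add_synonym_labels(labels, synonyms):
    """Return a new concept -> labels mapping with synonyms appended.

    `labels` maps each concept to its existing labels; duplicates are skipped
    and the input mapping is left unchanged.
    """
    enriched = {}
    for concept, names in labels.items():
        merged = list(names)
        for name in names:
            for syn in synonyms.get(name.lower(), []):
                if syn not in merged:
                    merged.append(syn)
        enriched[concept] = merged
    return enriched
```

With these labels, a comment that mentions a "fitness center" can be attached to the same concept as one that mentions a "gym".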
16. Ongoing work
(1) enrich Hontology by using potential terms from comments
(2) exploit Hontology in Multilingual Ontology Matching (i.e., creating
correspondences between Hontology and other ontologies)
(3) include labels in other languages
(4) explore the issues related to ontology localization and
internationalization.
17. Final Remarks
• Main contribution
– to make available to the community a multilingual ontology
that can be used as a baseline for many uses and applications
in the context of the Multilingual Semantic Web.