So far, within the Library and Information Science (LIS) community, Knowledge Organization (KO) has developed its own very successful solutions to document search, allowing for the classification and search of millions of books. However, current KO solutions are limited in expressivity as they only support queries by document properties, e.g., by title, author and subject. In parallel, within the Artificial Intelligence and Semantic Web communities, Knowledge Representation (KR), has developed very powerful end expressive techniques which, via the use of ontologies, support queries by any entity property (e.g., the properties of the entities described in a document). However, KR has not scaled yet to the level of KO, mainly because of the lack of a precise and scalable entity specification methodology. In this paper we present DERA, a new methodology, inspired by the faceted approach, as introduced in KO, that retains all the advantages of KR and compensates for the limitations of KO. DERA guarantees at the same time quality, extensibility, scalability and effectiveness in search.
From Knowledge Organization to Knowledge Representation (ISKO UK Conference)
1.
2. LIS developed its own very successful
solutions for the classification and
search of documents.
In KO, search is focused on document
properties (e.g. title, author, subject).
KO tends to fail in situations when
users express their needs in terms of
entity properties (e.g., of the author)
3. Search by document properties
Give me documents with subject “Michelangelo”
and “marble sculpture”
Search by entity properties
Give me documents about marble sculptures
made by a person born in Italy
4. Subject: Buonarroti, Michelangelo
Subject: sculpture – Renaissance
In indexes the syntax of the subjects is given
by a subject indexing language grammar.
What about the semantics?
•
•
•
Is Michelangelo the Italian artist? When and
where he was born? What are his most
famous works?
Is sculpture the form of art?
Is Renaissance the historical period? When
and where exactly?
5. Name: David
Class: statue
Author: Michelangelo
Date of creation: 1504
Matter: marble
Height: 4.10 m
Name: Michelangelo
Surname: Buonarroti
Class: artist
Date of birth: 6th March 1475
Date of death: 18th February 1564
6. KR
search by any entity property
KO
search by
title, author,
subject
7.
8.
9.
10. Give me documents about any lake
with depth greater than 100 written by Italians
11. How to build high quality and scalable
descriptive ontologies?
DERA is faceted as it is inspired to the
principles and canons of the faceted
approach by Ranganathan
DERA is a KR approach as it models entities
of a domain (D) by their entity classes
(E), relations (R) and attributes (A)
12. Step 1: Identification of the atomic concepts
(E) watercourse, stream: a natural body of
running water flowing on or under the earth
Step 2: Analysis
a body of water
a flowing body of water
no fixed boundary
confined within a bed and stream banks
larger than a brook
13. Step 3: Synthesis.
(E) Body of water
(is-a) Flowing body of water
(is-a) Stream
(is-a) Brook
(is-a) River
(is-a) Still body of water
(is-a) Pond
(is-a) Lake
Step 4: Standardization.
(E) stream, watercourse: a natural body of
running water flowing on or under the earth
14. Step 5: Ordering
Terms and concepts in the facets are ordered
Step 6: Formalization
Descriptive ontologies are translated into
Description Logic formal ontologies, e.g.,:
River ⊑ Stream
River (Volga)
Length (Volga, 3692)
15. • DERA facets have explicit semantics and are
modeled as descriptive ontologies
• DERA facets inherits all the nice properties of
the faceted approach, such as robustness
and scalability
• DERA allows for a very expressive document
search by any entity property
• DERA allows for automated reasoning via the
formalization into Description Logics
ontologies
16. The usefulness of moving from KO to KR
• KO is methodologically very strong, but subjects are
limited in formality and expressiveness as, by
employing classification ontologies, it only supports
queries by document properties.
• KR, by employing descriptive ontologies, supports
queries by any entity property, but it is
methodologically weaker than KO.
We propose the DERA faceted KR approach
• DERA, being faceted, allows the development of high
quality and scalable descriptive ontologies
• DERA, being a KR approach, allows modeling relevant
entities of the domain and their E/R/A properties and
enables automated reasoning.
• It supports a highly expressive search of documents
exploiting entity properties.
Notas do Editor
KO has developed its own solution for the classification and search of documents in libraries and digital archives
However, users may think by entity properties, instead of document properties.
Subjects are informal natural language strings, or more precisely their syntax is often formal (because of the subject indexing language grammar), but their semantics (the meaning) is left implicit.
Thinking in terms of entities, we can describe works and people in terms of entities. This is done in LIS, but informally.One immediate consequence is that it seems there are not systematic ways to reason about these objects.As a matter of fact, information is scattered in many resources.
In fact, documents are just on particular case of entities, with title, author and subject very important properties.
(1) We identify atomic concepts, synonyms and natural language definitions. Each concept is marked as E, R or A.(2) For each concept we identify its characteristics
(3) Facets are built as descriptive ontologies (e.g., we make use of explicit is-a and part-of relations).(4) The preferred term is elected