ONTOLOGY EXTRACTION FOR BUSINESS KNOWLEDGE
MANAGEMENT
Dr. Raymond Y.K. Lau
Department of Information Systems
City University of Hong Kong
Tat Chee Avenue, Kowloon Hong Kong
Email: raylau@cityu.edu.hk
Phone: +852 2788 8495 FAX: +852 2788 8694
ABSTRACT
Ontology plays an important role in capturing and disseminating business
information (e.g., products, services, relationships of businesses) for effective
human-computer interaction. However, the engineering of domain ontologies is very labor-intensive
and time-consuming. Some machine learning methods have been explored for automatic
or semi-automatic discovery of domain ontologies. Nevertheless, both the accuracy and
the computational efficiency of these methods need to be improved to support large scale
ontology construction for real-world business applications. This paper illustrates a novel
domain ontology discovery method which exploits contextual information of the
knowledge sources to construct a domain ontology of better quality. The proposed
ontology discovery method has been empirically tested in an e-Learning environment and
the experimental results are encouraging.
Keywords: Domain Ontology, Ontology Discovery, Statistical Learning, Knowledge
Management.
INTRODUCTION
Knowledge has been recognized as the most important corporate asset and it is the key
for organizations to achieve sustainable competitive advantage. Knowledge management
is a collection of processes that govern the creation, dissemination, and utilization of
knowledge [15, 16]. To effectively manage this intellectual capital, businesses
need an approach to identify and capture information and knowledge about
business processes, products, services, markets, customers, suppliers, and competitors,
and to share this knowledge to improve the achievement of organizational goals. Ontologies
allow domain knowledge such as products, services, markets, etc. to be captured in an
explicit and formal way such that it can be shared among human and computer systems.
The notion of ontology is becoming very useful in various fields such as intelligent
information extraction and retrieval, cooperative information systems, electronic
commerce, and knowledge management [25]. Since Tim Berners-Lee, the inventor of the
World Wide Web (Web), coined the vision of a Semantic Web [1] in which background
information of Web resources is stored in the form of machine processable metadata, the
proliferation of ontologies has undergone tremendous growth. The success of the Semantic Web
relies heavily on formal ontologies to structure data for comprehensive and transportable
machine understanding [11]. Although there is no universal consensus on the definition
of ontology, it is generally accepted that an ontology is a specification of a conceptualization
[7]. In other words, an ontology is a formal representation of concepts and their
interrelationships. It provides a view of the world that we wish to represent for some
purposes [18]. Ontology can take the simple form of a taxonomy (i.e., knowledge
encoded in a minimal hierarchical structure) or a vocabulary with standardized machine
interpretable terminology supplemented with natural language definitions. On the other
hand, the notion of ontology can also be used to describe a logical domain theory with
very expressive, complex, and meaningful information. Ontology is often specified in a
declarative form by using semantic markup languages such as RDF and OWL [5].
Although ontologies are useful in many areas, the engineering of ontologies turns out to
be very expensive and time consuming. Therefore, many automatic or semi-automatic
ontology engineering techniques have been proposed. Automated ontology discovery is
vital for the success of ontology engineering because it deals with the knowledge
acquisition bottleneck which is a classical knowledge engineering problem. Although
fully automatic construction of a perfect domain ontology is beyond the current
state-of-the-art, we believe that the automatic ontology extraction method illustrated in this paper
can assist ontology engineers to build domain ontology quicker and more accurately.
Some learning techniques have been applied to the extraction of domain ontology [2, 6,
22]. Nevertheless, these methods are still subject to further enhancement in terms of
computational efficiency and accuracy. One of the ways to improve domain ontology
extraction is to exploit contextual information from the knowledge sources such as the
Internet news about business products, services, and markets. As domain ontology
captures domain (context) dependent information, an effective extraction method should
exploit contextual information in order to build relevant ontologies.
AN OVERVIEW OF THE ONTOLOGY DISCOVERY METHODOLOGY
Figure 1 depicts the proposed methodology of context-sensitive domain ontology
extraction. A text corpus is parsed to analyze the lexico-syntactic elements. For instance,
stop words such as “a, an, the” are removed from the source documents, since these words
appear in any context and cannot provide useful information to describe a domain
concept. For our implementation, a stop word file is constructed based on the standard
stop word file used in the SMART retrieval system [20]. Lexical patterns are identified by
applying Part-of-Speech (POS) tagging to the source documents, followed by
token stemming based on the Porter stemming algorithm [19]. We refer to the WordNet
lexicon [13] to tag each word during this process. During the linguistic pattern filtering
stage, certain linguistic patterns are extracted based on the specific requirements specified
by the ontology engineers. For example, the ontology engineers may only focus on the
“Noun Noun” and “Adjective Noun” patterns instead of all the linguistic patterns. This is
in fact a good way to gain computational efficiency by reducing the number of patterns
for further statistical analysis. In addition, to extract relevant domain-specific concepts,
the appearances of concepts across different domains should be taken into account. The
basic intuition is that a concept that frequently appears in a specific domain (corpus) rather
than in many different domains is more likely to be a relevant domain concept. The
statistical token analysis step employs an information-theoretic measure to compute the
co-occurrence statistics of the target linguistic patterns. Finally, a taxonomy of domain
concepts is developed according to a subsumption-based fuzzy computational method.
The details of the proposed ontology extraction method are discussed in the following sections.
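The pattern filtering step described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the helper name `extract_patterns`, the toy stop list, and the hand-tagged sentence are our assumptions (in practice a POS tagger and the full SMART stop word file would be used).

```python
# Sketch of linguistic pattern filtering: drop stop words, then keep only
# adjacent token pairs matching the patterns an ontology engineer specified
# (here "Noun Noun" and "Adjective Noun", as Penn Treebank tags NN and JJ).

STOP_WORDS = {"a", "an", "the", "of", "is"}  # tiny excerpt of a SMART-style list

def extract_patterns(tagged_tokens, patterns={("NN", "NN"), ("JJ", "NN")}):
    """Return adjacent token pairs whose POS tags match a target pattern."""
    # Remove stop words first, as they carry no domain-specific information.
    tokens = [(w, t) for (w, t) in tagged_tokens if w.lower() not in STOP_WORDS]
    found = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if (t1, t2) in patterns:
            found.append((w1, w2))
    return found

# A hand-tagged sentence standing in for real POS tagger output:
tagged = [("knowledge", "NN"), ("management", "NN"), ("is", "VBZ"),
          ("a", "DT"), ("strategic", "JJ"), ("process", "NN")]
print(extract_patterns(tagged))
# [('knowledge', 'management'), ('strategic', 'process')]
```

Restricting analysis to a few patterns in this way is exactly what gives the computational saving mentioned above: the statistical stage only ever sees the filtered pairs.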
Figure 1: Context-Sensitive Ontology Extraction Process
LEXICO-SYNTACTIC AND STATISTICAL ANALYSIS
After standard document preprocessing such as stop word removal, POS tagging, and
word stemming [21], a windowing process is conducted over the collection of documents.
This makes our method quite different from the approach developed by Sanderson and
Croft [22] which does not take into account the proximity between tokens. The proximity
factor is a key to reduce the number of noisy term relationships. For each document (e.g.,
Net news, Web page, email, etc.), a virtual window of δ words is moved from left to right
one word at a time until the end of a sentence is reached. Within each window, the
statistical information among tokens is collected to develop collocational expressions.
Such a windowing process has successfully been applied to text mining before [9]. The
windowing process is repeated for each document until the entire collection has been
processed. According to previous studies, a text window of 5 to 10 terms is effective [8,
17], and so we adopt this range as the basis to perform our windowing process. To
improve computational efficiency and filter noisy relations, only the specific linguistic
patterns (e.g., “Noun Noun” and “Adjective Noun”) defined by an ontology engineer will be
analyzed.
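The windowing process above can be sketched as follows; this is a simplified illustration assuming sentence-segmented, preprocessed input, and the function name and count structures are ours.

```python
# A window of delta words slides one word at a time over each sentence;
# for every term and term pair we count the number of windows containing it.
# These counts are the raw statistics behind the probability estimates used
# in Eqs. (1) and (2).
from collections import Counter
from itertools import combinations

def window_counts(sentences, delta=5):
    term_windows, pair_windows, total = Counter(), Counter(), 0
    for sent in sentences:
        words = sent.split()
        # Slide until the end of the sentence is reached; short sentences
        # yield a single window.
        for i in range(max(1, len(words) - delta + 1)):
            window = words[i:i + delta]
            total += 1
            for t in set(window):
                term_windows[t] += 1
            for t1, t2 in combinations(sorted(set(window)), 2):
                pair_windows[(t1, t2)] += 1
    return term_windows, pair_windows, total

terms, pairs, n = window_counts(
    ["knowledge management supports business strategy"], delta=3)
print(n)                                   # 3 windows in a 5-word sentence
print(pairs[("knowledge", "management")])  # 1
```

Because windows never cross sentence boundaries, distant (and likely noisy) term pairings are excluded by construction, which is the proximity factor discussed above.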
For statistical token analysis, Mutual Information (MI) is adopted as the basic
computational method. Mutual Information has been applied to collocational analysis
[17, 24] in previous research. Mutual Information is an information-theoretic method to
compute the dependency between two entities and is defined by [23]:

$$MI(t_i, t_j) = \log_2 \frac{\Pr(t_i, t_j)}{\Pr(t_i)\,\Pr(t_j)} \quad (1)$$

where $MI(t_i, t_j)$ is the mutual information between term $t_i$ and term $t_j$,
$\Pr(t_i, t_j)$ is the joint probability that both terms appear in a text window, and
$\Pr(t_i)$ is the probability that a term $t_i$ appears in a text window. The probability
$\Pr(t_i)$ is estimated as $\frac{w_t}{|w|}$, where $w_t$ is the number of windows
containing the term $t_i$ and $|w|$ is the total number of windows constructed from a
textual database (i.e., a collection). Similarly, $\Pr(t_i, t_j)$ is the fraction of the
number of windows containing both terms out of the total number of windows.
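Eq. (1) can be computed directly from the window statistics; the following minimal sketch assumes the counts have already been collected, and the function name is ours.

```python
import math

def mutual_information(w_ij, w_i, w_j, w_total):
    """Eq. (1), with probabilities estimated as fractions of text windows:
    Pr(t_i) = w_i / |w| and Pr(t_i, t_j) = w_ij / |w|."""
    p_i, p_j, p_ij = w_i / w_total, w_j / w_total, w_ij / w_total
    return math.log2(p_ij / (p_i * p_j))

# Two terms co-occurring in 8 of 100 windows, each present in 10 windows:
print(round(mutual_information(8, 10, 10, 100), 6))  # 3.0
```

A positive value indicates the pair co-occurs more often than independence would predict; pairs with MI near zero are uninformative.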
We propose Balanced Mutual Information (BMI) to compute the association weights
among tokens. This method considers both term presence and term absence as the
evidence of the implicit term relationships.
$$Ass(t_i, t_j) \approx BMI(t_i, t_j) = \alpha \, \Pr(t_i, t_j) \log_2\!\left(\frac{\Pr(t_i, t_j)}{\Pr(t_i)\Pr(t_j)} + 1\right) + (1 - \alpha) \, \Pr(\neg t_i, \neg t_j) \log_2\!\left(\frac{\Pr(\neg t_i, \neg t_j)}{\Pr(\neg t_i)\Pr(\neg t_j)} + 1\right) \quad (2)$$

where $Ass(t_i, t_j)$ is the association weight between term $t_i$ and term $t_j$; this
association value is approximated by the BMI score. $\Pr(t_i, t_j)$ is the joint probability
that both terms appear in a text window, and $\Pr(\neg t_i, \neg t_j)$ is the joint
probability that both terms are absent from a text window. The factor $\alpha > 0.5$ is a
weight assigned to the positively associated mutual information. As it is counterintuitive
to have a zero BMI value if two terms always appear together in every text window, the
fraction $\frac{\Pr(t_i, t_j)}{\Pr(t_i)\Pr(t_j)}$ is adjusted by adding the constant 1
before applying the logarithm. Since our text mining method is applied after removing
stop words, such an adjustment is reasonable to capture the intuition of significant term
co-occurrence. In Eq. (2), each MI value is then normalized by the corresponding joint
probability. Only a feature with an association weight greater than a threshold µ (i.e.,
$Ass(t_i, t_j) > µ$) will be considered a significant feature for representing a concept
in a context vector. After computing all the BMI values in a collection, these values are
linearly scaled such that each term association weight lies within the unit interval, i.e.,
$Ass(t_i, t_j) \in [0, 1]$ for all $t_i, t_j$. It should be noted that the constituent
terms of a concept are always implicitly included in the underlying context vector with a
default association weight of 1.
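A sketch of Eq. (2) and the subsequent linear scaling, again working from window counts. The helper names `bmi` and `scale_unit` are ours, and α = 0.6 is an arbitrary illustrative choice satisfying α > 0.5.

```python
import math

def bmi(w_ij, w_i, w_j, w_total, alpha=0.6):
    """Eq. (2): balanced MI over both term presence and term absence."""
    p_i, p_j, p_ij = w_i / w_total, w_j / w_total, w_ij / w_total
    # Windows containing neither term, by inclusion-exclusion over counts.
    w_none = w_total - w_i - w_j + w_ij
    pn_i, pn_j, pn_ij = 1 - p_i, 1 - p_j, w_none / w_total
    # The +1 inside each log avoids a zero BMI when two terms always co-occur.
    pos = alpha * p_ij * math.log2(p_ij / (p_i * p_j) + 1)
    neg = (1 - alpha) * pn_ij * math.log2(pn_ij / (pn_i * pn_j) + 1)
    return pos + neg

def scale_unit(weights):
    """Linearly scale all association weights into the unit interval [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0
            for k, v in weights.items()}

print(round(bmi(8, 10, 10, 100), 3))  # 0.526
```

Weighting each log term by its joint probability means that rare accidental co-occurrences contribute little, which is the noise-filtering effect the threshold µ then completes.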
The final stage of our ontology extraction method is taxonomy generation based on
subsumption relations among the extracted concepts. Let $Spec(c_x, c_y)$ denote that
concept $c_x$ is a specialization (subclass) of another concept $c_y$. The degree of such
a specialization is derived by:

$$Spec(c_x, c_y) = \frac{\sum_{t_x \in c_x,\; t_y \in c_y,\; t_x = t_y} Ass(t_x, c_x) \otimes Ass(t_y, c_y)}{\sum_{t_x \in c_x} Ass(t_x, c_x)} \quad (3)$$

where $\otimes$ is a standard fuzzy conjunction operator, equivalent to the minimum
function. The above formula states that the degree of subsumption (specificity) of $c_x$
to $c_y$ is based on the ratio of the sum of the minimal association weights of the
common features of the two concepts to the sum of the feature weights of the concept
$c_x$. For instance, if every property term of $c_x$ is also a property term of $c_y$, a
high specificity value will be derived. In general, $Spec(c_x, c_y)$ takes its values from
the unit interval [0, 1] and it is an asymmetric relation. Since it is a fuzzy rather than
a crisp relation, $Spec(c_y, c_x)$ may also hold to a certain degree. When the taxonomy
graph is developed, we only select a subsumption relation such that
$Spec(c_x, c_y) > Spec(c_y, c_x)$ and $Spec(c_x, c_y) > \lambda$, where $\lambda$ is a
threshold to distinguish significant subsumption relations. If
$Spec(c_x, c_y) = Spec(c_y, c_x)$ and $Spec(c_x, c_y) > \lambda$, an equivalence relation
between $c_x$ and $c_y$ is extracted.
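Eq. (3) and the link-selection rule can be sketched as follows; the concept vectors and the threshold value here are invented purely for illustration.

```python
def spec(cx, cy):
    """Eq. (3): degree to which concept cx specializes cy. Concepts are
    context vectors mapping feature terms to association weights."""
    common = set(cx) & set(cy)
    # The fuzzy conjunction operator reduces to the minimum of the weights.
    num = sum(min(cx[t], cy[t]) for t in common)
    den = sum(cx.values())
    return num / den if den else 0.0

def taxonomy_links(concepts, lam=0.6):
    """Keep directed links with Spec(cx, cy) > Spec(cy, cx) and Spec > lam."""
    links = []
    for x in concepts:
        for y in concepts:
            if x == y:
                continue
            s_xy = spec(concepts[x], concepts[y])
            s_yx = spec(concepts[y], concepts[x])
            if s_xy > s_yx and s_xy > lam:
                links.append((x, y, s_xy))  # read as: x is a subclass of y
    return links

# Every feature of "neural network" is also a feature of "network",
# so the specialization degree is maximal (1.0).
concepts = {
    "neural network": {"network": 1.0, "learning": 0.6},
    "network": {"network": 1.0, "node": 0.8, "learning": 0.7, "protocol": 0.5},
}
print(taxonomy_links(concepts))  # [('neural network', 'network', 1.0)]
```

The asymmetry is visible in the example: the reverse degree Spec(network, neural network) is only about 0.53, so no reverse link is produced.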
EVALUATION
A small-scale experiment testing the functionality of the prototype system and the
accuracy of the aforementioned domain ontology discovery method was conducted
in an e-Learning environment. A group of ten undergraduate students was recruited to
try the prototype system; all of the subjects had previously attended a course in
knowledge management. At the beginning of the experiment, they attended a briefing session
of fifteen minutes to learn the objective of this experiment and were instructed to write
the most important concepts in knowledge management using concise and precise
statements on an on-line discussion board. They were given thirty minutes to write their
messages. After the message generation session, the coordinator of the experiment
executed the ontology discovery method to discover the domain knowledge from the on-
line discussion messages, which reflect the writers' perception of knowledge
management. For this experiment, only the "Noun Noun" linguistic pattern was specified
and the parameter α = 0.5 was set. Each subject could then view the system-generated
ontology on-line. Following the viewing session, a questionnaire was distributed to each
subject to assess whether the ontology really reflected their understanding of the
chosen topic. Our questionnaire was developed based on the instrument employed by [3].
It included the assessment of the following factors:
Accuracy - whether the concepts and relationships shown in the taxonomy are correct;
Cohesiveness - whether each concept in the taxonomy is unique and does not overlap
with the others;
Isolation - whether the concepts at the same level are distinguishable and do not subsume
one another;
Hierarchy - whether the taxonomy traverses from broader concepts at the higher levels
to narrower concepts at the lower levels;
Readability - whether the concepts at all levels can easily be comprehended by humans.
A five-point semantic differential scale from very good (5), good (4), average (3), bad
(2), to very poor (1) was used to measure these variables. In general, a score close to 5
indicates that the automatically generated ontology is of good quality and correctly
reflects the mental state of the subjects. The average scores pertaining to the various
factors are shown in Table 1. The overall mean score is 4.3 with a standard deviation of
0.61, close to the maximum of 5. There are 26 key concepts and 97 relationships spread
across 3 levels. This initial experiment shows that our domain ontology discovery method
is promising, since it can automatically discover the prominent concepts and their
relationships from a set of messages written in natural language.
              Mean   STD
Accuracy      4.4    0.66
Cohesiveness  4.3    0.46
Isolation     4.2    0.60
Hierarchy     4.0    0.63
Readability   4.5    0.67
Overall       4.3    0.61
Table 1: Qualitative Evaluation of Automatically Discovered Ontology
CONCLUSIONS
The manipulation and exchange of semantically enriched business intelligence (e.g.,
products, services, markets, etc.) can enhance the quality of an eCommerce system and
offer a high level of interoperability among different enterprise systems. Ontology
certainly plays an important role in the formalization of business knowledge. However,
the biggest challenge for the widespread application of ontologies lies in the construction
of these ontologies, because it is a very labor-intensive and time-consuming process. This
paper illustrates a novel automatic ontology extraction method to facilitate the ontology
engineering process. In particular, contextual information of a domain is exploited so that
more reliable domain ontology can be extracted. The proposed extraction method
combines lexico-syntactic and statistical learning approaches so as to reduce the chance
of generating noisy relations and to improve the computational efficiency. Empirical
studies have been performed to evaluate the quality of the domain ontology extracted by
the proposed ontology discovery method. Our preliminary experiment shows that the
extracted domain ontology can accurately reflect the domain knowledge embedded in on-
line discussion messages. Future work involves comparing the accuracy and the
computational efficiency of our extraction method with those of other approaches. In
addition, a larger-scale quantitative evaluation of our ontology extraction method will be
conducted.
REFERENCES
[1] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American,
284(5):34–43, 2001.
[2] Shan Chen, Damminda Alahakoon, and Maria Indrawan. Background knowledge driven
ontology discovery. In Proceedings of the 2005 IEEE International Conference on e-
Technology, e-Commerce and e-Service, pages 202–207, 2005.
[3] S. Chuang and L. Chien. Taxonomy generation for text segments: A practical web-based
approach. ACM Transactions on Information Systems, 23(4):363–396, 2005.
[4] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text corpora using
formal concept analysis. Journal of Artificial Intelligence Research, 24:305–339, 2005.
[5] The World Wide Web Consortium. Web Ontology Language, 2004. Available from
http://www.w3.org/2004/OWL/.
[6] Michael Dittenbach, Helmut Berger, and Dieter Merkl. Improving domain ontologies by
mining semantics from text. In Proceedings of the First Asia-Pacific Conference on
Conceptual Modelling (APCCM2004), pages 91–100, 2004.
[7] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge
Acquisition, 5(2):199–220, 1993.
[8] Hongyan Jing and Evelyne Tzoukermann. Information retrieval based on context
distance and morphology. In Proceedings of the 22nd Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, Language Analysis,
pages 90–96, 1999.
[9] R.Y.K. Lau. Context-Sensitive Text Mining and Belief Revision for Intelligent Information
Retrieval on the Web. Web Intelligence and Agent Systems: An International Journal, 1(3-
4):1–22, 2003.
[10] Taehee Lee, Ig hoon Lee, Suekyung Lee, Sang goo Lee, Dongkyu Kim, Jonghoon Chun,
Hyunja Lee, and Junho Shim. Building an operational product ontology system. Electronic
Commerce Research and Applications, 5(1):16–28, 2006.
[11] Alexander Maedche and Steffen Staab. Ontology learning for the semantic web. IEEE
Intelligent Systems, 16(2):72–79, 2001.
[12] Alexander Maedche and Steffen Staab. Ontology learning. In Handbook on Ontologies,
pages 173– 190. 2004.
[13] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. Introduction to WordNet:
An on-line lexical database. International Journal of Lexicography, 3(4):234–244, 1990.
[14] Roberto Navigli, Paola Velardi, and Aldo Gangemi. Ontology learning and its
application to automated terminology translation. IEEE Intelligent Systems, 18(1):22–31,
2003.
[15] I. Nonaka. A dynamic theory of organizational knowledge creation. Organization Science,
5(1):14–37, 1994.
[16] I. Nonaka and H. Takeuchi. The Knowledge Creating Company: How Japanese
Companies Create the Dynamics of Innovation. Oxford University Press, New York,
1995.
[17] Patrick Perrin and Frederick Petry. Extraction and representation of contextual
information for knowledge discovery in texts. Information Sciences, 151:125–152, 2003.
[18] H. S. Pinto and J. P. Martins. Ontologies: How can they be built? Knowledge and
Information Systems, 6(4):441–464, 2004.
[19] M. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
[20] G. Salton. Full text information processing using the SMART system. Database Engineering
Bulletin, 13(1):2–9, March 1990.
[21] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill,
New York, 1983.
[22] M. Sanderson and B. Croft. Deriving concept hierarchies from text. In Proceedings of the
22nd Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, pages 206–213. ACM, 1999.
[23] C. Shannon. A mathematical theory of communication. Bell System Technical Journal,
27:379–423, 1948.
[24] Mark A. Stairmand. Textual context analysis for information retrieval. In Proceedings of the
20th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, pages 140–147, 1997.
[25] Christopher A. Welty. Ontology research. AI Magazine, 24(3):11–12, 2003.
[26] Rudolf Wille. Formal concept analysis as mathematical theory of concepts and concept
hierarchies. In Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors, Formal Concept
Analysis, Foundations and Applications, volume 3626, pages 1–33. Springer, 2005.
[27] J. Xu and W.B. Croft. Query expansion using local and global document analysis. In Hans-
Peter Frei, Donna Harman, Peter Schauble, and Ross Wilkinson, editors, Proceedings of the
19th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, pages 4–11, Zurich, Switzerland, 1996.