Building_a_Personal_Knowledge_Recommenda

Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design
Building a Personal Knowledge Recommendation System using Agents, Learning
Ontologies and Web Mining
Juliana Lucas de Rezende¹, Vinicios Batista Pereira¹, Geraldo Xexeo², Jano Moreira de Souza¹,²
¹COPPE/UFRJ - Graduate School of Computer Science
²DCC/IM - Institute of Mathematics
Federal University of Rio de Janeiro, PO Box 68.513, ZIP Code 21.945-970, Cidade Universitária -
Ilha do Fundão, Rio de Janeiro, RJ, Brazil
{juliana, vinicios, xexeo, jano}@cos.ufrj.br
Abstract
In this paper we propose a process that
complements the learning process for building personal
knowledge through the exchange of knowledge chains.
This approach consists of partially automating
the process of creating knowledge chains through the use
of agent, ontology and data mining technologies. The agents
will monitor all media used by the learner and classify
its content using an ontology. From this, we want to create
a chain and recommend it to the learner. This became important
when we observed that learners were not motivated to create
their chains, which normally takes a lot of effort.
Keywords: CSCW, Collaborative Knowledge Design,
Recommendation System, Learning ontology, Web
Mining.
1. Introduction
Today people need to acquire new knowledge faster
and in a much greater volume than in the past. To
complement the learning process, there are communities
of practice focused on learning, i.e., learning communities.
These communities serve both as a way to complement
teaching in the traditional classroom and as a way to acquire
evolving knowledge. [1] Pawlowski [2] defined a
learning community as being an informal group of
individuals engaged in a common interest, which is, in
this case, the improvement of the learner's performance
using computer networks. One of the principles of
Wenger [3] for cultivating communities of practice is the
sharing of knowledge to improve personal knowledge.
Another requirement for a successful community
is intense communication among its members.
Finally, a community should help its members
build up their personal knowledge. [4]
To complement the learning process, we have proposed a
process to promote knowledge building, dissemination
and exchange in learning communities. The need for a
number of individuals to work together (on knowledge
design) raises problems in the CSCW domain. [5]
Knowledge design [6] is defined as a science of
selecting, organizing and presenting the knowledge in a
huge knowledge space in a proper way so that it can be
sensed, digested and utilized by human beings efficiently
and effectively. It aims to offer the right knowledge to the
right person in the right manner at the right point of time.
According to Xexeo [7], the design activity has been
described as belonging to a class of problems that have no
optimal solution, only satisfactory ones. Such problems are
complex, usually interdisciplinary in nature, and require a
group of people to solve them. Designing knowledge is
similar in principle to designing computer software. It
takes time, careful thought and creativity to do it well.
The biggest difference is that you can't just load the
knowledge into someone's brain like you can do with the
software in a computer; you need an implementation
procedure to build the knowledge in the learner's mind. [6]
1.1. Motivation
To complement the learning process, a system has
been developed to promote knowledge building,
dissemination, and exchange in learning communities.
This system is called Knowledge Chains Editor (KCE)
and is based on a process for building personal
knowledge through the exchange of knowledge chains
(KCs) [1]. It is implemented on top of COPPEER¹. What
distinguishes this process is that it adds "how to use" the
available knowledge to the commonly used triad of "authors"
(who), "localization" (where) and "content" (what).
The KC (shown in Figure 1) is a structure created to
organize knowledge. A KC is
made up of a header (which contains basic information
about the chain) and a list of knowledge units (KUs).
1 COPPEER [7] is a framework for creating very flexible collaborative
peer-to-peer (P2P) applications. It provides non-specific collaboration
tools as plug-ins.
1-4244-0165-8/06/$20.00 © 2006 IEEE.

Figure 1. Knowledge organization: (a) knowledge chain; (b) knowledge composition
Conceptually, knowledge can be decomposed into
smaller units of knowledge (recursive decomposition).
For the sake of simplicity, we assume that there
is a basic unit, represented as a KU (a
structure formed by a set of attributes).
The learner can use the KCE to build his KC. To ask a
question, he must create a KU whose state is
"question". At this moment the system starts the search: it
sends messages to other peers and waits for answers.
Each peer performs an internal search. This search
consists of verifying if there are any KUs similar to the
one in the search. All KUs found are returned to the
requesting party, as shown in Figure 2.
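The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the KCE implementation: all class and function names are hypothetical, peer-to-peer messaging is simulated by a plain list of peers, and similarity is assumed to be keyword overlap (Jaccard) with an arbitrary 0.3 threshold, since the measure KCE uses is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    # Hypothetical subset of KU attributes (the real KU has many more).
    name: str
    keywords: set
    state: str = "answer"   # a KU whose state is "question" triggers a search
    author: str = ""


@dataclass
class KnowledgeChain:
    header: dict                                # basic information about the chain
    units: list = field(default_factory=list)   # ordered list of KUs


def jaccard(a: set, b: set) -> float:
    """Keyword-overlap similarity (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def local_search(question, chains, threshold=0.3):
    """Internal search on one peer: KUs similar to the question KU."""
    return [ku for kc in chains for ku in kc.units
            if ku.state != "question"
            and jaccard(question.keywords, ku.keywords) >= threshold]


def broadcast_search(question, peers, threshold=0.3):
    """Requesting peer: 'send' the question to every peer and gather results."""
    found = []
    for chains in peers:          # each element simulates one remote peer
        found.extend(local_search(question, chains, threshold))
    return found


q = KnowledgeUnit("What is a HashMap?", {"hashmap", "java", "collection"}, state="question")
peer_a = [KnowledgeChain({"title": "Java collections"},
                         [KnowledgeUnit("HashMap basics", {"hashmap", "java"}, author="ana")])]
peer_b = [KnowledgeChain({"title": "Threads"},
                         [KnowledgeUnit("Thread intro", {"thread", "java"})])]
print([ku.name for ku in broadcast_search(q, [peer_a, peer_b])])  # ['HashMap basics']
```

Only "HashMap basics" is returned: it shares two of three keywords with the question (similarity 0.67), while "Thread intro" shares one of four (0.25).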
Figure 2. KCE architecture
The creation of a KU of type 'question' is obviously
motivated by the learner's need to obtain that knowledge.
So far, we have considered two
motivating factors for creating KCs and making them available. The
first is recognition by the community,
since each KU created has a registered author. The
second would be the case where the professor makes
them available "as a job", with the intention of guiding
his students' studies.
However, we were aware that the learner needs more
motivation to create new KCs. In the attempt to solve this
problem, in this work we present a proposal for an
evolution of the KCE. The main goal is to recommend
potential KCs that can be accepted, modified or even
discarded by the learner. These KCs will be created from
the data collected by monitoring the learner's navigation
(a task carried out by a software agent²).
2 A software agent [8] can be defined as a complex object with attitude.
1.2. Related Work
Apart from KCE, there are other tools that stimulate
knowledge sharing in communities. These include:
WebWatcher [16], which is a search tool where the
learner specifies his interests and receives the related
pages navigated by the other community members.
OntoShare [17] uses software agents which allow the user
to share relevant pages. MILK [18] allows the
communities to manage knowledge produced from
metadata. The main difference between these tools and
the KCE is that they are focused on sharing "where"
and/or "with whom" the knowledge can be found. KCE
adds the sharing of "what" and "how to use" this
knowledge.
The remainder of this paper is organized as follows.
The main concepts of web mining and learning ontologies
are presented in the next two sections. Section 4 presents
the proposed idea and the prototype developed.
Conclusions are given in section 5.
2. Web Mining
In simplified terms, web mining can
be used to identify the path taken by the user while
navigating the web (Web Usage Mining) and to
classify the navigated pages (Web Content Mining). [9, 10]
However, one problem cannot be solved by web mining
alone: the difficulty of determining
the information hierarchy. This problem can be solved
with the use of ontologies³.
Besides the fact that text has little (perhaps no)
structure, there are other reasons why text
mining is so difficult. The concepts in a text are
usually rather abstract and can hardly be modeled
with conventional knowledge representation structures.
Furthermore, the occurrence of synonyms (different
words with the same meaning) and homonyms (words
with the same spelling but distinct meanings) makes
it difficult to detect valid relationships between different
parts of the text. [12]
2.1. Web Usage Mining
We use web usage mining when the data relates to
user navigation, that is, when we store and analyze
the order of the navigated pages, the time spent on each
page and the exit page. The navigation order is used to
determine the order of the navigated concepts (once the
pages have been classified), while the visit length and
the exit page help identify which pages are relevant,
for example when the user does not follow the structure
of a site and goes to a new site on the same subject,
or stops studying the subject. [9]
3 Ontology [11] is a formal specification of concepts and their
relationships. By defining a common vocabulary, ontologies reduce
concept definition mistakes, allowing for shared understanding,
improved communication, and a more detailed description of resources.
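The three kinds of usage data named above (page order, visit length, exit page) could be derived from a simple click log as in the sketch below. The log format (a URL plus a timestamp in seconds) and all names are assumptions for illustration; the visit length of each page is approximated as the gap until the next request and, for the last page, until an assumed session end.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    timestamp: float  # seconds since the session started (assumed log format)


def summarize_session(visits, session_end):
    """Return navigation order, accumulated visit length per page and exit page."""
    ordered = sorted(visits, key=lambda v: v.timestamp)
    next_times = [v.timestamp for v in ordered[1:]] + [session_end]
    dwell = {}
    for visit, leave_time in zip(ordered, next_times):
        # A page visited several times accumulates all of its visit lengths.
        dwell[visit.url] = dwell.get(visit.url, 0.0) + (leave_time - visit.timestamp)
    return {
        "order": [v.url for v in ordered],
        "dwell": dwell,
        "exit_page": ordered[-1].url if ordered else None,
    }


log = [Visit("a.html", 0), Visit("b.html", 30), Visit("c.html", 100)]
print(summarize_session(log, session_end=160))
# {'order': ['a.html', 'b.html', 'c.html'], 'dwell': {'a.html': 30.0, 'b.html': 70.0, 'c.html': 60.0}, 'exit_page': 'c.html'}
```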
2.2. Web Content Mining
Once the relevant pages are selected using web usage
mining, web content mining can be used to analyze and to
classify the page content. [9] In this kind of mining the
input data is the HTML code of the page and the output
data is one or more possible classifications of the
page according to the ontology in use.
In order to simplify the page classification we used an
automatic summarization technique (AST) to extract the
most relevant sentences from the page. [12] First, the
AST applies several preprocessing methods to the input
page, namely case folding, stemming and removal of stop
words. The next step is to separate the sentences. The end
of a sentence can be marked by a "." (full stop), a "!"
(exclamation mark), a "?" (question mark), etc. In HTML
texts we can also treat the tags of the language as sentence boundaries.
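A minimal sketch of these preprocessing and sentence-splitting steps follows. It is not the AST of [12]: the stop-word list is a tiny illustrative sample, stemming is limited to step 1a of the Porter stemmer (plural removal) as a stand-in for a full stemmer, and the set of HTML tags treated as sentence boundaries is an assumption. For convenience the sketch splits sentences first and then preprocesses each one.

```python
import re

# Tiny illustrative stop-word list (a real system would use a full list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "has"}
# Block-level tags assumed to end a sentence; inline tags such as <b> do not.
BLOCK_TAGS = re.compile(r"</?(?:p|br|h[1-6]|li|div|title)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def stem(word: str) -> str:
    """Porter step 1a only: sses->ss, ies->i, ss->ss, s->''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def split_sentences(html: str) -> list:
    """Treat block-level HTML tags and . ! ? as sentence boundaries."""
    text = BLOCK_TAGS.sub(" . ", html)   # a block tag acts like a full stop
    text = ANY_TAG.sub(" ", text)        # drop the remaining (inline) tags
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def preprocess(sentence: str) -> list:
    """Case folding, stop-word removal and stemming."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


page = "<h1>Java Classes</h1><p>A class has attributes. Objects are instances!</p>"
for sentence in split_sentences(page):
    print(preprocess(sentence))
# ['java', 'class']
# ['class', 'attribute']
# ['object', 'instance']
```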
Once all the sentences of the page are identified, it is
necessary to give a "weight" to each remaining word
based on its HTML tag [Table 1] and to compute the value
of a TF-ISF (term frequency - inverse sentence frequency)
measure for each word. For each sentence s, the average
TF-ISF weight of the sentence, denoted Avg-TF-ISF(s), is
computed as the arithmetic average of the
TF-ISF(w,s) weights over all the words w in the sentence.
Sentences with high Avg-TF-ISF values are considered
relevant.
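Following the usual definition of the measure in [12], with S the set of sentences of the page, TF(w,s) the number of occurrences of w in s, and SF(w) the number of sentences containing w, TF-ISF(w,s) = TF(w,s) × log(|S| / SF(w)). The sketch below computes Avg-TF-ISF(s) for already-preprocessed sentences. How the Table 1 tag weights enter the formula is not stated in the text; here, as an assumption, an optional per-word weight simply scales TF, and the average is taken over the distinct words of each sentence.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def avg_tf_isf(sentences: List[List[str]],
               word_weight: Optional[Dict[str, float]] = None) -> List[float]:
    """Avg-TF-ISF(s) for each preprocessed sentence (a list of word tokens)."""
    word_weight = word_weight or {}
    n = len(sentences)                                   # |S|
    sf = Counter(w for s in sentences for w in set(s))   # SF(w)
    scores = []
    for s in sentences:
        tf = Counter(s)                                  # TF(w, s)
        if not tf:
            scores.append(0.0)
            continue
        # TF-ISF(w, s), optionally scaled by an assumed tag-based weight.
        values = [tf[w] * word_weight.get(w, 1.0) * math.log(n / sf[w]) for w in tf]
        scores.append(sum(values) / len(values))         # average over distinct words
    return scores


doc = [["java", "class"], ["class", "object"], ["java", "thread", "thread"]]
print([round(v, 3) for v in avg_tf_isf(doc)])  # [0.405, 0.752, 1.301]
```

A word that occurs in every sentence gets ISF = log(1) = 0, so it contributes nothing to any sentence's score.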
Once the value of the Avg-TF-ISF(s) measure is
computed for each sentence s, the final step is to select
the most relevant sentences, i.e. the ones with the largest
values of the Avg-TF-ISF(s) measure. In the current
version of our system this is done as follows: the system
finds the sentence with the largest Avg-TF-ISF(s) value,
called the Max-Avg-TF-ISF value; the user specifies a
threshold on the percentage of this value, denoted
percentage-threshold. Sentences whose Avg-TF-ISF(s) value
reaches this percentage of Max-Avg-TF-ISF are selected
to produce a summary of the source text.
According to Larocca [12] this technique has been
evaluated on real-world documents, and the results are
satisfactory.
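The selection rule can be sketched as below, reading percentage-threshold as a value between 0 and 100; the function name is hypothetical. Selected sentences are kept in their original order so that the summary still follows the source text.

```python
from typing import List


def select_summary(sentences: List[str], scores: List[float],
                   percentage_threshold: float) -> List[str]:
    """Keep sentences whose Avg-TF-ISF is at least percentage_threshold % of
    the Max-Avg-TF-ISF value, preserving document order."""
    if not scores:
        return []
    max_avg = max(scores)                               # Max-Avg-TF-ISF
    cutoff = max_avg * percentage_threshold / 100.0
    return [s for s, v in zip(sentences, scores) if v >= cutoff]


print(select_summary(["s1", "s2", "s3"], [0.405, 0.752, 1.301], 50))  # ['s2', 's3']
```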
3. Building and Using Ontologies
According to Guarino [13], ontologies can be
categorized into four types: top-level, domain, task and
application. Top-level ontologies describe very general
concepts like space, time, object, etc., which are
independent of a particular problem or domain. Domain
ontologies and task ontologies describe, respectively, the
vocabulary related to a generic domain (like medicine, or
automobiles) or a generic task or activity (like diagnosing
or selling), by specializing the terms introduced in the
top-level ontology. Application ontologies describe
concepts depending both on a particular domain and task,
which are often specializations of both the related
ontologies.
A generic ontology can, without great effort, be made
more specific as needed.
However, to transform a specific ontology into a more
generic one can be a difficult task. Therefore, in this work
we first created a domain ontology and from this we
created a more specific ontology which was more
appropriate to our needs.
The prototype was developed for the Java
learning community, and the first ontology created was a
domain ontology describing object-oriented (OO)
language concepts. After this, specific properties were
added to this ontology to incorporate thesaurus
functionalities. This way, the software agent can search
the ontology for words found in the text and correlate
web pages with ontology concepts, thereby turning the
domain ontology into a more specific one.
All classes that represent concepts of an OO
language inherit from a superclass called Concept. In our
case, this superclass contains a property named keyword,
which is used in page classification; if we need to
add new properties related to classification, it is
enough to add them to the Concept class. To turn the
new ontology back into the domain ontology, it is enough to
remove the Concept class.
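The role of the Concept superclass can be illustrated with an ordinary class hierarchy, as a loose analogy to the ontology's class hierarchy (the ontology language and editor themselves are not shown). The concept names follow the OO examples in this section, but their keyword lists and the `keyword_hits` helper are invented for illustration.

```python
from typing import List, Tuple


class Concept:
    """Superclass holding classification-only properties. Every OO concept
    inherits `keywords`, so new classification properties go here only."""
    keywords: Tuple[str, ...] = ()

    @classmethod
    def keyword_hits(cls, words: List[str]) -> int:
        # Number of page words that match this concept's keywords.
        return sum(1 for w in words if w in cls.keywords)


class Class(Concept):
    keywords = ("class", "classes")


class Attribute(Concept):
    keywords = ("attribute", "field")


class Package(Concept):
    keywords = ("package", "import")


words = ["field", "class", "field", "package"]
for concept in (Class, Attribute, Package):
    print(concept.__name__, concept.keyword_hits(words))
# Class 1
# Attribute 2
# Package 1
```

Because the matching logic lives only in Concept, dropping that superclass leaves the concept classes themselves unchanged, mirroring how removing Concept restores the plain domain ontology.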
The OO language ontology was instantiated for Java to
be used as a specific knowledge base by the
application. With the concepts and relations instantiated,
it's possible to compare the keywords found in the page
mining process with the ontology keywords. The
attribution of weights to the page keywords makes
possible the probabilistic classification of the page
according to the ontology concept.
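One simple way to turn weighted keyword matches into a probabilistic classification is to sum, for each concept, the weights of the page words found among its keywords and normalize the sums so they add up to 1. This is an assumed scoring scheme for illustration, not necessarily the one used by the prototype; the page weights, concept names and keyword sets are invented.

```python
from typing import Dict, Set


def classify_page(page_weights: Dict[str, float],
                  concept_keywords: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree of relevance of a page to each concept (values sum to 1,
    or are all 0 when no keyword matches)."""
    raw = {concept: sum(w for word, w in page_weights.items() if word in kws)
           for concept, kws in concept_keywords.items()}
    total = sum(raw.values())
    if total == 0:
        return {concept: 0.0 for concept in raw}
    return {concept: score / total for concept, score in raw.items()}


weights = {"vector": 10, "attribute": 2, "class": 8}   # e.g. from HTML tag weighting
ontology = {"Class": {"class"}, "Attribute": {"attribute", "field"},
            "Collection": {"vector", "hashtable"}}
print(classify_page(weights, ontology))  # {'Class': 0.4, 'Attribute': 0.1, 'Collection': 0.5}
```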
The relationship between the ontology concepts can be
used to support decisions about the concept represented
by a page. When a page contains keywords for
concepts that are all related to the same concept, the page
can be classified as a representation of the common
concept.
Figure 3. Example of ontology (concepts shown include Concept, ClassLibrary, DataType, Class, InnerClass and Superclass; relations include contains, subclass and instance)

For example, in Figure 3 the ontology relates the
concept Package to the concept Class, and the Package
instance java.util to the Class instances Vector and HashTable.
If a page contains keywords with the same weight referring to
the Java classes Vector and HashTable, the system can
consider that both are related to Package java.util and can
classify the page as a reference to Package.
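The rule illustrated with java.util can be sketched as follows. The dictionaries standing in for the ontology's relations are assumptions, and so is the tie rule: the page is rolled up to the common related concept only when two or more instances share the top keyword weight and all of them point to the same related instance.

```python
from typing import Dict, Optional


def common_related_concept(hits: Dict[str, float],
                           related_to: Dict[str, str],
                           instance_of: Dict[str, str]) -> Optional[str]:
    """Return the concept of the instance shared by all top-weighted
    instances on the page (e.g. Vector, HashTable -> java.util -> Package)."""
    if not hits:
        return None
    top = max(hits.values())
    best = [inst for inst, w in hits.items() if w == top]
    parents = {related_to.get(inst) for inst in best}
    if len(best) > 1 and len(parents) == 1 and None not in parents:
        return instance_of[parents.pop()]
    return None


related_to = {"Vector": "java.util", "HashTable": "java.util"}
instance_of = {"java.util": "Package", "Vector": "Class", "HashTable": "Class"}
print(common_related_concept({"Vector": 3, "HashTable": 3}, related_to, instance_of))  # Package
print(common_related_concept({"Vector": 5, "HashTable": 3}, related_to, instance_of))  # None
```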
3.1. Learning Ontologies
As stated before, adding the information collected
during web mining to the existing ontology
makes the creation of a learning ontology possible.
Collaborative learning ontology [14] is the system of
concepts for modeling the collaborative learning process,
such as 'learning goal', 'learning group type' and
'learning scenario'. When these ontologies are in use they
are usually arranged in three layers. The top layer is the
negotiation level, which corresponds to the negotiation
ontology and represents the information important for
negotiation at an abstract level. The intermediate layer
corresponds to the collaborative learning ontology; here,
only the abstractions from the agent level that are important
for negotiation remain, as the necessary scope of information
at an abstract level. The bottom layer is the agent level,
which corresponds to the individual learning ontology.
This work addresses only the two lower layers of a
collaborative learning ontology: it captures the
learner's personal learning process, which supports the
lowest layer, and it allows learning processes to be
exchanged, creating the information needed by the
highest layer.
4. Automatic Building of Personal
Knowledge Chains
The main target of this work is to automatically build
knowledge chains to be recommended to the learners. As
stated previously, the learner can accept, modify
or even discard these KCs. To make this possible, the
proposal is to extend the Knowledge Chains Editor (KCE)
[1] to automatically build personal KCs.
For this to occur, an ontology of the considered domain
is necessary. The goal is to
determine the sub-groups of navigated concepts (concepts
found in the navigated pages) and relate them to the
pages.
The software agent will observe the learner's
navigation through web pages (of the considered domain)
and then store the page content and the time spent
on each page (as shown in Figure 4).
After this, another agent is responsible for
mining the navigation and the page content to determine
the sub-group of navigated ontology concepts related to
the navigated pages and to create a graph from them. With
all this information, the agent can build a potential KC
that will be recommended to the learner.
Figure 4. KCE personal knowledge recommendation architecture (recoverable steps: 1. monitor navigated pages; 2. select pages and store them; 3. get stored pages; 4. classify pages with ontology concepts; 6. recommend to the learner)
It should be pointed out that the new KC will be
recommended to the same learner who is navigating
the web. He will decide whether or not to add the
recommended KC to his personal knowledge. From this
point onwards, if the learner accepts the KC, it can be
exchanged among the community members using the
KCE.
4.1. Knowledge Chains Recommendation System
The learner's software agent is responsible for
observing the learner's navigation and storing the
content of each navigated page, the time spent on it and
the number of times it has been accessed. In this first stage the agent only
creates a database of web pages and access information
(Web Usage Mining). At a later stage, with a frequency
determined by the user, another software agent selects,
from the stored pages, those related to the
subject discussed by the community (the subject must be
known, because the community needs an ontology covering it).
This is done by comparing the
content of each web page with a set of keywords (ontology
concepts) related to the subject in question. In this way
the stored pages are filtered, and only those that are in
fact of interest to the community remain. This also
addresses user privacy concerns, since
pages that are not related to a community subject are
discarded.
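The filtering step can be sketched as a keyword test over the stored pages. The `min_hits` parameter and the single-word keyword matching are assumptions made for brevity; the pages and keywords below are invented.

```python
import re
from typing import Dict, Set


def filter_community_pages(pages: Dict[str, str], ontology_keywords: Set[str],
                           min_hits: int = 1) -> Dict[str, str]:
    """Keep only stored pages that mention at least `min_hits` distinct
    ontology keywords; all other pages are discarded."""
    kept = {}
    for url, text in pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if len(words & ontology_keywords) >= min_hits:
            kept[url] = text
    return kept


stored = {"u1": "Java class and object basics", "u2": "Weather forecast for Rio"}
print(list(filter_community_pages(stored, {"class", "object", "java"})))  # ['u1']
```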
With this set of stored pages the system has a directed
graph, because the navigation order has been stored. As
the system objective is to make a KC with the concepts
studied by the learner, it is necessary to use text mining
techniques to classify the pages according to the
concepts described in the ontology. This classification is
based on the proposal of Desmontils and Jacquin [15].
However, instead of using a thesaurus together with an
ontology, we have extended our ontology by adding to
every concept a vector attribute holding the keywords
related to that concept. Thus, we can do the mining and
the classification using only the ontology.
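Because the visits are stored in order, the directed graph mentioned above can be built from consecutive pairs of page visits. A minimal sketch with hypothetical page identifiers; counting repeated transitions as edge weights is an added assumption.

```python
from collections import defaultdict
from typing import Dict, List


def navigation_graph(order: List[str]) -> Dict[str, Dict[str, int]]:
    """Directed graph: graph[u][v] counts how often v was visited right after u.
    Pages with no outgoing transition (such as the exit page) have no entry."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for u, v in zip(order, order[1:]):
        graph[u][v] += 1
    return {u: dict(successors) for u, successors in graph.items()}


print(navigation_graph(["p1", "p2", "p1", "p2", "p3"]))
# {'p1': {'p2': 2}, 'p2': {'p1': 1, 'p3': 1}}
```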
At this time, the system needs to remove all the stop
words from the text on a page. Then it is necessary to
give a "weight" to each remaining word, based on its
HTML tag. The weights are given in accordance with the
values given in Table 1.
Table 1. Higher coefficients associated with HTML
markers [15, 16]
HTML marker description | HTML marker | Weight
Document title | <title></title> | 10
Keyword | <meta name="keywords" ... content=...> | 9
Hyperlink | <a href=...></a> | 8
Font size 7 | | 5
Font size +4 | | 5
Heading level 1 | <h1></h1> | 3
Image title | <img ... alt="..."> | 2
Underline font | | 2
Italic font | | 2
Bold font | | 2
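The weights in Table 1 can be applied while parsing a page. The sketch below uses Python's standard `html.parser`; it assumes that each word occurrence contributes the largest weight among its enclosing markers, that plain text counts 1, that the italic, bold and underline rows correspond to `<i>`, `<b>` and `<u>`, and that the font rows correspond to `<font size="7">` and `<font size="+4">`. None of these conventions is stated in the text; they are illustrative choices.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Container tags from Table 1 and their weights (font, meta and img handled below).
TAG_WEIGHTS = {"title": 10, "a": 8, "h1": 3, "u": 2, "i": 2, "b": 2}


class WeightedWordCounter(HTMLParser):
    """Sums a weight per word; each occurrence adds the largest weight among
    its enclosing Table 1 markers (plain text counts 1, an assumption)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []        # stack of (tag, weight) for tracked tags
        self.weights = Counter()

    def _add(self, text, weight):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.weights[word] += weight

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "keywords":
            self._add(a.get("content") or "", 9)          # keyword meta tag
        elif tag == "img":
            self._add(a.get("alt") or "", 2)              # image title (alt text)
        elif tag == "font":
            self.open_tags.append((tag, 5 if a.get("size") in ("7", "+4") else 1))
        elif tag in TAG_WEIGHTS:
            self.open_tags.append((tag, TAG_WEIGHTS[tag]))

    def handle_endtag(self, tag):
        # Close the innermost open occurrence of this tag, if it is tracked.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        self._add(data, max((w for _, w in self.open_tags), default=1))


def weighted_words(html):
    parser = WeightedWordCounter()
    parser.feed(html)
    parser.close()
    return parser.weights


page = ('<html><head><title>Java Vector</title>'
        '<meta name="keywords" content="collection"></head>'
        '<body><p>A <b>Vector</b> grows. Vector</p></body></html>')
print(weighted_words(page).most_common(3))  # [('vector', 13), ('java', 10), ('collection', 9)]
```

Here "vector" scores 10 (title) + 2 (bold) + 1 (plain text).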
Once the frequency and the weight of the keywords on
a page are compared with the ontology concepts, the page
receives a degree of relevance for each concept. With this relationship
between pages and ontology concepts, the graph of pages
can be transformed into a knowledge chain. This KC will
be recommended to the learner, and he can decide what to
do with it.
As there are many software agents "working" for the
learners, a lot of KCs will be created. Therefore, it is
possible to identify absent concepts in the navigation of
one learner that have already been studied by another, and
recommend KUs, concepts, pages and even the users who
know the concepts the learner doesn't know.
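Comparing the concepts covered by different learners' KCs can be sketched as a set difference. The learner names and concept sets below are invented, and representing each learner's KCs only by the set of concepts they cover is a simplifying assumption.

```python
from typing import Dict, Set


def recommend_missing(learner: str,
                      studied: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each concept covered by other learners' KCs but absent from this
    learner's, return the learners who could be recommended for it."""
    mine = studied.get(learner, set())
    missing: Dict[str, Set[str]] = {}
    for other, concepts in studied.items():
        if other == learner:
            continue
        for concept in concepts - mine:
            missing.setdefault(concept, set()).add(other)
    return missing


studied = {"ana": {"class", "object"},
           "bia": {"class", "attribute"},
           "caio": {"attribute", "package"}}
print(recommend_missing("ana", studied))
```

For "ana" the result maps "attribute" to {"bia", "caio"} and "package" to {"caio"}.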
4.2. Example
The following example shows how a KC is created
from the learner's navigation through web pages.
Figure 5 shows the web pages navigated by the learner
and Figure 6 shows the ontology of the community where
arrows represent non-hierarchical relationships.
In the first stage, web mining is performed and,
depending on the keywords found on a web page, the page
may partly match one ontology concept and partly
another.
Figure 5. Web page navigation
Figure 6. Community ontology
In this case, there is a relevance degree for each
concept relating to the page. Therefore, for each page, the
result is:
Page 1: a 60%; b 10%; c 30%
Page 2: a 0%; b 0%; c 100%
After relating the web pages to the most relevant ontology
concepts, the software agent creates a learning path in
the ontology (involving the concepts a, c, d and e), which
constitutes a learning ontology. The creation of the KUs
then begins, mapping the web pages onto the learning ontology.
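Using the relevance degrees of the example, the mapping from the navigation order to a chain of concepts could look like the sketch below. Keeping only each page's most relevant concept, the 0.5 cut-off and the function name are assumptions; consecutive pages mapped to the same concept are grouped into one KU candidate.

```python
from typing import Dict, List, Tuple


def build_chain(order: List[str], relevance: Dict[str, Dict[str, float]],
                min_degree: float = 0.5) -> List[Tuple[str, List[str]]]:
    """Map each navigated page to its most relevant concept and return the
    sequence of (concept, pages), merging consecutive pages on one concept."""
    chain: List[Tuple[str, List[str]]] = []
    for page in order:
        degrees = relevance.get(page)
        if not degrees:
            continue
        concept, degree = max(degrees.items(), key=lambda kv: kv[1])
        if degree < min_degree:
            continue                      # page too ambiguous to place
        if chain and chain[-1][0] == concept:
            chain[-1][1].append(page)
        else:
            chain.append((concept, [page]))
    return chain


relevance = {"Page 1": {"a": 0.6, "b": 0.1, "c": 0.3},
             "Page 2": {"a": 0.0, "b": 0.0, "c": 1.0}}
print(build_chain(["Page 1", "Page 2"], relevance))
# [('a', ['Page 1']), ('c', ['Page 2'])]
```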
At this stage the KUs are created using the learning
ontology and all the information on the learner's
navigation through web pages. In this example, the learner
needs to study "attribute" and then "class" in order to study
"object".
As has been said before, a KU is a structure formed by
an attribute set. These attributes are grouped into
categories: General (name, description, keywords, author,
creation date, last use date, etc), Life Cycle (history,
current state, and contributors), Rights (intellectual
property rights and conditions of use), Relation (the
relationship between knowledge resources), Classification
(the KU in relation to a classification system) and
Annotation (comments and evaluations of the KUs and
their creators). Many of these attributes can be
automatically filled, which facilitates the creation of new
KCs.
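The attribute categories above can be sketched as nested records, together with a function that fills in the fields an agent already knows from monitoring. Field names follow the categories listed in the text, but the types, the "recommended" state value and the `auto_fill` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class General:
    name: str
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    author: str = ""
    creation_date: date = field(default_factory=date.today)
    last_use_date: date = field(default_factory=date.today)


@dataclass
class LifeCycle:
    history: List[str] = field(default_factory=list)
    current_state: str = "recommended"   # hypothetical state value
    contributors: List[str] = field(default_factory=list)


@dataclass
class KnowledgeUnit:
    general: General
    life_cycle: LifeCycle = field(default_factory=LifeCycle)
    rights: str = ""                                     # conditions of use
    relations: List[str] = field(default_factory=list)   # related resources (e.g. pages)
    classification: List[str] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)


def auto_fill(concept: str, pages: List[str], learner: str,
              keywords: List[str]) -> KnowledgeUnit:
    """Fill the attributes already known from monitoring; the learner can
    complete the rest (description, rights, annotations) by hand."""
    return KnowledgeUnit(
        general=General(name=concept, keywords=list(keywords), author=learner),
        life_cycle=LifeCycle(history=[f"generated from {len(pages)} page(s)"],
                             contributors=[learner]),
        relations=list(pages),
        classification=[concept],
    )


ku = auto_fill("class", ["p1.html", "p2.html"], "ana", ["class", "object"])
print(ku.general.name, ku.general.author, ku.life_cycle.current_state)  # class ana recommended
```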
5. Conclusions and Future Work
The growing number of learning communities which
communicate online makes it possible to exchange, and
use chains of explicit knowledge as a strategy for creating
personal knowledge. Today, we have the WWW (who,
what, where) triad, where "who" is the people who have
the knowledge, "what" is the knowledge itself, and
"where" is its location - in our case, the peer in which it is
located. Using knowledge chains, we hope to add "how to
use" the available knowledge to the existing triad.
As has been previously stated, to motivate the learner
in the creation of new KCs, we propose a personal
knowledge recommendation system that uses software
agents technology to monitor learner navigation; uses
web mining to plot the path taken by the user while he is
navigating on the web and to classify the navigated pages;
and uses learning ontologies in addition to all the
information collected for the creation of new KCs.
We expect the experimental use of the extended KCE to provide
evidence for the hypothesis that a learner who uses it to build
personal KCs will create more new KCs, reduce the time dedicated
to studying a specific subject, and gain a more
comprehensive knowledge of the subject studied. To evaluate
whether the KCE's objective has been reached, experiments aimed at
obtaining qualitative and quantitative data that would
make it possible to verify this hypothesis
must be carried out.
It is necessary to emphasize that it is not the goal of
this work to ensure that the learner has assimilated
everything in his KCs. Our goal is to stimulate the
creation of new KCs, so that the knowledge network can
expand, and better assist the community members.
Because this work is still in progress, several further
developments are planned. The most
important are improving the algorithm used to map
web pages onto ontology nodes, and extending the
monitored domain to any media manipulated by
the learner, instead of only the navigated web pages.
Acknowledgement
This work was partially supported by CAPES and
CNPq.
References
[1] J.L. Rezende, et al., "Building personal knowledge through
exchanging knowledge chains", Proc. of IADIS Int. Conf.
on WBC, Algarve, Portugal, 2005, pp. 87-94.
[2] S. Pawlowski et al., "Supporting shared information systems:
boundary objects, communities, and brokering", Proc. of
the 21st Int. Conf. on Information Systems, Brisbane,
Australia, 2000, pp. 329-338.
[3] E. Wenger et al., Cultivating Communities of Practice: A
Guide to Managing Knowledge, Harvard Business School
Press, 2002.
[4] J.M. Souza, A. Tornaghi and A. Vivacqua, "Creating
educator communities", Int. Journal of Web Based
Communities, Great Britain, 2005, pp. 1-15.
[5] J.L. Rezende, J.M. Souza, J.F. Souza and G.B. Xexeo,
"Peer-to-Peer collaborative integration of dynamic
ontologies", Proc. of the 9th Int. Conf. on CSCWD,
Coventry, UK, 2005.
[6] M. Leitch, Human Knowledge Design, an undergraduate
project, February 1986 (published on the web on 31 July
2002).
[7] J.M. Souza, et al., "COE: A Collaborative Ontology Editor
based on a Peer-to-Peer framework", Int. Journal of
Advanced Engineering Informatics, Germany, pp. 1-15.
[8] J.M. Bradshaw, "An introduction to software agents", in
J.M. Bradshaw (ed.), Software Agents, MIT Press, 1997.
[9] O.R. Zaïane, "Web Mining: Concepts, Practices and
Research", Conference Tutorial Notes, XIV SBBD, João
Pessoa, Paraíba, Brazil, Oct. 2000.
[10] R. Cooley, B. Mobasher and J. Srivastava, "Web Mining:
Information and Pattern Discovery on the World Wide
Web", Proc. of the 9th IEEE Int. Conf. on Tools with
Artificial Intelligence, Newport Beach, CA, USA, Nov.
1997.
[11] T.R. Gruber, "Toward principles for the design of
ontologies used for knowledge sharing", Int. Workshop on
Formal Ontology, 1993.
[12] J. Larocca Neto, et al., "Document clustering and text
summarization", Proc. of the 4th Int. Conf. on Practical
Applications of Knowledge Discovery and Data Mining,
2000.
[13] N. Guarino, "Formal ontology in information systems",
Proc. of FOIS'98, Trento, Italy, IOS Press, June 1998.
[14] T. Supnithi, et al., "Learning goal ontology supported by
learning theories for opportunistic group formation", in
S.P. Lajoie and M. Vivet (eds.), Artificial Intelligence in
Education, IOS Press, 1999.
[15] E. Desmontils and C. Jacquin, "Indexing a web site with a
terminology oriented ontology", SWWS, 2001, pp. 549-565.
[16] T. Joachims, D. Freitag and T. Mitchell, "WebWatcher: A
tour guide for the World Wide Web", Proc. of the 15th Int.
Joint Conf. on Artificial Intelligence (IJCAI97), Nagoya,
Japan, Aug. 1997, pp. 770-775.
[17] J. Davies, A. Duke and Y. Sure, "OntoShare - A knowledge
management environment for virtual communities of
practice", Proc. of the Int. Conf. on Knowledge Capture
(K-CAP03), Sanibel Island, Florida, USA, 2003.
[18] A. Agostini et al., "Stimulating knowledge discovery and
sharing", Proc. of the Int. ACM Conf. on Supporting Group
Work, Sanibel, Florida, 2003, pp. 248-257.
