SlideShare uma empresa Scribd logo
1 de 70
Baixar para ler offline
cenidet
V O L . 1 3S U M M E R 2 0 0 6
Editorial
Dear Colleagues,
The Computer Society with approximately
100,000 members, is the leader organization for
the professionals in computer sciences. It was
founded in 1946, being the biggest of the societies
organized by the IEEE.
The Computer Society foments
international communication, cooperation and
exchange of information. It is dedicated to the
promotion of the computer theory, practice and
application to the data processing technology.
"Looking Forward" is the student written
and for The Computer Society students magazine.
After a great effort, we finally present the number
13 edition of the "Looking Forward" electronic
magazine.
Before continuing we want to appreciate
the help of the members of the Computer Student
Chapter of the CENIDET, who are very honored
of being elects to produce this edition.
The articles contained in this magazine
embrace diverse topics, all related with the
Master degree and Doctorate degree thesis
projects being developed at the Computer
Sciences Department. Particularly, in the
following lines of investigation: Software
Engineering, Distributed Systems and Artificial
Intelligence.
We appreciate the cooperation of M.S.
Andrea Magadán Salazar for coordinating all their
members.
This edition has been possible thanks to
the help of Prof. Ken Christensen, we
acknowledge his support and trust to the
Computer Student Chapter of the CENIDET.
We know that participating in this kind
of projects exalts the prestige of our institution
and our country.
We hope in a future this magazine can
be added to the IEEExplore as part of the
literature of the Computer Society.
Long life Looking Forward!!!
Jonathan Villanueva Tavira
Jonathan.villanueva@ieee.org
Director
Ken Christensen,
Associate Profesor,
Department of Computer Science
and Engineering ,
University of South Florida
Editor in Chief
M.C Andrea Magadán Salazar
Vicepresidenta del Capítulo de
Computación Sección Morelos.
Editors
Jonathan Villanueva Tavira
Rocio Vargas Arroyo
Jorge A. Saldaña García
Collaborators
Edgar Colorado Soto
Erika M. Nieto Ariza
Francisco Cervantes Álvarez
Hilda Solano Lira
Jorge A. Saldaña García
Juan C. Olivares Rojas
Luis E. Morán López
Maricela C. Bravo Contreras
Michelle Arandine Barrón Vivanco
Osslan O. Vergara Villegas
Rafael I. Ponce Medellín
Rocío Vargas Arroyo
Salvador Cervantes Álvarez
Vianey G. Cruz Sánchez
Student Branch Chapter CENIDET
The Center offers the possibility to
carry out investigation works in agreement
with other institutions like the Institute of
Electric Investigations and the Center of
Investigation in Energy; besides, there are
agreements with important universities and
foreign institutes.
CENIDET has the departments of
Electronic, Mechanics, Mechatronic and
Computer Sciences, headquarters of the
Computer Student Chapter of the IEEE.
The National Center of Research and
Technological Development, CENIDET, is
located in Cuernavaca, Morelos, city that has
been transformed into headquarters of several
scientific institutions, each dedicated to
research and education, allowing a profitable
exchange among them.
Cuernavaca benefits from its
proximity to the Mexico city, since the
researching professors and the students can
easily move to participate or to attend events,
to exchange information, to consult libraries,
to receive consultantships and,in general, to
be related with all the activities that
propitiate and enlarge the knowledge and
the creativity.
The Center, naturally, participates of
this valuable synergy. The CENIDET offers its
postgraduate programs, for related graduate
people that are interested in getting ready for
the applied investigation and the technological
development.
Welcome to CENIDET
Computer Sciences Department
Web Page Retrieval Using an Ontology
that is Populated by Automatic Text
Classification
21.
Summary
1. Editorial.
2.Welcome to CENIDET.
Web Page Classification:
a Semantic Análisis.7.
Digital Image Processing in
Wavelet Domain.13.
Evaluation of tools for business
process in three levels of
abstraction.
27.
Image Segmentation
Script Language.
GAP: A Tool to Solve the Problem of the
Web Contents Visualization in Pocket PC
Devices.
17. 31.
Summary
Methodology for the generation
of 3D models.41.
Neuro-Symbolic Hybrid Systems.
35.
Vanishing points detection using
Thales's theorem.45.
Segmentation by color to
characterize human gait.55.
An Ontology-based Translator
for Communicating Negotiation
Agents over Internet.
49.
59.
Description of two Statistical Models Applied
To The Extraction Of Facial Features: Integral
projections And Active Shape Model.
63. Authors.
7
Web Page Classification: a Semantic Analysis
Rocío Vargas Arroyo, Azucena Montes Rendón
Centro Nacional de Investigación y Desarrollo Tecnológico
{rvargas04c,amr}@cenidet.edu.mx
Abstract
In this paper, a semantic analysis for Web page
classification is presented. A set of Web pages,
resulting from a simple query to a Web browser, is
categorized by disambiguating the meaning of the term
used for the search. The disambiguation process
begins with the isolation of some outstanding
paragraphs; linguistic markers are used to accomplish
this task. The search term is located within the
paragraphs and the Contextual Exploration Method is
used to identify words that lead to the discovery of
relationships within an Ontology. Finally, the
discovered relationships are used for assigning the
web page to a category.
1. Introduction
Natural Language Processing is a challenging task
of Artificial Intelligence, because dealing with
language is not a simple task. With the immeasurable
growing of the Web, the problem of getting the proper
and desired information has grown too. Several
research groups have got significant and slightly
sufficient results, but not good enough to solve the
general needs. These obtained results are sensitive to
the ambiguity problem caused, mainly, by the used
techniques thus most of the developed projects [1-9]
are based on statistical methods and ignore linguistic
techniques.
In this work our main intention is the creation of a
classification tool. This tool will be able to classify a
set of HTML web pages written in Spanish. Linguistic
markers, Ontology and Contextual Exploration Method
[10] are integrated to accomplish the task.
2. Linguistic markers
In order to emphasize certain ideas contained in a
text, the author uses discourse markers. These markers
are linguistic units that set the order of the discourse.
Martín Zorraquino and Portolés [11] define these
markers as:
“Unidades lingüísticas invariables que no ejercen
una función sintáctica en el marco de la predicación
oracional – son pues, elementos marginales – y poseen
un cometido coincidente en el discurso: el de guiar, de
acuerdo con sus propiedades morfosintácticas,
semánticas y pragmáticas, las inferencias que se
realizan en la comunicación” [11].
Prada [12] extracted, from Martín Zorraquino and
Portolés [11] extensive analysis of these markers, five
categories (see table1).
Table1. Discourse markers
Category Sub-category
Comentadores
Ordenadores
Marcadores
estructuradores de la
información Disgresores
Aditivos
Consecutivos
Marcadores
conectivos
Contraargumentativos
Explicativos
De rectificación
De distanciamiento
Reformuladores
Recapitulativos
De refuerzo argumentativoOperadores
argumentativos De concreción
De modalidad epistémica
De modalidad deóntica
Enfocadotes de la
alteridad
Marcadores
conversacionales
Metadiscursivos
Each type of marker has especial use in Spanish
discourse, but for this project, the attention is focused
on recapitulative markers because they introduce a
recapitulation or conclusion of what was said, they
concentrate a general view of the text intention and let
the reader, reach the final thought of the author.
8
Examples of these recapitulative markers are: En
suma, en conclusión, en definitiva, en fin, al fin y al
cabo, resumiendo, finalmente among others.
3. Contextual Exploration Method
The Contextual Exploration Method (CEM) [10]
was developed by LaLIC team of the Paris-Sorbonne
University and directed by Jean Pierre Desclés. This
method provides a frame for the identification of
semantic information in a text and a set of mechanisms
that help in the resolution of semantic indeterminations
or ambiguity cases. It also assumes that the texts
include linguistic units that help in the task of sense
construction.
Is required, by the method, the description of
indices and indicators. Indicators are linguistic units
associated to a set of contextual exploration rules.
When an indicator is located within the text a rule is
triggered; the context of the indicator is explored in
order to identify indices or words that lead to the real
meaning of the indicators. Indicators, indices and rules
integrate the linguistic knowledge base.
4. Ontology interaction
For this project, a Spanish version of EuroWordNet
[13] is used. EuroWordNet is a lexical-semantical
ontology of many languages such as Spanish, Dutch,
Italian, Frech, German and Czech. It follows the
WordNet model but integrates some improvements as
better expressivity of the knowledge base; adds new
relationship definitions and transcategorial
relationships.
The version used in this project was transformed
into a database and accessed through SQL queries.
The result of the queries leads to the construction of
context exploration rules.
Figure 1. Graphical result of a query to the Ontology.
Search term capa.
5. Semantic analyzer
The process begins with a query to a well known
and widely used web browser. Simple search terms are
used (simple terms are more sensitive to ambiguity
problem) and a set of hyperlinks is retrieved. This set
is filtered to get only HTML pages written in Spanish.
Once the hyperlinks are extracted, each page is loaded
and its content is analyzed to get the most important
paragraphs; this paragraph selection is made by
locating linguistic markers within the text. The search
term is located within the isolated paragraphs the left
and right context are evaluated in order to get
surrounding indices that lead to the discovery of
relationships within an ontology. The extracted
relationships allow the association of the HTML web
page to a category.
The full process is graphically represented in figure
2.
Figure 2. HTML web page classification scheme.
5.1. Web browser query and hyperlink
extraction
The classification process begins with a query to a
web browser. The search term is a simple term, a
single word. The web browser returns a set of
hyperlinks linked to web pages where the search term
is contained.
Hyperlinks are extracted and its associated page is
analyzed later.
Several query strings where analyzed in order to
simplify the query to the web browser. Language and
file format is specified. Example:
9
http://www.google.com.mx/search?num=<number of
resulting links>&as_epq=<search term>&lr=
<language> &as_filetype=html
Two sets of hyperlinks are extracted, the first set
corresponds to the direct link to the web page, the
second, corresponds to the cache version of the page.
See figure 3.
Figure 3. Simple interface for hyperlink extraction.
5.2. Localization of linguistic markers and
paragraph selection
Each web page is loaded and its body content and
some other representative information as metadata are
extracted. See figure 4.
Figure 4. Simple interface for web page content
extraction.
Some linguistic markers, mostly recapitulative
markers, are located within the body of the page.
Paragraphs where these markers are present are
selected for a more extensive analysis. Consider the
next text segment and the search term “capa”:
“…El segundo sistema permite una elección mas
amplia del copulador y del revelador, pero implica
mayores dificultades en el tratamiento. El revelador
que tiene el copulador debe actuar en la capa
correcta, y esto se logra controlando la velocidad de
difusión o aplicando exposiciones controladas. El
método de la difusión controlada es anticuado. El
método de exposición selectiva utiliza una película de
varias capas que tiene la siguiente construcción: sobre
la base se aplica la emulsión sensible al rojo y encima
una emulsión sensible al verde; sobre esta, una capa
de gelatina que contiene un colorante filtro amarillo, y
finalmente sobre la capa filtro se aplica una capa de
emulsión sensible al azul…” [14]
From this text segment, a representative paragraph is
extracted:
“El método de exposición selectiva utiliza una película
de varias capas que tiene la siguiente construcción:
sobre la base se aplica la emulsión sensible al rojo y
encima una emulsión sensible al verde; sobre esta, una
capa de gelatina que contiene un colorante filtro
amarillo, y finalmente sobre la capa filtro se aplica
una capa de emulsión sensible al azul”. [14]
5.3. Search term localization and context
exploration
Once the paragraphs are selected, the search term is
located within each paragraph an its left and right
context are explored looking for key terms that lead to
the discovery of a relationship within an ontology.
The Contextual Exploration Method [10] is applied
and indicators, indices and rules are defined.
The indicators sets are composed by the search term
and its synonyms, the set of indices is populated by a
query to an Ontology and the rules are dynamically
created. Example:
Indicators = {capa, mano, baño}
Indices = {pintura, emulsión, impermeabilizante,
esmalte, barniz, …}
Rule:
If In the context of the I1 set appears any
indice belonging to set I2
Then
Assing the “Cubrimiento aplicado a una
superficie” meaning to the search term in the
selected statement
End if
“El método de exposición selectiva utiliza una
película de varias capas que tiene la siguiente
construcción: sobre la base se aplica la emulsión
sensible al rojo y encima una emulsión sensible al
verde; sobre esta, una capa de gelatina que contiene
un colorante filtro amarillo, y finalmente sobre la
capa filtro se aplica una capa de emulsión sensible al
azul” .
10
5.4. Relationships extraction and final
categorization
For the final categorization, a set of categories must
be defined. A first approach for solving this task is to
extract a group or relationships, from the ontology,
where the search term and the given sense are involved.
Example:
Figure 5. Homonym relationships of the search term
capa.
All nodes presented in figure 5 are homonyms of the
term capa, each node has a different meaning. Nodes 0
means “Acción y efecto de cubrir la superficie de un
cuerpo con un material” and node 3 means
“Cubrimiento aplicado a una superficie”. These
relationships where discovered in the Ontology, but
these are not sufficient for creating a category, so
second level relationships must be discovered.
Figure 6. Second level relationships of the search term
capa.
In figure 6, the hyponym relationships (for nodes 0
and 3) of the term capa are shown. Here is where
indice terms appear and let us create the categories for
the classification.
Therefore, a first set of categories names might be
formed by first level relationships and its meaning.
Here is where a big problem arises, the ontology data is
not complete, some of its meanings are missing.
6. Conclusions
The easiest way to get information from the web is
trough web browsers and directories; however, the
obtained results not always are sufficient enough
because the search techniques do not care about the
semantic content of the pages. So is necessary the
implementation of a tool able to make a proper
classification of the web pages according the real
meaning of the query.
With a tool like this, the search process made by the
user will be improved cause documents out of the
interest might be discriminated and as a consequence,
the number of pages to visit and the time inverted in
exploring not relevant ones will be diminished.
Moreover, the introduction of linguistic techniques
to classification methods might improve the way web
spiders indexes the information.
7. References
[1] A. Gulli and P. Ferragina, “The anatomy of a hierarchical
clustering engine for web-page, news and book snippets”,
Fourth IEEE International Conference on Data Mining,
ICDM’04, Brighton, UK, 2004.
[2] A. Gulli, “SnakeT”, http:// www.snaket.com, Pisa
University, Italy.
[3] Vivísimo, Inc., “How the Vivísimo clustering engine
works, http://www.vivisimo.com , 2003.
[4] Vivísimo, Inc.,”Clusty”, http://www.clusty.com , 2005.
[5] A. Téllez-Valero, M. Montes-y-Gómez and L. Villaseñor-
Pineda, “Aplicando la clasificación de texto en la extracción
de información”, Encuentro Internacional de Ciencias
de la Computación, ENC, Colima, México, September 2004.
[6] J. M. Gómez, E. Puertas, G. Carrero, M. de Buenaga,
“Categorización de texto sensible al coste para filtrado en
Internet”, Procesamiento del Lenguaje Natural, SEPLN,
Magazine nº 31, September 2003.
[7] L. Golub, Automated subject classification of textual Web
pages, for browsing, Thesis for the degree of Licenciate in
Philosophy, Lund University, Switzerland, August 2005.
[8] F. Sebastiani, “Machine learning in automated text
categorization”, ACM computing surveys, 2002.
[9] F. Sebastiani, “Text categorization”, in Zanasi A., Text
Mining and its Applications, WIT Press, Southampton, UK,
2005.
11
[10] J. P. Desclés, E. Cartier, A. Jackiewicz and J. L. Minel,
“Textual Processing and Contextual Exploration Method”,
Context 97, Rio de Janeiro, February 1997.
[11] Martín Zorraquino, Mª Antonia y José Portolés Lázaro.
Los marcadores del discurso. Bosque y Demonte. Vol. 3.
4051-4213.
[12] J. Prada, G. Moncecchi, “Reconocimiento eficiente de
marcadores del discurso en español”, VIII Simposio
Internacional de Comunicación Social, Santiago de Cuba,
Cuba, January 2003.
[13] Amsterdam University, “EuroWordNet: Building a
multilingual database with wordnets for several European
languages”, March 2006,
http://www.illc.uva.nl/EuroWordNet/
[14] “Fotografía / Fotografía en colores”, March 2006,
http://www.punksunidos.com.ar/punksunidas/fotografia/foto
grafia6.html
13
Digital Image Processing in Wavelet Domain
Osslan Osiris Vergara Villegas and Raúl Pinto Elías
Centro Nacional de Investigación y DesarrolloTecnológico (cenidet)
Interior Internado Palmira S/N, Col. Palmira. C.P. 62490.
Cuernavaca Morelos México.
{osslan, rpinto}@cenidet.edu.mx
Abstract
In this paper we present some explanation about
digital image processing in the wavelet domain. First
an image is transformed using a Discrete Wavelet
Transform (DWT), then several mathematical
operations are applied in order to observe some
features presented in the image. The transformation
can reveal some features that are not clear or difficult
to detect in the original domain. We obtain wavelet
directionality and edge detection, image smoothing,
scale changing, image denoising and compression, and
finally, feature extraction in wavelet domain.
1. Introduction
Wavelet transform was used in geophysics in early
1980s for the analysis of seismic signals [1]. A wavelet
transform decomposes a signal f into its components on
different scales or frequency bands. This is made by a
convolution process on f using the translated and
dilated wavelet ψ (wavelet mother). Based on the
selection of ψ, the transformed function allows, for
example, the extraction of the discontinuities or edges
in f, performing a pattern recognition task or storing a
compressed version of f.
Wavelets are signals which are local in time and
generally have an irregular shape. A signal can be
decomposed into many shifted and scaled
representations of the original mother wavelet [2].
Wavelets have the advantage of being able to
separate the fine details in a signal, very small wavelets
can be used to isolate very fine details in a signal, while
very large wavelets can identify coarse details. In
addition, there are many different wavelets to choose
from. One particular wavelet may generate a more
sparse representation of a signal than another, so
different kinds of wavelets must be examined to see
which is most suited for the application you need in
digital image processing for example image
compression or denoising, pattern recognition, etc.
2. Multiresolution Analysis
Multiresoltion analysis is concerned with the study
of signals or processes represented at different
resolutions and developing an efficient mechanism to
change from to one resolution to another [3]. The
discrete Wavelet Transform (DWT) is a mathematical
tool for the analysis and synthesis of signals that can be
used when digital images need to be viewed or
processed at multiple resolutions.
The localization of signal characteristics in spatial
(or time) and frequency domains can be accomplished
very efficiently using wavelets. This allows us to
simultaneously determine sharp transitions in the
spectrum of the signal and in the position (or time) of
their occurrence.
The principle behind the use of wavelets for
decomposing an image is to use a wavelet function Ψ to
represent the higher frequencies corresponding to the
detailed parts of the image, and a scaling function Φ to
represent the lower frequencies corresponding to
smooth parts of the image. Figure 1 shows the process
to decompose an image using filters to obtain the
subband decomposition.
Figure 1. Subband decomposition of an image.
14
Multiresolution analysis plays an important role in
the perception and decision mechanism of human
beings.
3. Wavelet Directionality and Edge
Detection
With the wavelet transform we can obtain some
information (details) about the directionality and the
edges presented in an image. In this section we use the
image shown in figure 2 to explain how to obtain
directionality and edge detection.
Figure 2. Image “Star” for edge and
directionality analysis.
The first thing to do is to transform the original
image using some DWT, for this example we
decompose the image only one level using the symlet 4.
Figure 3 shows the resulting image from wavelet
transform.
Figure 3. “Star” after wavelet decomposition.
From figure 3 we can see that the horizontal edge of
the original image is presented in the horizontal detail
coefficient matrix of the upper-right quadrant.
Similarly the vertical edge is shown in the vertical
detail coefficients of the lower-left quadrant. Finally
you can see that the diagonal borders are shown in the
diagonal detail coefficients of the lower-right quadrant.
From the wavelet subband image we can combine
the edge information into a single image. The only
thing that we need to do is changing to zero all the
approximation coefficients (upper-left quadrant), and
then compute de Inverse Discrete Wavelet Transform
(IDWT). The resulting edge image is shown in figure
4.
Figure 4. “Star” edge resulting image.
We can use a similar procedure to isolate the
vertical or horizontal edges. It is important to remark
that the information of diagonal edges is always
preserved although we cut the diagonal coefficients.
4. Image Smoothing or Blurring
Wavelets can be used as a tool for blurring or
smoothing images. For this example we use the image
shown in figure 5.
Figure 5. “Tools” image.
In order to smoothing, we need to compute the
wavelet transform in more than one decomposition
level; for example we use the Symlet 4 with four
decomposition levels as shown in figure 6.
Figure 6. Four decomposition level of “Tools”.
Converting to zero one detail level allows us to
obtain increasingly smoothed versions of the original
image.
15
For example if you zeroed the first three details
level, we can obtain an image as the one shown in
figure 7a). If we change to zero all levels (four) we can
obtain an increase of blurring in the image as shown in
figure 7b.
Figure 7. “Tools” images. a) Result of zeroing
first three levels, b) result of zeroing all levels.
5. Change the Scale
Changing the scale of an image implies a
mathematical operation called interpolation. With the
DWT we can downscale an image with a factor of 2n
or
upscale an image with a factor of 2n
. This process is
made by removing or adding subbands in the wavelet
domain.
This scale changing provides an application to
progressive reconstruction of the image. Lets suppose
that we need to send an image to two users in different
sites and those users need different resolution images.
The image can be reconstructed with gradually
higher resolution approximations of the final high
resolution image, and we can send the adequate image
for each user at an exact subband reconstruction.
Figure 8 shows an upscaling example of the camman
image.
Figure 8. Upscaled Camman image. a) Original
image, b) Incise a doubled image, c) Incise b
doubled image.
Performing of the upscaling or downscaling process
is better than the same process made by interpolation.
6. Image Denoising
Image denoising is the process of separating the
noise out of the image components from a single
observation of a degraded image. The image can be
corrupted with noise because of either data acquisition
process or naturally occurring phenomena.
The simplest technique for denoising is wavelet
thresholding (shrinkage). We use as input a noise signal
like the image that we shown in figure 9a. We perform
the wavelet transform using for example four
multiresolution levels, and then we use a denoising
method called soft thresholding through all subbands.
The technique sets coefficients with values under
the threshold (T) to 0, then substracts T from the non-
zero coefficients. After soft thresholding, we compute
the inverse wavelet transform. Figure 9b shows the
image obtained from the denoising process.
Figure 9. “Goldhill” image. a) Gaussian noise
image, b) denoised image
The main problem in denoising is the selection of
the best value of T for thresholding.
7. Image Compression
One of the main popular applications of wavelets is
image compression. Data compression goal is to reduce
the volume of necessary data to represent a certain
information amount. One of the advantages obtained
with the use of DWT is that the image is decorrelated,
thus the image can be easily compressed.
Some of the wavelet coefficients obtained from
DWT correspond to details in the data set. If there are
few details, they might be omitted without substantially
affecting the main features of the data set.
The first step is to transform the original image into
the wavelet domain using the DWT, one of the
important decisions is what family of wavelet to use
and what multiresolution level to apply. The selection
of a wavelet family depends a lot on the subsequent use
of the image, but it is necessary to take into account
some wavelet properties as orthogonality, number of
16
vanishing moments, compact support, symmetry, etc.
By the other hand, the multiresoltion level can not be
larger than Log2(N).
The quantization stage is made after the process of
DWT; here, we can use two different strategies. One is
to set all high frequency sub-band coefficients that are
under a particular threshold to zero. The other is to
change to zero, for example, the coefficients behind the
matrix diagonal or some decomposition level.
There are two highly used techniques for
quantization: the Embedded zerotree wavelet coder
(EZW) and the Set Partitioning in Hierachical trees
(SPIHT) which are very efficient for several
applications.
The final stage corresponds to the entropy coder
which is a lossless stage. Figure 10 shows the lena
image and from left to right different images obtained
from compression process at different quality and
storage space.
Figure 10. “Lena” image at different quality
and storage space.
8. Image Feature Extraction
Image classification is maybe the most important
application when using digital images. In order to
perform it, a feature vector is used to describe an
image.
The statistical properties of the wavelet coefficient
characterize an image, which can be used to lead us to
the better image classification. Some measures
obtained from wavelet coefficients are:
Norm-2 energy:
∑=
=
N
k
kC
N
E
1
2
21
1
(1)
Norm-1 energy:
∑=
=
N
k
kC
N
E
1
12
1
(2)
Standard deviation:
∑=
−=
N
k
kC
N
E
1
2
23 )(
1
µ (3)
Average residual:
∑=
−=
N
k
kCE
1
2
4 )( µ (4)
Entropy:
∑=
−=
N
k
kk CC
N
E
1
22
25 log
1
(5)
Where
∑=
=
N
k
kC
N 1
2
1
µ (6)
µ is the mean and N the size of the image.
9. Conclusions
In this paper we show some applications of the
wavelet transform for digital image processing, with
the goal of demonstrating that an image can be
manipulated even in the wavelet domain.
Research in wavelets keeps looking for some more
complex families fitting with a particular application,
for example, trying to describe an important feature of
the image known as image geometry.
10. References
[1] Morlet, J., G. Arens, E. Fourgeau, and D. Giard, “Wave
propogation and sampling theory part 1: Complex signal and
scattering in multilayered media”, Geophysics, Vol. 47, No.
2, pp. 203 -221, February 1982.
[2] Maryhelen S., “Image compression using wavelets”,
Thesis proposal, Department of electrical and computer
engineering, University of New Brunswick, Canada, 1997.
[3] Mallat, S., “A theory for multiresolution signal
decomposition: The wavelet representation”, IEEE
Transactions on Pattern Analysis and Machine Intelligence
(PAMI), Vol. 11, No. 7, pp. 674 – 693, July 1989.
[4] Gonzalez Rafael C., Woods Richard E. and Eddins
Steven L., Digital image processing using Matlab,
Pearson Prentice Hall, 2004.
17
GAP: A Tool to Solve the Problem of the Web Contents Visualization in
Pocket PC Devices.
J. Carlos Olivares R., J. Gabriel González S., Azucena Montes R., Víctor J. Sosa S. e I. Rafael
Ponce M.
Centro Nacional de Investigación y Desarrollo Tecnológico(cenidet)
Cuernavaca, Morelos, México
{jcolivares04c, gabriel, amr, vjsosa, rafaxzero4c}@cenidet.edu.mx
Abstract
This tool intends to fill the existing ‘GAP’ in the
Web sites visualization in mobile devices, such as
Pocket PC. In order to guarantee that the users can
correctly visualize the Web resources, two things are
needed: a mechanism for controlling disconnections,
and allowing visualization of Web content despite of
the device connection state (hoarding), and a
mechanism that can adapt the Web content to the
specific mobile device features (transcoding). GAP is a
tool that integrates these two mechanisms and allows
improving of the user’s navigation experience in the
Mobile Web.
Keywords: Pocket PC, Visualization, Web Resources,
Hoarding, Transcoding.
1. Introduction
Mobile devices are each time closer in time,
according with [1]: "By 2009, more than a half of the
microprocessors made in the world will be intended for
mobile devices." "The software that will really make
mobile devices useful isn’t developed yet." These
statistics reflect that the use of mobile devices is
increasing due to their tiny size and that its power of
processing and versatility is growing day by day.
The problem of Web resources visualization in
mobile devices is the fact that the great majority of
Web sites in Internet have not been designed for this
type of devices. The mobile devices have limited
resources like small screens, little memory, low
processing speeds, etc; in comparison with traditional
computers equipment.
On other hand, the Web and the protocol that
manages it: HTTP are connection oriented (they are
based on TCP) what causes the transaction to fail if
the user, by any reason, becomes disconnected from
the network . In this case, it might not be possible to
visualize the Web resources in the mobile client.
Disconnections are frequent in this type of devices,
mainly because of their main advantage: mobility.
In this work a system which development is in
progress is described. It focuses in attacking the
problem of Web resources visualization on mobile
devices. The main characteristic of this work is that
great part of the system is executed in this kind of
devices, in comparison to the great majority of the
existing solutions that are executed in traditional
platforms.
2. Alternatives of solution
In order to solve this problem several alternatives
are presented: to design a new protocol, to modify and
existed protocol or to implement intermediary services
that solve the problem.
2.1 New protocols
In this scheme is possible to mention the WAP
protocol and the WML language, they work in an
analogous way as HTTP-HTML in the traditional Web.
The problem strives in that WAP only works with
mobile equipment and this would bring the same
fragmentation that today has the Web (special pages
for all class of devices). In addition, WAP was
originally designed for devices with limited resources
capacities (monochrome screens, lower bandwidth, etc)
which is actually solving day by bay through
bandwidth wireless connection (WCDMA, UTMS,
802.11g, WiMax, etc) and with more and more
powerful equipment.
The best solution would be to create a new protocol.
The problem is that this one must be totally compatible
with the existing ones, because if not, it would let
unusable thousands of existing resources (it would be
necessary to modify as much Web servers as Web
clients).
2.2 Modification of protocols
Within this alternative exits the case of having a
new request scheme of Web resources. This new
18
scheme receives the name of Push, whereas traditional
scheme receives the name of Pull [2].
The Pull scheme receives the name of “over
demand’. Under this scheme, the client (user) is who
visualizes a resource in an explicit way. In our case, if
a user wants to see the page of cenidet, must write in
the Web browser the next URL:
http://www.cenidet.edu.mx/.
The Push scheme also receives the name of
'subscription-notification'. In this scheme, the user
subscribes itself to a service and when some event of
interest happens a notification is sent for alerting the
user about the event.
Generally these two schemes do not live on isolated
way. Hybrid schemes (Pull&Push) have been applied
in diverse existing services, so is the case of the
reception of SMS/MMS messages, where the send of
messages is Pull and the reception is Push, since it
notifies to users about the existence of new messages.
Another service that has made famous devices like
the Blackberry to become successful is the Push-mail
[3]. This service comes to solve the problem of email
visualization in mobile environments. Under the
traditional scheme of the electronic mail, for consulting
the email, a user must be connected all the time to
receive it. This originates great costs if the network
connection generates costs per time. With this new
scheme, the user is not connected to the mail server.
When a new mail in the server is received, it notifies
the client of the existence of the new mail and sends it
to the mobile client.
For this type of schemes, protocols like HTTPU
(HTTP over UDP) or HTTPMU (HTTP over multicast
UDP) have been proposed, and basically works similar
to the HTTP but using datagrams, which are not in an
oriented connection way. With these protocols are
possible to offer a better quality in the mobile Web [4].
2.3 Intermediary services
This is the more extended solution to solve the
problem of Web resources visualization and many
other problems present on Web, like the case of
firewalls that solve some of the Web security problems
like the access control, or proxies’ caches that tries to
reduce the access latency to the information.
The scheme of intermediaries is widely used
because it doesn’t need to modify neither the clients
nor the servers; in fact, the client and server processes
do not notice the existence of these intermediary
services. These services are in charge of the hard work
and are transparent to the users.
The tool that is described in this article, works
under the scheme of intermediary services.
3. Proposal of solution
The hoarding process solves the problem of Web
resources visualization without concerning the state of
the connection of the mobile device. For this, it
becomes necessary that the user has already stored, in
local way, in his device the resources that he o she will
use.
As can be observed, the amount of resources to
occupy can be immense, whereas the capacity of
storage of the devices is limited. In order to give
solution against this new problem is necessary to have
an effective way to know the resources that a user
could use. With hoarding is possible to reduce this,
through algorithms of association rules applied on Web
logs, is determined the optimal set of resources that
will be replicated to the mobile clients [5].
A mechanism which tries to solve the adaptation
problem of Web resources to the displaying capacities
on mobile devices is transcoding. It consists of
transformation of resources, distilling and processing
of all those characteristics that are not available in the
device is needed. The used mechanism of transcoding
uses HTML to a subgroup of HTML transformer,
using XML.
The system is based on client-server architecture
with an intermediate tier on the server side as on the
client side. The system is shown in Figure 1.
Figure 1. General architecture proposed.
The general system has been denominated GASWT
(Gestor de Acaparamiento de Sitios Web
Transcodificados: Hoarding Manager of Transcoding
Web Sites). The intermediary in the client side is
denominated GAP (Gestor de Acaparamiento para
Pocket PC: Hoarding Manager for Pocket PC),
whereas the server side is denominated GAT (Gestor
de Acaparamiento y Transcodificación, Hoarding
Manager and Transcoding). The GAT is composed by
MA (Mecanismo Acaparador: Hoarding Mechanism)
and by MT (Mecanismo Transformador: Transcoding
19
Mechanism). The communication between the
processes is made through a HTTP request-response
scheme.
As much the MA as TM are taken from other
projects that together with this one, comprise the
Moviware project [6], whose main function is to offer
a set of services to mobile clients that have frequent
disconnections.
The general operation of the system is described in
the next lines. The user introduces an URL from the
Web browser (which has been previously configured to
redirect his exit towards the GAP). The GAP receives
the request and determines if it is in the local cache of
the device, if found, the hoarded resource is sends to
the Web browser.
When the resource is not hoarded, the system
validates the connection existence in order to obtain
the resource on line. If for some reason the resource
cannot be shown, (because it doesn’t exist or has
detected an error in the connection) the system notifies
the user by sending an error message.
On the other hand, if the Web resource is not
hoarded and a pattern of the site in the local device
doesn’t exist, the MA sends the Web resources if a
pattern for this site exists. If the pattern exists but the
hoarded resources in the MA aren’t present, it obtains
them by requesting them to MT and soon compresses
the resources in zip format to optimize the process.
Once the MA has sent the hoarded Web site, the
mobile device must decompress the Web site and
update its list of patterns. This process happens in
transparent way, in a way that the user never notices.
MT is responsible of collecting documents and if
they are HTML, it transforms them if the configuration
parameters indicate that. The transcoding is made on
line, because the process is slowed down if the
document is too large.
The actions that the user can make on the system
consist in visualizing Web sites on line, visualizing
Web sites on disconnection mode, visualizing error
messages, visualization of the requests states and
finally, set up the system.
The GAP is basically conformed of three main
modules which are: Observer, GAL (Gestor de
Acaparamiento Local: Local Hoarding Manager) and
GDL (Gestor de Desconexión Local: Manager of Local
Disconnection).
The Observer is responsible of processing each
request and to give back the result to the navigator.
The GAL is responsible of the manipulation and
control of the cache in the device. The users decide
which resources are susceptible of hoarding, as well as
limiting the storage space.
The GDL is responsible of determining the state of
the connection. The control of the disconnections has
been used drilling the network during three seconds.
Observing the quality of the results, a threshold of 30%
of accepted connections determines if the client is
connected (if the threshold is surpassed or equaled) or
is on disconnection mode (if it is below the threshold)
[7].
For the implementation of this tool, we used .NET
Compact Framework 1.0 with C # language, because it
is the best option to program in Pocket PC platform
[8].
The modifications of the MA and MT are being
made in Java so that it is language in which these
modules are programmed.
4. Results
The tool described in the present document has been
proven in diverse equipment like Pocket PC 2000
(Compaq iPAQ H3630), Pocket PC 2002 (HP Jornada
5500), Pocket PC 2003 (HP rx3115), emulators of
Windows CE, desktop PC (Compaq Presario with
Pentium 4 1.4 Ghz. processor, 512 Mb of RAM
memory).
The first test scenario consisted of acceding to the
Web resources in on line mode. We obtained
satisfactory results (see Figure 2).
In the number two test scenario, the GAP was
executed without being connected to the network.
Additionally we had a pattern of a hoarded Web site
(http://www.cenidet.edu.mx/) and resources. In this
case not existing images in the original site were used,
because it was possible to verify that the hoarded
resources are correctly displayed.
The number three test scenario (see Figure 3),
demonstrates that it is possible to transcoding the
resources in the device as well as showing them in a
local way if they are hoarded and without transcoding.
It is Also possible to execute the GAP in other
platforms like Smartphones (SmartGAP) and a desktop
PC (WinGAP). GAP, WinGAP and SmartGAP are the
same program but with different name, to differentiate
the platforms in which they’re running.
5 Conclusions
With the presented tool is being demonstrated that it
is possible to execute complex services in Pocket PC
devices, so is the case of an intermediary service that it
allows to visualize Web resources when it exists or not
a network connection.
At this time we have verified in an isolated way
most of the functions of the system (it lacks the
methods of decompression of the hoarded site), it
20
would be necessary the respective integration of
components and testing to the system in its totality.
Figure 2. Case of test 1: Visualization of Web
resources with network connection.
Figure 3. Visualization of Web sites in
disconnection mode with hoarded Web resources
and without transcoding.
Figure 4. Case of test 3: Visualization of Web sites
in connection mode, with hoarded and transcoding
resources.
The expected benefits at the conclusion of this
investigation work are: 1) Visualization of Web sites
without mattering if the devices are connected or not.
2) Reduction of latency in the access to the
information, if the resource is hoarded locally. 3)
Energy Saving by the fact to work in disconnection
mode. 4) Saving money if the user decides not to
connect to a network that receives the service and
generates expenses by the access time. 5) Facility of
administration of Web sites when not having different
versions to each device.
6. Acknowledgments
We want to give thanks to Rocío Vargas Arroyo for
her contribution in correct this paper.
7. References
[1] SG magazine, http://www.softwareguru.com.mx [visited
march 2006]
[2] Purushottam Kuikarni, et al., “Handling Client Mobility
and Intermittent Connectivity in Mobile Web Accesses”,
Department of Computer Science, University of
Massachussets.
[3] Blackberry’s push technology,
http://www.blackberry.com/products/software/integrations/p
ush_email.shtml [visited march 2006].
[4] UPnP Forum, http://www.upnp.org/, [visited march
2006]
[5] David Valenzuela, “Mecanismos para predicción de
acaparamiento de datos en sistemas clientes/servidor
móviles”, masther thesis, cenidet, august 2002.
[6] Gabriel González. “Plataforma middleware reflexiva para
aplicaciones de cómputo móvil en Internet (Movirware)”,
cenidet.
[7] J. Carlos Olivares, et al, “Control de desconexiones en la
visualización de páginas Web en dispositivos móviles
Windows CE”, for appear in XVI CIECE’06, april 5,6 and 7
2006, Cd. Obregón, Sonora, México.
[8] Gabriel González, Azucena Montes, J. Carlos Olivares,
“Comparativa y evaluación de las herramientas de
programación para desarrollar aplicaciones en plataforma
Pocket PC”. VI CICC’05, Colima, Colima, México,
september 2005.
21
Evaluation of tools for business process in three levels of abstraction
Erika M. Nieto Ariza1
, Javier Ortiz Hernández1
, Guillermo Rodríguez Ortiz2
1
Centro Nacional de Investigación y Desarrollo Tecnológico
Interior internado Palmira s/n, Cuernavaca, Morelos, 62490 México
{erika, ortiz}@cenidet.edu.mx,
2
Instituto de Investigaciones Eléctricas
Reforma 113. Palmira, Cuernavaca, Morelos, 62490 México
gro@iie.org.mx
Abstract
Organizations are increasingly choosing the use of
the web to provide their services to their clients.
Services are the systemization of the business
processes in the organization. Due to the great number
of existing modeling methods and the increasing use of
internet, it is necessary to identify the information that
modeling methods allow to specify. In this paper, a set
of concepts is proposed to evaluate modeling methods
for business modeling using three levels of abstraction
–organizational, integration and web.
1. Introduction
Organizations should decide how the technology
systems support business and how increasingly these
information systems become an integral part of the
business processes [1, 2]. Models are commonly used
to flexibly represent complex systems and to observe
the performance of a business process when a
technology system is integrated [3, 4, 5]. A business
model is an abstraction of how a business performs, it
provides a simplified view of the business structure
which acts as the basis for communication,
improvement, or innovation, and defines the
information systems requirements that are necessary to
support the business. A model has to capture the
domain without reference to a particular system
implementation or technology. One of the problems
with modeling the early representations of business
processes, conceptual views of information systems
and Web interactions is the great number of techniques
to model and specify these models, and, additionally,
since each one has its own elements, this makes it
complex and laborious to compare and select the
appropriate technique to model a system in an specific
level of representation.
Three modeling levels of abstraction are proposed
which integrate a set of concepts to build early web
application models: a) Organizational, it describes how
the organization works and the business process that is
going to be systematized with a web information
system; b) Integration, it describes the role of the
software system and its integration with a particular
organizational environment; c) Web, it describes the
semantics of a web application [5,6]. The basis of our
contribution is in the identification and classification of
a set of concepts which are used to know what to model
at each level of abstraction and, to have a modeling
method evaluation framework to distinguish the
capabilities of each method in order to model at the
three levels of abstraction.
There are some methods and methodologies to
evaluate business process modeling; however, they do
not evaluate capabilities but rather the functionality of
the application or the modeling methods. Rosemman
proposes ontology to evaluate organizational modeling
grammars identifying their strength and weaknesses
[7]. Luis Olsina [8] and Devanshu Dhyani [9], propose
a methodology to evaluate the characteristics of a web
application in operational phases.
The structure of this paper is as follows: in section 2
the modeling concepts that comprise our approach are
briefly presented, in section 3 the modeling concepts
are enhanced with a set of aspects found to be useful in
building models and a method evaluation methodology
is presented, in section 4 the results of the evaluation
are shown, in section 5 the conclusions about the
benefits of the methodology are discussed, finally the
references are presented.
2. Modeling concepts
A business process model can be viewed at many
levels of abstraction, and complementary model views
can be combined to give a more intelligible, accurate
22
view of a system to develop than a single model alone
[3]. This approach establishes three levels of
abstraction and each one includes certain modeling
concepts of features as shown in table 1. At each of
these levels, concepts are properties or characteristics
that structurally describe types of requirements in a
specific level of abstraction; they define the key
elements in a business process. Concepts in each level
of abstraction were selected based on the analysis of
several techniques and methods for business process
modeling at the three levels.
Table 1: Modeling concepts at each level of abstraction
Organizationa
l level
Integration
level
Web level
Business process Pure navigation
--- Navigation page
- Relationship
User profile (Rol) User profile (Rol)
Actor Actor
Class (object) ---
Resource Artifact
Artifact Artifact
Goal Goal --- Goal
Task Function Service Service
Activity Event
Event ---
Business rule Constraint Precondition and
postcondition
---
Quality No functional
requirement
No functional
requirement
---
The organizational modeling concepts are as
follows.
- Actor. It describes an entity that has a specific goal in
the business process.
- Resource. It describes an informational or physical
entity that is transferred between actors.
- Goal. It describes a business process desired state that
an organization imposes to itself.
- Task. It describes a series of activities oriented to
reach a goal.
- Activity. It describes a set of actions to carry out one
task.
- Quality. It describes the desired characteristics in the
business process.
- Business rule. It describes the actions and criteria that
govern the execution of the business process.
The integration modeling concepts are as follows.
- Actor. It describes an entity that interacts with the
information system and that might play different roles.
- Artifact. It describes an entity that is transferred
between an actor and the information system.
- Goal. It describes the information system purpose,
limitations and responsibilities.
- Function. It describes a service that must be provided
by the information system.
- Event. It describes a change in the business process in
one specific moment of time.
- Constraint. It describes a condition for a service
execution supplied by the information system.
- Non functional. It describes the desired quality
features or constraints for the information system.
The Web modeling concepts are as follows.
- Navigation relationship. It describes a global vision of
the Web application according to a user profile.
- User profile. It describes the user unique use of the
Web application.
- Class. It describes an object type to model the entities
that integrate the application.
- Artifact. It describes an abstract object to be
transferred between the Web application and a user.
- Goal. It describes the purpose of the Web application.
- Service. It describes an activity or an action that the
web application has.
- Event. It describes the trigger of an activity or action
that might be carried out to obtain a result or artifact.
- Pre and pos condition. It describes the performance of
an event execution.
- Non functional requirement. It describes the desired
quality features or constraints for the Web application.
Each concept used for business process modeling is
related to each other.
3. The concepts and the evaluation of
methods approach
The last section introduced a set of modeling
concepts used to model business processes and systems
at different levels of abstraction. Here the concepts are
enhanced with aspects that make them more powerful
to model a particular view. These aspects are also used
as scales to evaluate modeling methods. These aspects
are capabilities sorted by the concepts presented before
and a scale is defined for each concept using the
capabilities related to the concept. Also, a desired
capability mentioned in the literature may be used in
the definition of a scale.
Following a well-known approach from the
economics and management disciplines, to each aspect
a scale between 0 and 5 is assigned which is going to
be used to evaluate one of the modeling capabilities. As
in the statistics methods, the concepts in this paper are
qualitative variable with a nominal scale [10]. The
evaluation scale is obtained by first taking a list of the
capabilities of one method, and then a list of
capabilities from a second method, from a third, until
all selected methods are analyzed. The concepts
evaluation scales facilitate the comparison of different
modeling methods capabilities (see Tables 2, 3 and 4).
The order assigned to the scales is intuitive and
relatively arbitrary; however, it can be changed easily.
23
Then each one information method is evaluated for all
the aspects in each level of abstraction.
Table 2: Aspects and evaluation scales for the
organizational level of abstraction
Table 3: Aspects and evaluation scales for the integration
level of abstraction
The evaluation consists in assign a value to each
concept of the method. For example, the concept non
functional requirement at the web level; if the method
has the non functional requirement concept; the method
should have 1 point. If the method in the non functional
requirement concept says who proposes it and to what
is applied, the method should have 2. If the method has
the concept of non functional requirement, who
proposes it and to what is applied, and also, the kind of
requirement, the method should have 3 points. If the
method has the concept of non functional requirement,
who proposes it and to what is applied, the type of the
requirement, and also, the measure to verify
compliance; the method should have 4 points. The
method should have 5 points if it has the concept of
non functional requirement, who proposes it, to what is
applied, the type of the requirement, the measure to
verify compliance and what happens if it is not
fulfilled.
Table 4: Aspects and evaluation scales for the Web level
of abstraction
3.1. Evaluation methods
The evaluators have to evaluate the three levels of
abstraction for all concepts. For each modeling method
and for each aspect ai, a corresponding evaluation ei is
obtained. The results are displayed in a table for easy
comparison and a total score is obtained for each
method and for each level of abstraction as Σei. A
method that scores better than other, possibly has more
capabilities to model requirements at the corresponding
level of abstraction than the first.
4. Results of the methods evaluations
As an exercise, the following methods i*, Tropos,
EKD, BPM-UML, OO-Method/OOWS, OOWS [5, 7,
4, 8, 9, 11, and 12] were evaluated using the scales
presented (tables 5, 6 and 7). The methods evaluated at
each level are not the same since some methods do not
offer the modeling concepts for the level where they
are not shown.
Table 5: Organizational level evaluation of the methods
Organizational
level
Max.
Value
I* Tropos EKD BPM-
UML
Actor 5 5 5 5 5
Resource 5 5 5 2 5
Goal 5 1 3 4 3
Task 5 2 4 3 2
Activity 5 0 2 0 4
Business rule 5 2 0 5 4
Quality 5 3 4 4 4
Total 35 18 23 23 27
Table 6: Integration level evaluation of the methods
Integration level | Max. value | i* | Tropos | EKD | BPM-UML | OO-Method
Actor             |  5 | 5 | 5 | 5 | 5 | 1
Artifact          |  5 | 5 | 5 | 4 | 5 | 4
Goal              |  5 | 1 | 3 | 4 | 3 | 1
Function          |  5 | 2 | 2 | 5 | 5 | 2
Event             |  5 | 0 | 1 | 0 | 4 | 3
Constraint        |  5 | 2 | 0 | 5 | 4 | 5
Non functional    |  5 | 3 | 4 | 4 | 4 | 0
Total             | 35 | 17 | 20 | 27 | 30 | 16
Table 7 (a): Web level evaluation of the methods (business process)
Web level                      | Max. value | Tropos | OO-Method/OOWS | OOWS
User profile                   |  5 | 3 | 4 | 4
Class                          |  5 | 0 | 5 | 5
Artifact                       |  5 | 4 | 4 | 4
Service                        |  5 | 3 | 3 | 3
Event                          |  5 | 1 | 3 | 2
Precondition and postcondition |  5 | 2 | 5 | 3
Non functional                 |  5 | 3 | 0 | 0
Total                          | 35 | 16 | 24 | 21
Table 7 (b): Web level evaluation of the methods (pure navigation)
Web level                        | Max. value | Tropos | OO-Method/OOWS | OOWS
Navigational page - relationship |  5 | 1 | 5 | 5
User profile                     |  5 | 3 | 4 | 4
Goal                             |  5 | 3 | 0 | 0
Artifact                         |  5 | 4 | 4 | 4
Service                          |  5 | 3 | 3 | 3
Total                            | 25 | 14 | 16 | 16
At the organizational level, BPM-UML obtains the best score for this level of abstraction, and i* obtains the lowest. The methods were evaluated with respect to the parameters defined for the approach presented here. During the evaluation, the particular characteristics of each method become apparent; for example, the quality aspects of a business process are modeled as qualitative goals in BPM-UML. At the integration level, the results show the capabilities of each method: BPM-UML again obtains the best score, while OO-Method obtains the lowest.
5. Conclusions
There are many proposals for modeling organizational, integration and web requirements, and each one has its own elements. Some use the same concepts under different names, which makes comparing the methods complex and laborious. The approach presented here unifies the various terminologies, increases the knowledge about the modeling concepts, and proposes a way to evaluate the modeling capabilities and techniques of the methods. This helps to select the method that is most appropriate for the needs of a problem domain. The approach has been used to evaluate e-learning systems [13]. Additionally, it has been applied in the development of several case studies to evaluate virtual reality methods and to appreciate clearly the concepts that each method allows to be modeled.
6. References
[1] J. Pasley, "How BPEL and SOA are changing Web services development", IEEE Internet Computing, May-June 2005.
[2] P. F. Green, M. Rosemann and M. Indulska, "Ontological Evaluation of Enterprise Systems Interoperability Using ebXML", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005.
[3] T. Meservy and K. Fenstermacher, "Transforming software development: an MDA road map", IEEE Computer Society, September 2005.
[4] H. E. Eriksson and M. Penker, Business Process Modeling with UML, Wiley, Chichester, UK, 2000.
[5] E. Yu, Modelling Strategic Relationships for Process Reengineering, Ph.D. thesis, University of Toronto, Canada, 1995.
[6] A. Ginige and S. Murugesan, "Web Engineering: An Introduction", IEEE Multimedia, pp. 1-5, Jan-Mar 2001.
[7] P. F. Green, M. Rosemann and M. Indulska, "Ontological Evaluation of Enterprise Systems Interoperability Using ebXML", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005.
[8] L. A. Olsina, Metodología cuantitativa para la evaluación y comparación de la calidad de sitios web, Ph.D. thesis, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, November 1999.
[9] D. Dhyani, W. K. Ng and S. S. Bhowmick, "A survey of Web metrics", ACM Computing Surveys, Vol. 34, No. 4, December 2002, pp. 469-503.
[10] W. L. Carlson and B. Thorne, Applied Statistical Methods for Business, Economics, and the Social Sciences, Prentice Hall, 1997.
[11] J. Bubenko, D. Brash and J. Stirna, EKD User Guide, Royal Institute of Technology (KTH) and Stockholm University, Dept. of Computer and Systems Sciences, Stockholm, Sweden, 1998.
[12] E. Insfrán, O. Pastor and R. Wieringa, "Requirements Engineering-Based Conceptual Modelling", Requirements Engineering, Springer-Verlag, Vol. 7, No. 2, pp. 61-72, 2002.
[13] E. Islas P., E. Zabre B. and M. Pérez R., "Evaluación de herramientas de software y hardware para el desarrollo de aplicaciones de realidad virtual", http://www.iie.org.mx/boletin022004/tenden2.pdf (2005).
Image Segmentation Script Language
Francisco Cervantes Álvarez, Raúl Pinto Elías
Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet)
Interior Internado Palmira s/n, Cuernavaca, Morelos, México.
{cervantes04c, rpinto}@cenidet.edu.mx
Abstract
In this article we propose the use of a script language for the image segmentation stage in artificial vision. The proposed language, the system architecture used to interpret scripts, and the general structure of the programs that make up the operator library are described. Finally, some tests and results of the use of the proposed script language are shown.
1. Introduction
In this paper we propose the use of a script language for image segmentation. Nowadays, the use of script languages in graphics programming is increasing, because they allow ideas to be tested in an easy way [1]. Script languages also make code reuse easy [2]. However, in the artificial vision area few works focus on digital image processing through script languages; an example is shown in [3], where the user builds a script with graphic objects and then executes it to process a given image. An example of a commercial script language is MATLAB [4].
The proposed script language allows ideas about image segmentation to be tested in an easy way, and the user does not need to know how the segmentation algorithms carry out the processing. The language also allows code reuse through the operator library (the operators are independent of each other) and the implementation of a script interpreter. This library allows the language to grow in a modular way, without the need to modify existing code.
This paper is structured in the following way. In the second section the basic elements of the proposed script language are described. The third section shows the basic structure that the library operators should have. In the fourth section the general scheme of the script language interpreter is described. In the fifth section some tests and results are shown. Lastly, the conclusions are presented and some future work is discussed.
2. Basic elements of script language
The language is composed of the definition of the following data types: Entero, Real, Cadena, ImagenGris, Mascara and Contorno. The basic arithmetic operations are also defined (addition, subtraction, multiplication and division). The language has the following basic structures:
• Declaration.
• Assignment.
• Operator call.
The syntax corresponding to these structures is the following.
Declaration:
data_type (variable_name) (, variable_name)*
Assignment:
variable_name = variable_nameX
variable_name = arithmetic_expression
variable_name = operator_name (arguments)
Operator Call:
operator_name (arguments)
The syntax shown above only provides a general structure; everything else depends on the operators registered in the operator library. This structure is named the language core. The script language interpreter is very important because it lets the core recognize new operators. This way of defining the language, in which only the structures are established and the language elements are not fixed, gives the advantage of adding new elements or commands dynamically, without modifying the core code. Each operator is independent of the others; the single restriction for adding an operator to the library is to follow a basic structure specification. This restriction must be followed so that the core and the operators can interact.
3. Operator basic structure
An operator can be used as part of the library if it has a general structure like the one shown in figure 1.
Figure 1. Operator general structure:
• Core libraries for data type management (numbers, strings, images and templates).
• Request of the arguments (file paths that contain the parameter values).
• Operator body (the digital image processing algorithm).
• Return of the result (the result is saved through a file given as the last argument).
Figure 1 shows the structure that the operators should have. Some elements of the structure can be omitted; for example, in the header only the required core libraries must be included. An operator can return a value, but this is not strictly necessary; for example, the operator that shows an image only displays it on the screen and does not need to return anything to the core. By default, an operator must receive at least one argument.
In order for the core and the operators to interact, the data types defined in the core must be used. Also, the parameters that an operator needs for its execution are received as file paths; these files contain the values of the operator parameters. Finally, the output data must be saved in a file specified by the last parameter received by the operator. Below, an example of an operator that extracts the negative of an image is shown.
#include "CImagGris.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[])
{
if ( argc < 3 ) exit ( ERR_NUM_PARAM ) ;
char * a_entrada = argv [ 1 ];
char * a_salida = argv [ 2 ];
AnsiString entrada,salida;
CImagGris Imagen;
entrada=CharToAnsiString(a_entrada);
salida=CharToAnsiString(a_salida);
if(Imagen.leerArchivo(entrada)==false)
exit(ERR_IMAG_EN);
int x,y,h,w;
Byte pixel;
h=Imagen.Alto();
w=Imagen.Ancho();
for(x=0;x<w;x++)
{for(y=0;y<h;y++)
{ pixel=Imagen.getPixel(x,y);
Imagen.setPixel(x,y,255-pixel);
}
}
if(Imagen.escribirArchivo(salida)==false)
exit(ERR_NO_MEMO);
exit ( BIEN ) ;
}
At present the interpreter core only supports 24-bit BMP images.
4. General scheme of script interpreter
The general structure of the operators has been shown; now the general scheme of the script interpreter is presented. Figure 2 shows each component of the interpreter and its relationship with the operators.
Figure 2. Script interpreter general scheme (components: the interpreter core with its process module, memory module and data types; the operator library; the input script and the result).
With this structure, the system first explores the operator library to generate the structure (syntax and semantics) of each operator; later it analyzes the input script and executes the operators.
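The following is a minimal sketch (our own illustration, not the actual interpreter) of the dispatch idea described above: the interpreter knows each registered operator only through an alias and an executable path, and passes the parameters as file paths. The aliases, executable paths and script lines shown here are made up.

// Minimal sketch of operator dispatch: look up the alias in a registry and
// run the corresponding executable, forwarding the file-path arguments.
#include <cstdlib>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // Hypothetical registry: alias -> executable path (normally built from the library).
    std::map<std::string, std::string> registry = {
        {"BinarizarImagen", "./BinarizarImagen.exe"},
        {"VerImagenGris",   "./VerImagenGris.exe"}
    };
    // A toy "script": one operator call per line, arguments are file paths.
    std::vector<std::string> script = {
        "BinarizarImagen in.bmp umbral.txt out.bmp",
        "VerImagenGris out.bmp"
    };
    for (const std::string& line : script) {
        std::istringstream tokens(line);
        std::string alias;
        tokens >> alias;                          // the first token is the operator alias
        auto it = registry.find(alias);
        if (it == registry.end()) {
            std::cerr << "Unknown operator: " << alias << "\n";
            return 1;
        }
        std::string command = it->second;         // build the external command line
        std::string arg;
        while (tokens >> arg) command += " " + arg;
        int status = std::system(command.c_str()); // run the operator executable
        if (status != 0) {
            std::cerr << alias << " failed with status " << status << "\n";
            return status;
        }
    }
    return 0;
}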
5. Tests and results
Several tests have been carried out to show the advantages that the script language provides for image segmentation; for example, the user can use the language without interacting directly with the algorithms.
The first test consists of showing how to make a new operator that converts an image to a binary image. To
create an operator, a C++ Builder compiler is necessary. The operator code is shown in figure 3.
Figure 3. Operator to convert an image to binary image:

#include "CBMP24.h"
#include "CNumero.h"

int main(int argc, char* argv[])
{
    // The operator receives three arguments: the input image path,
    // the threshold file path and the output image path.
    if (argc < 4) exit(ERR_NUM_PARAM);
    char *a_imag   = argv[1];
    char *a_umbral = argv[2];
    char *a_salida = argv[3];
    int valorUmbral;
    AnsiString imag, umbral, salida;
    CIMAGEN_BMP24 Imagen;
    CNumeroMemoria UmbralBinario;
    imag   = CharToAnsiString(a_imag);
    umbral = CharToAnsiString(a_umbral);
    salida = CharToAnsiString(a_salida);
    // Load the gray-level image and the threshold value from their files.
    if (Imagen.leerArchivo(imag, GRIS) == false)
        exit(ERR_IMG_EN);
    if (UmbralBinario.leerArchivo(umbral) == false)
        exit(ERR_UMB_EN);
    valorUmbral = (int)UmbralBinario.Valor();
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();    // image height
    w = Imagen.Ancho();   // image width
    // Threshold every pixel: white if >= threshold, black otherwise.
    for (x = 0; x < w; x++)
    {
        for (y = 0; y < h; y++)
        {
            pixel = Imagen.getPixelGrey(x, y);
            if (pixel >= (Byte)valorUmbral)
                Imagen.setPixelGrey(x, y, 255);
            else
                Imagen.setPixelGrey(x, y, 0);
        }
    }
    // Save the binary image to the file given as the last argument.
    if (Imagen.escribirArchivo(salida) == false)
        exit(ERR_NO_MEMO);
    exit(BIEN);
}
As can be seen in the figure above, making a new operator is very easy; it is only necessary to follow the general structure that has been specified. The user then generates the executable file by compiling the source code. At this moment the new operator has been created and must be registered in the library.
The second test consists of registering the new operator. For this, the Métodos option of the Herramientas menu is used; figure 4 shows this menu.
Figure 4. Script interpreter interface
When the user clicks on this menu option, the registration screen shown in figure 5 is displayed, where the new operator has to be registered. The user indicates the executable file path and assigns an alias; the alias is the name used by the language interpreter. Later, the input and output parameters of the operator are specified.
Figure 5. Screen for operator registration
Once the operator is registered, it can be used as part of the language; figure 6 shows this process. Here, a script that converts images to binary images is written, and the initial and final images are also shown.
Figure 6. Script to convert an image to binary image
These tests show the advantages of the script language. The capability of registering new operators provides an open language that grows according to the needs of the user. Nowadays the language has 25 segmentation operators and 5 image description operators. The third test shows how the operators BinarizarImagen and VerImagenGris interact with each other. These operators can be saved to be reused later. Finally, managing the library is simple, because the interface provides screens to modify, remove and add operators.
6. Conclusions
From this paper we can conclude that the use of a script language for image segmentation is practical. In this work we saw that the functionality of the segmentation algorithms can be abstracted away by using scripts, so the user does not need to know the algorithms. Finally, we can say that script languages might also be used in other stages of artificial vision.
The operator library used by the interpreter is built to grow in a modular way; the library grows with each operator that is registered.
#include "CBMP24.h"
#include "CNumero.h"
int main(int argc, char* argv[])
{
if ( argc < 4 ) exit ( ERR_NUM_PARAM ) ;
char * a_imag = argv [ 1 ];
char * a_umbral = argv [ 2 ];
char * a_salida = argv [ 3 ];
int valorUmbral;
AnsiString imag,umbral,salida;
CIMAGEN_BMP24 Imagen;
CNumeroMemoria UmbralBinario;
imag=CharToAnsiString(a_imag);
umbral=CharToAnsiString(a_umbral);
salida=CharToAnsiString(a_salida);
if(Imagen.leerArchivo(imag,GRIS)==false)
exit(ERR_IMG_EN);
if(UmbralBinario.leerArchivo(umbral)==false)
exit(ERR_UMB_EN);
valorUmbral=(int)UmbralBinario.Valor();
int x,y,h,w;
Byte pixel;
h=Imagen.Alto();
w=Imagen.Ancho();
for(x=0;x<w;x++)
{
for(y=0;y<h;y++)
{
pixel=Imagen.getPixelGrey(x,y);
if(pixel>=(Byte)valorUmbral)
Imagen.setPixelGrey(x,y,255);
else
Imagen.setPixelGrey(x,y,0);
}
}
if(Imagen.escribirArchivo(salida)==false)
exit(ERR_NO_MEMO);
exit ( BIEN ) ;
}
Figure 6 Script to convert an image to binary image
30
7. References
[1] M. Villar, "Guía de lenguajes de script para prototipado rápido", http://www.codepixel.com/tutoriales/prototipado/, 2006.
[2] K. Muehler, "Adaptive script based animations for medical education and intervention planning", Department of Simulation and Graphics, University of Magdeburg, Germany.
[3] "Sistema interactivo para la enseñanza de la visión artificial", Depto. de Sistemas Inteligentes Aplicados, Escuela Universitaria de Informática, Universidad Politécnica de Madrid, 2006.
[4] "MATLAB", http://www.mathworks.com/, 2006.
Web Page Retrieval Using an Ontology that is Populated by Automatic Text
Classification
Ismael R. Ponce M., José A. Zárate M., Juan C. Olivares R.
Centro Nacional de Investigación y Desarrollo Tecnológico
{rafaxzero04c, jazarate, jcolivares04c}@cenidet.edu.mx
Abstract
In this article we describe a proposal to help users in the arduous task of recovering information from the Web, especially when queries are about a specific subject or approach. We suggest the use of an ontology whose instances are links to Web pages about the domain for which the ontology was constructed, taking advantage of the order and categorization that it offers to guide the user through the concepts that integrate it and to find information related to them. The creation of an ontology about a particular domain and the activities necessary to obtain an automatic classification of Web pages as instances of the ontology are described.
Keywords: ontology, automatic classification methods, vector space model.
1. Introduction
Nowadays, in line with the technological evolution, the amount of information generated every second is immense; moreover, having that information has become so important that we now live in an era where information governs the world and its decisions.
The Internet has become a great source of information, but the larger it is, the harder it becomes to find the desired content. Diverse ways of recovering information have been tried, for example, Web search engines, which use different techniques (searchers like Google, Yahoo, Ask, Vivisimo, and many others), some of them considering the popularity of the pages, the use of clustering, etc. Nevertheless, although they help at the time of making queries, users still face results that are far from what they wished for.
Therefore, diverse approaches are being tried to help searching on the Internet, ranging from word matching to techniques based on the popularity of the sites; unfortunately, for many users this type of result is not enough, and they require more specific solutions.
The alternative proposed for this problem is to use the ontology paradigm for Web page search on a particular subject. When working on a concrete domain a specialized search is expected; in addition, thanks to the use of ontologies and the order they provide over the concepts that conform them, they are a great help for users to find the information they wish.
In this document we focus on the way to populate an ontology with Web page links, using automatic classification techniques. In our experiment we obtained 86% of well classified elements.
The article briefly describes the steps followed to put the proposed idea into practice. First, a brief overview of the way some search engines recover information is given; next, the development of an ontology suitable for the intended use is described, concluding with the steps for using an automatic classification method to populate the ontology, considering the Naive Bayes, k-nearest neighbors and support vector machine methods.
2. Search Machines
In a traditional search engine, queries are generally made from keywords, obtaining as a result a list of Web links related to the requested words. Some of the best known search engines are Google, Yahoo and MSN Search, among others.
The case of Google stands out for the use of its PageRank technology [1], in which a formula calculates the weight of each Web page stored in its database, considering the number of links from other pages that reference it. The greater the number of links to a page, the greater its score, thus becoming a popularity contest [2]. Unfortunately, it has been demonstrated that the results can be manipulated by the well-known Google bombing [3].
Another way to recover information is through Web directories, which consist of a manual organization and classification of Web pages by subjects or categories. One of the most representative directories is the Open Directory Project [4], in which a set of voluntary editors are in charge of listing the Web links inside an ontology, where the links are grouped by similar subjects into categories. The disadvantage is that too much human intervention is required to register the page links.
A special type of search engine incorporates clustering, which consists of partitioning a set of similar objects into subgroups, where the elements of each subgroup share common characteristics. This type of search engine returns the results of a query arranged in groups; examples of this are Clusty.com and Vivisimo.com.
Finally, we find specialized search engines, which focus on recovering links to technical and scientific documents. An example is Citeseer, a search engine for documents focused on computing, which uses bibliographical references to estimate the importance of the documents that are queried.
Although these and other techniques have been developed to recover information from the Web, this area still has much to offer, which is why new alternatives to help the user continue to be explored, as is our case.
3. Phase of Ontology Development
Gruber [5] defines an ontology as the explicit specification of a conceptualization, which means identifying the concepts that integrate a domain and the interrelations that exist among them, in a formal representation, in such a way that it can be shared and reused.
The standard language established by the W3C to make this type of formalization is OWL (Web Ontology Language). An advantage of using this standard, in addition to the reusability it allows, is that many tools related to ontology design and use are being developed to support it, such as editors, reasoners, etc.
We developed an ontology considering these points and following the methodology proposed by Uschold and King [6]. The domain for which the ontology was developed was natural language processing (NLP). To develop it we used the ontology editor Protégé 3.1.1, the Protégé-OWL 2.1 plug-in along with the OWL Wizards plug-in, and the FaCT++ 0.99.6 and RacerPro 1.9.0 reasoners, which were used to verify the consistency of the ontology.
The classes are made up of concepts related to the NLP area, mainly including researchers, schools, tools and application areas.
4. Supervised Learning for Automatic Text
Classification
Once the ontology has been developed, the next phase is to populate it with instances. Given the vast number of pages that exist on the Web, manually classifying them into the classes established in the ontology would be an expensive task, and this is already done to a great extent in Web directories. Therefore, a way to automate this process was sought, resorting to supervised learning, in which automatic text classification is performed by means of statistical and mathematical techniques.
This approach is centered on having a training set of previously classified documents that is used to learn how to classify new documents. For this, it is necessary to transform the initial state of the documents into a representation that can be used by a learning algorithm for the classification.
For test purposes we only worked with HTML pages. The necessary steps of this process are described next. The training collection used consisted of 1624 documents, previously classified into 26 classes taken from the developed ontology, plus a class denominated null, into which the documents that do not belong to the ontology domain are classified. The 26 considered classes are only a representative sample of the existing classes in the ontology and were taken only for test purposes.
4.1. Document Preprocessing
Not all the elements (more precisely, the words) that appear in documents are useful for their classification; that is, there are words that by themselves say nothing about the content of the document in which they appear and can therefore be eliminated. Among these elements are punctuation marks and HTML tags. There are also words of very frequent use that appear in a great number of documents, which makes their discriminatory power very low; these words are known as stopwords, examples of which are articles, pronouns, prepositions and conjunctions, among others.
In order to define the stopwords to eliminate, we resorted to the lists available in DOCUM [7], in SMART [8] and at the Institut interfacultaire d'informatique of the University of Neuchâtel [9], as well as other words identified during the testing process.
Because we work on a specific domain, a certain control exists over the terms that belong to it, which is why a matching between different terms that refer to the same concept was suggested, turning them into a single representation; in other words, if a concept can be named in different forms, we unify them and consider them under a unique form inside the classification process. For this, we took as a starting point the concepts that integrate the ontology.
Finally, many words share the same lexical root; a basic stemming process based on Porter's algorithm [10] was followed, which seeks to reduce words to their stem.
All the steps mentioned above aim to diminish the size of the training document collection to make it more manageable, eliminating the irrelevant parts before continuing with the automatic classification process. In our exercise, on average we managed to reduce the original size of the document collection by up to 70%.
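As an illustration only (not the authors' implementation; the stopword list and example text below are made up), a minimal sketch of this kind of preprocessing is:

// Minimal preprocessing sketch: lowercase, drop punctuation/markup leftovers,
// and remove stopwords, keeping only the remaining tokens.
#include <cctype>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> preprocess(const std::string& text,
                                    const std::set<std::string>& stopwords) {
    std::string clean;
    for (char c : text) {                       // keep letters, turn everything else into spaces
        if (std::isalpha(static_cast<unsigned char>(c)))
            clean += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        else
            clean += ' ';
    }
    std::vector<std::string> tokens;
    std::istringstream in(clean);
    std::string word;
    while (in >> word)                          // discard stopwords
        if (stopwords.count(word) == 0)
            tokens.push_back(word);
    return tokens;
}

int main() {
    std::set<std::string> stopwords = {"the", "of", "and", "a", "in"};  // toy list
    for (const std::string& t : preprocess("The analysis of natural language.", stopwords))
        std::cout << t << "\n";                 // prints: analysis, natural, language
    return 0;
}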
4.2. Vector Space Model
The vector space model (VSM) was proposed by Salton in 1975 [11]. The basic idea behind this model is to build a matrix that represents the documents and the words contained in them, assigning a weight to each word. Each vector of the matrix represents a document and the distribution of the words that appear in it. It is an m x n matrix, where m is the number of documents and n the number of registered words.
Different types of word weighting exist for the VSM; in our tests we considered boolean weighting (the weight of a word is 0 if it does not appear in the document and 1 if it does), weighting by frequency of appearance (the weight of the term depends on the number of occurrences of the word in the document) and, finally, tf-idf weighting (calculated as the product of the term frequency and its inverse document frequency [11]).
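For reference, a common textbook formulation of the tf-idf weight (variants exist; the exact one used here may differ) is:

w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}

where \mathrm{tf}_{t,d} is the frequency of term t in document d, N is the total number of documents and \mathrm{df}_t is the number of documents that contain t.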
4.3. Dimensionality Reduction in the VSM
Not all the words that integrate the training collection can be considered in the VSM, since its dimension would be enormous. Different techniques exist to reduce the dimensionality, such as the document frequency, which sets a minimum number of appearances that each word must have within the whole set of documents, in order to discard those words whose appearance is very small and to keep those that present a greater document frequency.
Another technique that was considered was the information gain (IG), which calculates the difference between the entropy of the system and the entropy given each word. This difference, measured in bits, indicates how relevant a word is and how much information it contributes over the whole collection, as a determining factor to carry out the classification.
The total number of words in the already processed training collection is 2,552,196, of which only 125,891 are different words. As can be appreciated, the number of different words is too large to be handled in the VSM, so only those words with a document frequency greater than or equal to 15 were considered; 8,966 words passed this filter, which represents 7.12% of the original distinct words. Nevertheless, this is still a very large amount, so the IG was applied to these words.
The calculated entropy of the whole collection was 3.97; the words considered were those with an IG equal to or greater than 0.1, resulting in 527 different words, 0.42% of the original size. The words (already stemmed and standardized) that presented the greatest IG were: nlp (0.552), natural_language (0.479), knowledge (0.424) and data_min (0.335).
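For reference, a standard way of computing these quantities for a word w over the set of classes C (the textbook formulation, given only as an illustration) is:

H(C) = -\sum_{c \in C} P(c)\,\log_2 P(c)

IG(w) = H(C) - \big[\, P(w)\,H(C \mid w) + P(\bar{w})\,H(C \mid \bar{w}) \,\big]

where P(w) is the fraction of documents that contain w and H(C \mid w) is the class entropy computed over those documents.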
4.4. Automatic Classification Algorithms
Once the VSM representation of the training document collection has been obtained, an automatic classification method can be applied to classify new elements. The automatic classification methods we considered were Naive Bayes, k-nearest neighbors (kNN) and support vector machines (SVM), in order to carry out a series of tests to find the method that gives the best results, also considering the weightings mentioned in section 4.2. WEKA was used to carry out the tests; the results shown in Table 1 correspond to the use of 10-fold cross validation and show the percentage of well classified elements.
Table 1. Percentage of well classified elements.
            boolean   tf        tf-idf
Naive Bayes 62.7463   55.8498   81.2192
kNN         84.5443   85.0369   84.4212
SVM         86.2685   66.7488   82.0813
Figure 1. Classification test results
The best result was obtained with the boolean weighting using the support vector machine algorithm; the details of this classification are given in Table 2.
By using the training configuration that offered the best result, it is possible to classify new documents obtained from the Web, following the same preprocessing and VSM representation steps; once the class to which a document belongs is obtained, the link of the page can be saved as an instance of the ontology.
Table 2. VSM results with boolean weighting
Well classified elements     1401   86.27%
Wrongly classified elements   223   13.73%
Root mean squared error     0.1829
5. Conclusions
From the results obtained, it was decided to use the boolean weighting along with the support vector machine method. The use of the boolean weighting is justified because the mere appearance of a word in a document can be considered a good indicator of its discrimination value, especially when considering words that are not common in ordinary speech but are relevant to the domain being worked on, as demonstrated by the fact that a great part of the words with the greatest information gain belonged to the NLP domain, on which this work was carried out.
The words obtained with the greatest IG, and the accuracy of 86% obtained in the best case by the selected classification method, are a good incentive to consider the use of automatic classification to populate the ontology.
With respect to future work, once the ontology is populated, the next step is the process of guiding users through the ontology so that they can find links related to queries made on the NLP domain. Among the expected benefits at the conclusion of this work are letting the user be guided through the ontology classification and its different relations, making use of the bounded natural language technique, in order to verify the hypothesis that if the elements are ordered it is simpler to find the desired ones, and to save work for users by showing them only documents thematically related to the concepts in their queries.
6. References
[1] S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Computer Science Department, Stanford University, Stanford, available online: http://www.db.stanford.edu/~backrub/google.html, visited December 2005.
[2] M. Miller, 501 Web Site Secrets: Unleash the Power of Google, Amazon, eBay and More, Wiley Publishing, Inc., USA, 2004.
[3] "Google bomb", Wikipedia, the free encyclopedia, available online: http://en.wikipedia.org/wiki/Google_bomb, visited March 2006.
[4] ODP - Open Directory Project, http://dmoz.org/, visited April 2006.
[5] T. Gruber, "A Translation Approach to Portable Ontology Specifications", Knowledge Acquisition, 1993.
[6] M. Uschold and M. King, "Towards a Methodology for Building Ontologies", Workshop on Basic Ontological Issues in Knowledge Sharing, 1995.
[7] DOCUM, a multilingual stopword file for CDS-ISIS, http://library.wur.nl/isis/docum.html, visited November 2005.
[8] ftp://ftp.cs.cornell.edu/pub/smart/, visited November 2005.
[9] J. Savoy, CLEF and multilingual information retrieval, Institut interfacultaire d'informatique, University of Neuchâtel, Switzerland, 2005, http://www.unine.ch/info/clef/, visited November 2005.
[10] The English (Porter2) stemming algorithm, http://snowball.tartarus.org/algorithms/english/stemmer.html, visited December 2005.
[11] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, USA, 1983.
Neuro-Symbolic Hybrid Systems
Vianey Guadalupe Cruz Sánchez, Gerardo Reyes Salgado, Osslan Osiris Vergara Villegas
Centro Nacional de Investigación y Desarrollo tecnológico (cenidet)
Interior Internado Palmira S/N, Col. Palmira. C.P. 62490.
Cuernavaca Morelos México.
{vianey,osslan}@cenidet.edu.mx
Abstract
Currently, the Hybrid Systems (HS) approach is widely used to solve problems where different knowledge representations are involved in one system. This integration has the advantage of compensating the weaknesses of one system or another by complementing their strengths. Neuro-Symbolic Hybrid Systems (NSHS) arise from HS as an approach that offers the possibility of implementing robust systems where both connectionist and symbolic representations are present in the environment. The reason for studying NSHS is to implement them in the artificial vision system process, so that we can propose different strategies of solution among the different representation types involved in this process; for this, the hybrid system development cycle and the NSHS classification criteria play a very important role in the definition of these strategies.
1. Introduction
In the last decade it was very common to work with only one type of knowledge representation. There was even competition to demonstrate that the representation used was better than another that solved the same problem. However, as time passed, researchers observed the weaknesses of each representation and the complementary properties that existed among them, and the scientific community decided to prove their capacities by integrating them into one (hybrid) system; with this integration, better results were obtained than those obtained using each representation individually.
The hybrid approach is inspired by a natural mechanism: according to [1], we are hybrid information processing machines, and our actions are governed by the combination of genetic information and information acquired through learning. Thanks to this combination we are able to use different processing methods successfully in complex and changing environments.
Under this natural scheme, hybrid systems have arisen as a new way to solve complex problems where several knowledge representations are necessary to use the information coming from the environment; this environment determines the strategies that should be used to increase the knowledge and to develop more robust systems [2].
However, the design and development of these systems is difficult due to the large number of pieces or components involved and the different interactions among them [3].
The tendency is the study and construction of hybrid systems whose strategy of solution is the best one to solve the problem. In this paper we present a study of the process involved in the design and development of a hybrid system, as well as the criteria for the classification of one particular type of hybrid system (HS), the Neuro-Symbolic Hybrid System (NSHS), which has been used in applications such as object recognition.
2. Design and development of a hybrid
system
A hybrid system development cycle is proposed in [1], which presents the process for the construction of this kind of system. The importance of this model lies in the bases on which the design of an NSHS can be supported.
2.1 Hybrid System development cycle
A structured approach like the one in [1] can reduce the development time and cost of a HS. The stages for the construction of intelligent hybrid systems are: problem analysis, matching, selection of the hybrid category, implementation, validation and maintenance (see figure 1).
Figure 1. Hybrid system development cycle.
a) Problem analysis.
This stage involves the following steps:
1. Identify any sub-tasks existing in the problem.
2. Identify the properties of the problem. If the problem has sub-tasks, this involves identifying their properties as well.
b) Property matching.
This involves matching the properties of the available techniques with the requirements of the identified tasks.
c) Hybrid category selection.
In this phase the type of hybrid system required to solve the problem is selected. This phase uses the results of the previous stages of problem analysis and property matching.
d) Implementation.
In this stage the developer is in a position to select the programming tool and the environment necessary to implement the hybrid system.
e) Validation.
This phase is used to test and verify the performance of the individual components of the application and of the whole hybrid system.
f) Maintenance.
The performance of the hybrid system should be periodically assessed and refined as necessary. Maintenance is very important for adaptive systems (e.g., neural networks).
3. Neuro Symbolic Hybrid Systems
NSHS are systems formed by two or more types of knowledge representation, one connectionist and the other symbolic. Both representations have their own set of qualities, and integrated into one system they are extremely powerful for solving complex problems.
Artificial neural networks are a type of connectionist knowledge representation inspired by the functionality of the biological neuron. This type of representation has been used for its learning capability and generalization of knowledge, making it a very powerful tool to solve complex pattern recognition problems.
On the other hand, a symbolic representation in the form of logical rules is based on the capability that humans have to express knowledge in a natural way. This is very useful for inserting the knowledge of a human expert into a system, as well as for explaining the problem. This type of representation has been widely used in areas such as pattern recognition, natural language processing, robotics, computer vision and expert systems.
Both types of knowledge representation are combined in one system to suppress the disadvantages of each individual representation and to take advantage of their integration.
For the integration of an NSHS and its future application in the design of an artificial vision process, it is very important to know the criteria used for the classification of NSHS, because from this classification a broad view of the different behaviours that an NSHS may have is obtained.
4. NSHS Classification
Many criteria are considered in order to classify NSHS [4]. Next, we explain each one briefly.
Table 1. Criteria to classify NSHS.
a) Integration type
Neuro-symbolic integration can be classified into mainly three groups, according to the "hybridation type" of the approach used.
• Unified approach. It attempts to integrate the properties of symbolic systems into connectionist systems and vice versa.
• Semi-hybrid approach. This approach is used to achieve translations, for example, the compilation of a rule base into a network (knowledge insertion) and the explicitation of rules starting from a network (knowledge extraction).
• Hybrid approach. In this type of approach many symbolic and connectionist modules may exist, integrated with each other.
b) Coupling grade
This defines the strength of the interaction between two modules. The classification of the different grades is carried out through a progressive scale that goes from one extreme to the other. This classification consists of three levels:
• Weak coupling. In this type of architecture, the different modules are connected by a simple input/output relation, and the communications are unidirectional.
• Medium coupling. In this category, the interactions among modules are more flexible, because they are bidirectional; it is not simply an input/output relationship, but rather each module can influence the operation of the other.
• Strong coupling. In these systems the knowledge and data are not only transferred, but also shared among modules through common internal structures.
c) Integration mode
This represents the way in which the neural module and the symbolic module are configured in relation to each other and to the full system.
• Chain. Two modules operate in sequence: one is the main processor and it is assisted by the other module, which acts as a pre- or post-processor. The relationship between the modules is input/output.
• Sub-treatment. In this integration mode, one module is subordinate to another to achieve some function. The main module decides at what moment to call it and how to use its output.
• Meta-treatment. One module solves the problem and the other plays a meta-level role, such as carrying out the control or improving the results.
• Co-treatment. Both modules are equal partners in the problem-solving process; for example, one module solves one specific part of the problem and the other module solves the rest of the problem.
d) Knowledge transference
The knowledge transference may be classified according to the direction of the interchange.
• From symbolic to connectionist. The symbolic knowledge is transferred from a symbolic module and is integrated into a connectionist module (S→C).
• From connectionist to symbolic. The knowledge acquired by learning in a connectionist network may be explained in the form of symbolic rules (C→S).
• Bilateral transfer. The knowledge can be transferred in both senses, symbolic and connectionist (S↔C). This usually includes compilation mechanisms and rule extraction starting from the networks.
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de MéxicoJuan Carlos Olivares Rojas
 
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteAnalítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteJuan Carlos Olivares Rojas
 
Propuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoPropuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoJuan Carlos Olivares Rojas
 
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainCyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainJuan Carlos Olivares Rojas
 
A Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityA Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityJuan Carlos Olivares Rojas
 
Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Juan Carlos Olivares Rojas
 
A Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionA Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionJuan Carlos Olivares Rojas
 
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Juan Carlos Olivares Rojas
 
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Juan Carlos Olivares Rojas
 
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Juan Carlos Olivares Rojas
 
Internet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesInternet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesJuan Carlos Olivares Rojas
 
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Juan Carlos Olivares Rojas
 
Ciber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesCiber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesJuan Carlos Olivares Rojas
 
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Juan Carlos Olivares Rojas
 

Mais de Juan Carlos Olivares Rojas (20)

Ieee itmsb20
Ieee itmsb20Ieee itmsb20
Ieee itmsb20
 
Ropec20neural stick
Ropec20neural stickRopec20neural stick
Ropec20neural stick
 
Analítica de Datos usando Single Board Computers
Analítica de Datos usando Single Board ComputersAnalítica de Datos usando Single Board Computers
Analítica de Datos usando Single Board Computers
 
Analitica de Datos en Dispositivos de Internet de las Cosas
Analitica de Datos en Dispositivos de Internet de las CosasAnalitica de Datos en Dispositivos de Internet de las Cosas
Analitica de Datos en Dispositivos de Internet de las Cosas
 
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
 
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de MéxicoPropuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
 
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteAnalítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
 
Propuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoPropuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en México
 
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainCyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
 
A Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityA Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobility
 
Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes
 
A Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionA Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer Interaction
 
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
 
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
 
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
 
Internet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesInternet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas Inteligentes
 
Estrategias didacticas
Estrategias didacticasEstrategias didacticas
Estrategias didacticas
 
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
 
Ciber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesCiber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas Inteligentes
 
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
 

Lookingforwardenglish

  • 4. The Center, naturally, participates in this valuable synergy. CENIDET offers its postgraduate programs to graduates interested in preparing for applied research and technological development. Welcome to CENIDET, Computer Sciences Department.
  • 5. Summary: Editorial, 1; Welcome to CENIDET, 2; Web Page Classification: a Semantic Analysis, 7; Digital Image Processing in Wavelet Domain, 13; GAP: A Tool to Solve the Problem of the Web Contents Visualization in Pocket PC Devices, 17; Evaluation of tools for business process in three levels of abstraction, 21; Web Page Retrieval Using an Ontology that is Populated by Automatic Text Classification, 27; Image Segmentation Script Language, 31.
  • 6. Summary (continued): Neuro-Symbolic Hybrid Systems, 35; Methodology for the generation of 3D models, 41; Vanishing points detection using Thales's theorem, 45; An Ontology-based Translator for Communicating Negotiation Agents over Internet, 49; Segmentation by color to characterize human gait, 55; Description of two Statistical Models Applied To The Extraction Of Facial Features: Integral Projections and Active Shape Model, 59; Authors, 63.
  • 9. 7 Web Page Classification: a Semantic Analysis Rocío Vargas Arroyo, Azucena Montes Rendón Centro Nacional de Investigación y Desarrollo Tecnológico {rvargas04c,amr}@cenidet.edu.mx Abstract In this paper, a semantic analysis for Web page classification is presented. A set of Web pages, resulting from a simple query to a Web browser, is categorized by disambiguating the meaning of the term used for the search. The disambiguation process begins with the isolation of some outstanding paragraphs; linguistic markers are used to accomplish this task. The search term is located within the paragraphs and the Contextual Exploration Method is used to identify words that lead to the discovery of relationships within an ontology. Finally, the discovered relationships are used to assign the web page to a category. 1. Introduction Natural Language Processing is a challenging task in Artificial Intelligence, because dealing with language is not simple. With the immense growth of the Web, the problem of retrieving the proper and desired information has grown too. Several research groups have obtained significant results, but not good enough to meet the general need. These results are sensitive to the ambiguity problem caused mainly by the techniques used: most of the developed projects [1-9] are based on statistical methods and ignore linguistic techniques. The main intention of this work is the creation of a classification tool able to classify a set of HTML web pages written in Spanish. Linguistic markers, an ontology and the Contextual Exploration Method [10] are integrated to accomplish the task. 2. Linguistic markers In order to emphasize certain ideas contained in a text, the author uses discourse markers. These markers are linguistic units that set the order of the discourse. Martín Zorraquino and Portolés [11] define them as: “Unidades lingüísticas invariables que no ejercen una función sintáctica en el marco de la predicación oracional – son pues, elementos marginales – y poseen un cometido coincidente en el discurso: el de guiar, de acuerdo con sus propiedades morfosintácticas, semánticas y pragmáticas, las inferencias que se realizan en la comunicación” [11]. From Martín Zorraquino and Portolés's [11] extensive analysis of these markers, Prada [12] extracted five categories (see Table 1). Table 1. Discourse markers. Marcadores estructuradores de la información: comentadores, ordenadores, digresores. Marcadores conectivos: aditivos, consecutivos, contraargumentativos. Reformuladores: explicativos, de rectificación, de distanciamiento, recapitulativos. Operadores argumentativos: de refuerzo argumentativo, de concreción. Marcadores conversacionales: de modalidad epistémica, de modalidad deóntica, enfocadores de la alteridad, metadiscursivos. Each type of marker has a special use in Spanish discourse, but for this project the attention is focused on recapitulative markers because they introduce a recapitulation or conclusion of what was said, they concentrate a general view of the text's intention, and they let the reader reach the final thought of the author.
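To make the paragraph-selection step concrete, here is a minimal Python sketch (not the authors' implementation) that keeps only the paragraphs containing one of the recapitulative markers the article cites; splitting the page body on blank lines and the exact marker list are assumptions made for illustration.

```python
import re

# Recapitulative markers taken from the examples given in the article.
# The list is illustrative, not exhaustive.
RECAPITULATIVE_MARKERS = [
    "en suma", "en conclusión", "en definitiva", "en fin",
    "al fin y al cabo", "resumiendo", "finalmente",
]

def select_paragraphs(text: str) -> list[str]:
    """Return the paragraphs that contain at least one recapitulative marker.

    Splitting on blank lines is an assumption about how the page body is segmented.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs
            if any(marker in p.lower() for marker in RECAPITULATIVE_MARKERS)]

if __name__ == "__main__":
    sample = ("Primer párrafo sin marcador.\n\n"
              "En conclusión, este párrafo resume la intención del autor.")
    print(select_paragraphs(sample))
```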
  • 10. 8 Examples of these recapitulative markers are: En suma, en conclusión, en definitiva, en fin, al fin y al cabo, resumiendo, finalmente among others. 3. Contextual Exploration Method The Contextual Exploration Method (CEM) [10] was developed by LaLIC team of the Paris-Sorbonne University and directed by Jean Pierre Desclés. This method provides a frame for the identification of semantic information in a text and a set of mechanisms that help in the resolution of semantic indeterminations or ambiguity cases. It also assumes that the texts include linguistic units that help in the task of sense construction. Is required, by the method, the description of indices and indicators. Indicators are linguistic units associated to a set of contextual exploration rules. When an indicator is located within the text a rule is triggered; the context of the indicator is explored in order to identify indices or words that lead to the real meaning of the indicators. Indicators, indices and rules integrate the linguistic knowledge base. 4. Ontology interaction For this project, a Spanish version of EuroWordNet [13] is used. EuroWordNet is a lexical-semantical ontology of many languages such as Spanish, Dutch, Italian, Frech, German and Czech. It follows the WordNet model but integrates some improvements as better expressivity of the knowledge base; adds new relationship definitions and transcategorial relationships. The version used in this project was transformed into a database and accessed through SQL queries. The result of the queries leads to the construction of context exploration rules. Figure 1. Graphical result of a query to the Ontology. Search term capa. 5. Semantic analyzer The process begins with a query to a well known and widely used web browser. Simple search terms are used (simple terms are more sensitive to ambiguity problem) and a set of hyperlinks is retrieved. This set is filtered to get only HTML pages written in Spanish. Once the hyperlinks are extracted, each page is loaded and its content is analyzed to get the most important paragraphs; this paragraph selection is made by locating linguistic markers within the text. The search term is located within the isolated paragraphs the left and right context are evaluated in order to get surrounding indices that lead to the discovery of relationships within an ontology. The extracted relationships allow the association of the HTML web page to a category. The full process is graphically represented in figure 2. Figure 2. HTML web page classification scheme. 5.1. Web browser query and hyperlink extraction The classification process begins with a query to a web browser. The search term is a simple term, a single word. The web browser returns a set of hyperlinks linked to web pages where the search term is contained. Hyperlinks are extracted and its associated page is analyzed later. Several query strings where analyzed in order to simplify the query to the web browser. Language and file format is specified. Example:
  • 11. 9 http://www.google.com.mx/search?num=<number of resulting links>&as_epq=<search term>&lr= <language> &as_filetype=html Two sets of hyperlinks are extracted, the first set corresponds to the direct link to the web page, the second, corresponds to the cache version of the page. See figure 3. Figure 3. Simple interface for hyperlink extraction. 5.2. Localization of linguistic markers and paragraph selection Each web page is loaded and its body content and some other representative information as metadata are extracted. See figure 4. Figure 4. Simple interface for web page content extraction. Some linguistic markers, mostly recapitulative markers, are located within the body of the page. Paragraphs where these markers are present are selected for a more extensive analysis. Consider the next text segment and the search term “capa”: “…El segundo sistema permite una elección mas amplia del copulador y del revelador, pero implica mayores dificultades en el tratamiento. El revelador que tiene el copulador debe actuar en la capa correcta, y esto se logra controlando la velocidad de difusión o aplicando exposiciones controladas. El método de la difusión controlada es anticuado. El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul…” [14] From this text segment, a representative paragraph is extracted: “El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul”. [14] 5.3. Search term localization and context exploration Once the paragraphs are selected, the search term is located within each paragraph an its left and right context are explored looking for key terms that lead to the discovery of a relationship within an ontology. The Contextual Exploration Method [10] is applied and indicators, indices and rules are defined. The indicators sets are composed by the search term and its synonyms, the set of indices is populated by a query to an Ontology and the rules are dynamically created. Example: Indicators = {capa, mano, baño} Indices = {pintura, emulsión, impermeabilizante, esmalte, barniz, …} Rule: If In the context of the I1 set appears any indice belonging to set I2 Then Assing the “Cubrimiento aplicado a una superficie” meaning to the search term in the selected statement End if “El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul” .
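The contextual exploration rule above can be read as a small matching procedure. The following hedged Python sketch reproduces it for the running example; the context window size and the hard-coded index set (standing in for a query to the EuroWordNet-derived database) are assumptions, not part of the original system.

```python
# Sketch of the contextual exploration rule shown above. The window size and
# the hard-coded index set (a stand-in for the SQL query to the ontology
# database) are assumptions.
INDICATORS = {"capa", "mano", "baño"}
INDICES = {"pintura", "emulsión", "impermeabilizante", "esmalte", "barniz"}
MEANING = "Cubrimiento aplicado a una superficie"

def disambiguate(paragraph, window=8):
    """Return MEANING if an index term appears near an indicator, else None."""
    tokens = paragraph.lower().split()
    for i, token in enumerate(tokens):
        if token.strip(".,;:()") in INDICATORS:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            context = {t.strip(".,;:()") for t in left + right}
            if context & INDICES:
                return MEANING
    return None

print(disambiguate("sobre la base se aplica una capa de emulsión sensible al rojo"))
```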
  • 12. 10 5.4. Relationships extraction and final categorization For the final categorization, a set of categories must be defined. A first approach for solving this task is to extract a group or relationships, from the ontology, where the search term and the given sense are involved. Example: Figure 5. Homonym relationships of the search term capa. All nodes presented in figure 5 are homonyms of the term capa, each node has a different meaning. Nodes 0 means “Acción y efecto de cubrir la superficie de un cuerpo con un material” and node 3 means “Cubrimiento aplicado a una superficie”. These relationships where discovered in the Ontology, but these are not sufficient for creating a category, so second level relationships must be discovered. Figure 6. Second level relationships of the search term capa. In figure 6, the hyponym relationships (for nodes 0 and 3) of the term capa are shown. Here is where indice terms appear and let us create the categories for the classification. Therefore, a first set of categories names might be formed by first level relationships and its meaning. Here is where a big problem arises, the ontology data is not complete, some of its meanings are missing. 6. Conclusions The easiest way to get information from the web is trough web browsers and directories; however, the obtained results not always are sufficient enough because the search techniques do not care about the semantic content of the pages. So is necessary the implementation of a tool able to make a proper classification of the web pages according the real meaning of the query. With a tool like this, the search process made by the user will be improved cause documents out of the interest might be discriminated and as a consequence, the number of pages to visit and the time inverted in exploring not relevant ones will be diminished. Moreover, the introduction of linguistic techniques to classification methods might improve the way web spiders indexes the information. 7. References [1] A. Gulli and P. Ferragina, “The anatomy of a hierarchical clustering engine for web-page, news and book snippets”, Fourth IEEE International Conference on Data Mining, ICDM’04, Brighton, UK, 2004. [2] A. Gulli, “SnakeT”, http:// www.snaket.com, Pisa University, Italy. [3] Vivísimo, Inc., “How the Vivísimo clustering engine works, http://www.vivisimo.com , 2003. [4] Vivísimo, Inc.,”Clusty”, http://www.clusty.com , 2005. [5] A. Téllez-Valero, M. Montes-y-Gómez and L. Villaseñor- Pineda, “Aplicando la clasificación de texto en la extracción de información”, Encuentro Internacional de Ciencias de la Computación, ENC, Colima, México, September 2004. [6] J. M. Gómez, E. Puertas, G. Carrero, M. de Buenaga, “Categorización de texto sensible al coste para filtrado en Internet”, Procesamiento del Lenguaje Natural, SEPLN, Magazine nº 31, September 2003. [7] L. Golub, Automated subject classification of textual Web pages, for browsing, Thesis for the degree of Licenciate in Philosophy, Lund University, Switzerland, August 2005. [8] F. Sebastiani, “Machine learning in automated text categorization”, ACM computing surveys, 2002. [9] F. Sebastiani, “Text categorization”, in Zanasi A., Text Mining and its Applications, WIT Press, Southampton, UK, 2005.
  • 13. 11 [10] J. P. Desclés, E. Cartier, A. Jackiewicz and J. L. Minel, “Textual Processing and Contextual Exploration Method”, Context 97, Rio de Janeiro, February 1997. [11] Martín Zorraquino, Mª Antonia and José Portolés Lázaro, “Los marcadores del discurso”, in Bosque and Demonte, Vol. 3, pp. 4051-4213. [12] J. Prada and G. Moncecchi, “Reconocimiento eficiente de marcadores del discurso en español”, VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, Cuba, January 2003. [13] Amsterdam University, “EuroWordNet: Building a multilingual database with wordnets for several European languages”, March 2006, http://www.illc.uva.nl/EuroWordNet/ [14] “Fotografía / Fotografía en colores”, March 2006, http://www.punksunidos.com.ar/punksunidas/fotografia/fotografia6.html
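Before moving on, a brief sketch of the final categorization step (Section 5.4): each candidate sense of the search term is associated with the index terms reached through its second-level (hyponym) relationships, and the sense whose indices appear most often in the selected paragraphs names the category. The relation data below is a hypothetical stand-in for the ontology queries; only the idea is taken from the article.

```python
# Hypothetical sense-to-indices map standing in for the first- and
# second-level relationships discovered in the ontology.
SENSE_RELATIONS = {
    "Cubrimiento aplicado a una superficie": {"pintura", "emulsión", "esmalte", "barniz"},
    "Prenda de vestir": {"tela", "abrigo", "vestido"},
}

def categorize(paragraphs: list[str]) -> str:
    """Return the sense (category name) whose index terms appear most often."""
    scores = {sense: 0 for sense in SENSE_RELATIONS}
    for p in paragraphs:
        words = set(p.lower().split())
        for sense, indices in SENSE_RELATIONS.items():
            scores[sense] += len(words & indices)
    return max(scores, key=scores.get)

print(categorize(["sobre la capa filtro se aplica una capa de emulsión sensible al azul"]))
```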
  • 15. 13 Digital Image Processing in Wavelet Domain Osslan Osiris Vergara Villegas and Raúl Pinto Elías Centro Nacional de Investigación y DesarrolloTecnológico (cenidet) Interior Internado Palmira S/N, Col. Palmira. C.P. 62490. Cuernavaca Morelos México. {osslan, rpinto}@cenidet.edu.mx Abstract In this paper we present some explanation about digital image processing in the wavelet domain. First an image is transformed using a Discrete Wavelet Transform (DWT), then several mathematical operations are applied in order to observe some features presented in the image. The transformation can reveal some features that are not clear or difficult to detect in the original domain. We obtain wavelet directionality and edge detection, image smoothing, scale changing, image denoising and compression, and finally, feature extraction in wavelet domain. 1. Introduction Wavelet transform was used in geophysics in early 1980s for the analysis of seismic signals [1]. A wavelet transform decomposes a signal f into its components on different scales or frequency bands. This is made by a convolution process on f using the translated and dilated wavelet ψ (wavelet mother). Based on the selection of ψ, the transformed function allows, for example, the extraction of the discontinuities or edges in f, performing a pattern recognition task or storing a compressed version of f. Wavelets are signals which are local in time and generally have an irregular shape. A signal can be decomposed into many shifted and scaled representations of the original mother wavelet [2]. Wavelets have the advantage of being able to separate the fine details in a signal, very small wavelets can be used to isolate very fine details in a signal, while very large wavelets can identify coarse details. In addition, there are many different wavelets to choose from. One particular wavelet may generate a more sparse representation of a signal than another, so different kinds of wavelets must be examined to see which is most suited for the application you need in digital image processing for example image compression or denoising, pattern recognition, etc. 2. Multiresolution Analysis Multiresoltion analysis is concerned with the study of signals or processes represented at different resolutions and developing an efficient mechanism to change from to one resolution to another [3]. The discrete Wavelet Transform (DWT) is a mathematical tool for the analysis and synthesis of signals that can be used when digital images need to be viewed or processed at multiple resolutions. The localization of signal characteristics in spatial (or time) and frequency domains can be accomplished very efficiently using wavelets. This allows us to simultaneously determine sharp transitions in the spectrum of the signal and in the position (or time) of their occurrence. The principle behind the use of wavelets for decomposing an image is to use a wavelet function Ψ to represent the higher frequencies corresponding to the detailed parts of the image, and a scaling function Φ to represent the lower frequencies corresponding to smooth parts of the image. Figure 1 shows the process to decompose an image using filters to obtain the subband decomposition. Figure 1. Subband decomposition of an image.
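As an illustration of the subband decomposition in Figure 1, here is a minimal sketch using the PyWavelets library (an assumption; the article does not name a particular implementation). One level of the Symlet 4 transform yields the approximation subband and the horizontal, vertical and diagonal detail subbands discussed in the following sections.

```python
# Minimal sketch of a one-level subband decomposition, assuming PyWavelets
# (pywt) and NumPy are available; the article itself does not prescribe a library.
import numpy as np
import pywt

image = np.random.rand(256, 256)          # stand-in for a grayscale image
cA, (cH, cV, cD) = pywt.dwt2(image, "sym4")

# cA: approximation (upper-left quadrant), cH: horizontal details,
# cV: vertical details, cD: diagonal details.
print(cA.shape, cH.shape, cV.shape, cD.shape)
```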
  • 16. 14 Multiresolution analysis plays an important role in the perception and decision mechanism of human beings. 3. Wavelet Directionality and Edge Detection With the wavelet transform we can obtain some information (details) about the directionality and the edges presented in an image. In this section we use the image shown in figure 2 to explain how to obtain directionality and edge detection. Figure 2. Image “Star” for edge and directionality analysis. The first thing to do is to transform the original image using some DWT, for this example we decompose the image only one level using the symlet 4. Figure 3 shows the resulting image from wavelet transform. Figure 3. “Star” after wavelet decomposition. From figure 3 we can see that the horizontal edge of the original image is presented in the horizontal detail coefficient matrix of the upper-right quadrant. Similarly the vertical edge is shown in the vertical detail coefficients of the lower-left quadrant. Finally you can see that the diagonal borders are shown in the diagonal detail coefficients of the lower-right quadrant. From the wavelet subband image we can combine the edge information into a single image. The only thing that we need to do is changing to zero all the approximation coefficients (upper-left quadrant), and then compute de Inverse Discrete Wavelet Transform (IDWT). The resulting edge image is shown in figure 4. Figure 4. “Star” edge resulting image. We can use a similar procedure to isolate the vertical or horizontal edges. It is important to remark that the information of diagonal edges is always preserved although we cut the diagonal coefficients. 4. Image Smoothing or Blurring Wavelets can be used as a tool for blurring or smoothing images. For this example we use the image shown in figure 5. Figure 5. “Tools” image. In order to smoothing, we need to compute the wavelet transform in more than one decomposition level; for example we use the Symlet 4 with four decomposition levels as shown in figure 6. Figure 6. Four decomposition level of “Tools”. Converting to zero one detail level allows us to obtain increasingly smoothed versions of the original image.
  • 17. 15 For example if you zeroed the first three details level, we can obtain an image as the one shown in figure 7a). If we change to zero all levels (four) we can obtain an increase of blurring in the image as shown in figure 7b. Figure 7. “Tools” images. a) Result of zeroing first three levels, b) result of zeroing all levels. 5. Change the Scale Changing the scale of an image implies a mathematical operation called interpolation. With the DWT we can downscale an image with a factor of 2n or upscale an image with a factor of 2n . This process is made by removing or adding subbands in the wavelet domain. This scale changing provides an application to progressive reconstruction of the image. Lets suppose that we need to send an image to two users in different sites and those users need different resolution images. The image can be reconstructed with gradually higher resolution approximations of the final high resolution image, and we can send the adequate image for each user at an exact subband reconstruction. Figure 8 shows an upscaling example of the camman image. Figure 8. Upscaled Camman image. a) Original image, b) Incise a doubled image, c) Incise b doubled image. Performing of the upscaling or downscaling process is better than the same process made by interpolation. 6. Image Denoising Image denoising is the process of separating the noise out of the image components from a single observation of a degraded image. The image can be corrupted with noise because of either data acquisition process or naturally occurring phenomena. The simplest technique for denoising is wavelet thresholding (shrinkage). We use as input a noise signal like the image that we shown in figure 9a. We perform the wavelet transform using for example four multiresolution levels, and then we use a denoising method called soft thresholding through all subbands. The technique sets coefficients with values under the threshold (T) to 0, then substracts T from the non- zero coefficients. After soft thresholding, we compute the inverse wavelet transform. Figure 9b shows the image obtained from the denoising process. Figure 9. “Goldhill” image. a) Gaussian noise image, b) denoised image The main problem in denoising is the selection of the best value of T for thresholding. 7. Image Compression One of the main popular applications of wavelets is image compression. Data compression goal is to reduce the volume of necessary data to represent a certain information amount. One of the advantages obtained with the use of DWT is that the image is decorrelated, thus the image can be easily compressed. Some of the wavelet coefficients obtained from DWT correspond to details in the data set. If there are few details, they might be omitted without substantially affecting the main features of the data set. The first step is to transform the original image into the wavelet domain using the DWT, one of the important decisions is what family of wavelet to use and what multiresolution level to apply. The selection of a wavelet family depends a lot on the subsequent use of the image, but it is necessary to take into account some wavelet properties as orthogonality, number of
  • 18. 16 vanishing moments, compact support, symmetry, etc. On the other hand, the multiresolution level cannot be larger than log2(N). The quantization stage comes after the DWT; here, we can use two different strategies. One is to set to zero all high-frequency subband coefficients that are under a particular threshold. The other is to set to zero, for example, the coefficients behind the matrix diagonal or of some decomposition level. There are two widely used techniques for quantization: the Embedded Zerotree Wavelet coder (EZW) and the Set Partitioning in Hierarchical Trees (SPIHT), which are very efficient for several applications. The final stage corresponds to the entropy coder, which is a lossless stage. Figure 10 shows the Lena image and, from left to right, different images obtained from the compression process at different quality and storage space. Figure 10. “Lena” image at different quality and storage space. 8. Image Feature Extraction Image classification is perhaps the most important application when using digital images. In order to perform it, a feature vector is used to describe an image. The statistical properties of the wavelet coefficients characterize an image and can lead us to a better image classification. Some measures obtained from the wavelet coefficients $C_k$ are: Norm-2 energy: $E_1 = \frac{1}{N^2}\sum_{k=1}^{N} C_k^2$ (1); Norm-1 energy: $E_2 = \frac{1}{N^2}\sum_{k=1}^{N} |C_k|$ (2); Standard deviation: $E_3 = \frac{1}{N^2}\sum_{k=1}^{N} (C_k - \mu)^2$ (3); Average residual: $E_4 = \sum_{k=1}^{N} (C_k - \mu)^2$ (4); Entropy: $E_5 = -\frac{1}{N^2}\sum_{k=1}^{N} C_k^2 \log C_k^2$ (5); where $\mu = \frac{1}{N^2}\sum_{k=1}^{N} C_k$ (6), µ is the mean and N is the size of the image. 9. Conclusions In this paper we show some applications of the wavelet transform for digital image processing, with the goal of demonstrating that an image can be manipulated even in the wavelet domain. Research in wavelets keeps looking for more complex families that fit a particular application, for example, trying to describe an important feature of the image known as image geometry. 10. References [1] Morlet, J., G. Arens, E. Fourgeau, and D. Giard, “Wave propagation and sampling theory part 1: Complex signal and scattering in multilayered media”, Geophysics, Vol. 47, No. 2, pp. 203-221, February 1982. [2] Maryhelen S., “Image compression using wavelets”, Thesis proposal, Department of Electrical and Computer Engineering, University of New Brunswick, Canada, 1997. [3] Mallat, S., “A theory for multiresolution signal decomposition: The wavelet representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 11, No. 7, pp. 674-693, July 1989. [4] Gonzalez, Rafael C., Woods, Richard E. and Eddins, Steven L., Digital Image Processing Using MATLAB, Pearson Prentice Hall, 2004.
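To close the preceding article, here is a small NumPy sketch of the feature measures of Section 8, following the formulas as reconstructed there; treating N² as the total number of coefficients of the flattened subband and adding a small epsilon inside the logarithm are implementation assumptions, not part of the original definitions.

```python
# NumPy sketch of the wavelet-coefficient measures E1-E5 above. C is the
# coefficient matrix of an N x N subband, so 1/N^2 == 1/C.size. The epsilon
# in the entropy term is an implementation detail to avoid log(0).
import numpy as np

def wavelet_features(C: np.ndarray) -> dict:
    c = C.ravel().astype(float)
    n2 = c.size
    mu = c.sum() / n2
    eps = 1e-12
    return {
        "norm2_energy": (c ** 2).sum() / n2,                        # E1
        "norm1_energy": np.abs(c).sum() / n2,                       # E2
        "std_like": ((c - mu) ** 2).sum() / n2,                     # E3
        "avg_residual": ((c - mu) ** 2).sum(),                      # E4
        "entropy": -((c ** 2) * np.log(c ** 2 + eps)).sum() / n2,   # E5
    }

print(wavelet_features(np.random.rand(8, 8)))
```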
  • 19. 17 GAP: A Tool to Solve the Problem of the Web Contents Visualization in Pocket PC Devices. J. Carlos Olivares R., J. Gabriel González S., Azucena Montes R., Víctor J. Sosa S. e I. Rafael Ponce M. Centro Nacional de Investigación y Desarrollo Tecnológico(cenidet) Cuernavaca, Morelos, México {jcolivares04c, gabriel, amr, vjsosa, rafaxzero4c}@cenidet.edu.mx Abstract This tool intends to fill the existing ‘GAP’ in the Web sites visualization in mobile devices, such as Pocket PC. In order to guarantee that the users can correctly visualize the Web resources, two things are needed: a mechanism for controlling disconnections, and allowing visualization of Web content despite of the device connection state (hoarding), and a mechanism that can adapt the Web content to the specific mobile device features (transcoding). GAP is a tool that integrates these two mechanisms and allows improving of the user’s navigation experience in the Mobile Web. Keywords: Pocket PC, Visualization, Web Resources, Hoarding, Transcoding. 1. Introduction Mobile devices are each time closer in time, according with [1]: "By 2009, more than a half of the microprocessors made in the world will be intended for mobile devices." "The software that will really make mobile devices useful isn’t developed yet." These statistics reflect that the use of mobile devices is increasing due to their tiny size and that its power of processing and versatility is growing day by day. The problem of Web resources visualization in mobile devices is the fact that the great majority of Web sites in Internet have not been designed for this type of devices. The mobile devices have limited resources like small screens, little memory, low processing speeds, etc; in comparison with traditional computers equipment. On other hand, the Web and the protocol that manages it: HTTP are connection oriented (they are based on TCP) what causes the transaction to fail if the user, by any reason, becomes disconnected from the network . In this case, it might not be possible to visualize the Web resources in the mobile client. Disconnections are frequent in this type of devices, mainly because of their main advantage: mobility. In this work a system which development is in progress is described. It focuses in attacking the problem of Web resources visualization on mobile devices. The main characteristic of this work is that great part of the system is executed in this kind of devices, in comparison to the great majority of the existing solutions that are executed in traditional platforms. 2. Alternatives of solution In order to solve this problem several alternatives are presented: to design a new protocol, to modify and existed protocol or to implement intermediary services that solve the problem. 2.1 New protocols In this scheme is possible to mention the WAP protocol and the WML language, they work in an analogous way as HTTP-HTML in the traditional Web. The problem strives in that WAP only works with mobile equipment and this would bring the same fragmentation that today has the Web (special pages for all class of devices). In addition, WAP was originally designed for devices with limited resources capacities (monochrome screens, lower bandwidth, etc) which is actually solving day by bay through bandwidth wireless connection (WCDMA, UTMS, 802.11g, WiMax, etc) and with more and more powerful equipment. The best solution would be to create a new protocol. 
The problem is that it must be totally compatible with the existing ones; otherwise, thousands of existing resources would become unusable (it would be necessary to modify Web servers as well as Web clients). 2.2 Modification of protocols Within this alternative there exists the case of a new request scheme for Web resources. This new
  • 20. 18 scheme receives the name of Push, whereas traditional scheme receives the name of Pull [2]. The Pull scheme receives the name of “over demand’. Under this scheme, the client (user) is who visualizes a resource in an explicit way. In our case, if a user wants to see the page of cenidet, must write in the Web browser the next URL: http://www.cenidet.edu.mx/. The Push scheme also receives the name of 'subscription-notification'. In this scheme, the user subscribes itself to a service and when some event of interest happens a notification is sent for alerting the user about the event. Generally these two schemes do not live on isolated way. Hybrid schemes (Pull&Push) have been applied in diverse existing services, so is the case of the reception of SMS/MMS messages, where the send of messages is Pull and the reception is Push, since it notifies to users about the existence of new messages. Another service that has made famous devices like the Blackberry to become successful is the Push-mail [3]. This service comes to solve the problem of email visualization in mobile environments. Under the traditional scheme of the electronic mail, for consulting the email, a user must be connected all the time to receive it. This originates great costs if the network connection generates costs per time. With this new scheme, the user is not connected to the mail server. When a new mail in the server is received, it notifies the client of the existence of the new mail and sends it to the mobile client. For this type of schemes, protocols like HTTPU (HTTP over UDP) or HTTPMU (HTTP over multicast UDP) have been proposed, and basically works similar to the HTTP but using datagrams, which are not in an oriented connection way. With these protocols are possible to offer a better quality in the mobile Web [4]. 2.3 Intermediary services This is the more extended solution to solve the problem of Web resources visualization and many other problems present on Web, like the case of firewalls that solve some of the Web security problems like the access control, or proxies’ caches that tries to reduce the access latency to the information. The scheme of intermediaries is widely used because it doesn’t need to modify neither the clients nor the servers; in fact, the client and server processes do not notice the existence of these intermediary services. These services are in charge of the hard work and are transparent to the users. The tool that is described in this article, works under the scheme of intermediary services. 3. Proposal of solution The hoarding process solves the problem of Web resources visualization without concerning the state of the connection of the mobile device. For this, it becomes necessary that the user has already stored, in local way, in his device the resources that he o she will use. As can be observed, the amount of resources to occupy can be immense, whereas the capacity of storage of the devices is limited. In order to give solution against this new problem is necessary to have an effective way to know the resources that a user could use. With hoarding is possible to reduce this, through algorithms of association rules applied on Web logs, is determined the optimal set of resources that will be replicated to the mobile clients [5]. A mechanism which tries to solve the adaptation problem of Web resources to the displaying capacities on mobile devices is transcoding. 
It consists of the transformation of resources, distilling and processing all those characteristics that are not available on the device. The transcoding mechanism used transforms HTML into a subset of HTML, using XML. The system is based on a client-server architecture with an intermediate tier on the server side as well as on the client side. The system is shown in Figure 1. Figure 1. General architecture proposed. The overall system has been named GASWT (Gestor de Acaparamiento de Sitios Web Transcodificados: Hoarding Manager of Transcoded Web Sites). The intermediary on the client side is called GAP (Gestor de Acaparamiento para Pocket PC: Hoarding Manager for Pocket PC), whereas the one on the server side is called GAT (Gestor de Acaparamiento y Transcodificación: Hoarding and Transcoding Manager). The GAT is composed of the MA (Mecanismo Acaparador: Hoarding Mechanism) and the MT (Mecanismo Transformador: Transcoding
  • 21. 19 Mechanism). The communication between the processes is made through a HTTP request-response scheme. As much the MA as TM are taken from other projects that together with this one, comprise the Moviware project [6], whose main function is to offer a set of services to mobile clients that have frequent disconnections. The general operation of the system is described in the next lines. The user introduces an URL from the Web browser (which has been previously configured to redirect his exit towards the GAP). The GAP receives the request and determines if it is in the local cache of the device, if found, the hoarded resource is sends to the Web browser. When the resource is not hoarded, the system validates the connection existence in order to obtain the resource on line. If for some reason the resource cannot be shown, (because it doesn’t exist or has detected an error in the connection) the system notifies the user by sending an error message. On the other hand, if the Web resource is not hoarded and a pattern of the site in the local device doesn’t exist, the MA sends the Web resources if a pattern for this site exists. If the pattern exists but the hoarded resources in the MA aren’t present, it obtains them by requesting them to MT and soon compresses the resources in zip format to optimize the process. Once the MA has sent the hoarded Web site, the mobile device must decompress the Web site and update its list of patterns. This process happens in transparent way, in a way that the user never notices. MT is responsible of collecting documents and if they are HTML, it transforms them if the configuration parameters indicate that. The transcoding is made on line, because the process is slowed down if the document is too large. The actions that the user can make on the system consist in visualizing Web sites on line, visualizing Web sites on disconnection mode, visualizing error messages, visualization of the requests states and finally, set up the system. The GAP is basically conformed of three main modules which are: Observer, GAL (Gestor de Acaparamiento Local: Local Hoarding Manager) and GDL (Gestor de Desconexión Local: Manager of Local Disconnection). The Observer is responsible of processing each request and to give back the result to the navigator. The GAL is responsible of the manipulation and control of the cache in the device. The users decide which resources are susceptible of hoarding, as well as limiting the storage space. The GDL is responsible of determining the state of the connection. The control of the disconnections has been used drilling the network during three seconds. Observing the quality of the results, a threshold of 30% of accepted connections determines if the client is connected (if the threshold is surpassed or equaled) or is on disconnection mode (if it is below the threshold) [7]. For the implementation of this tool, we used .NET Compact Framework 1.0 with C # language, because it is the best option to program in Pocket PC platform [8]. The modifications of the MA and MT are being made in Java so that it is language in which these modules are programmed. 4. Results The tool described in the present document has been proven in diverse equipment like Pocket PC 2000 (Compaq iPAQ H3630), Pocket PC 2002 (HP Jornada 5500), Pocket PC 2003 (HP rx3115), emulators of Windows CE, desktop PC (Compaq Presario with Pentium 4 1.4 Ghz. processor, 512 Mb of RAM memory). The first test scenario consisted of acceding to the Web resources in on line mode. 
We obtained satisfactory results (see Figure 2). In the second test scenario, the GAP was executed without being connected to the network. Additionally, we had a pattern of a hoarded Web site (http://www.cenidet.edu.mx/) and its resources. In this case, images not present in the original site were used, so it was possible to verify that the hoarded resources were being displayed correctly. The third test scenario (see Figure 3) demonstrates that it is possible to transcode the resources on the device as well as to show them locally if they are hoarded and not transcoded. It is also possible to execute the GAP on other platforms such as Smartphones (SmartGAP) and a desktop PC (WinGAP). GAP, WinGAP and SmartGAP are the same program under different names, to differentiate the platforms on which they run. 5 Conclusions The presented tool demonstrates that it is possible to execute complex services on Pocket PC devices, as is the case of an intermediary service that allows Web resources to be visualized whether or not a network connection exists. At this time we have verified most of the functions of the system in an isolated way (the methods for decompressing the hoarded site are still missing); it
  • 22. 20 would be necessary the respective integration of components and testing to the system in its totality. Figure 2. Case of test 1: Visualization of Web resources with network connection. Figure 3. Visualization of Web sites in disconnection mode with hoarded Web resources and without transcoding. Figure 4. Case of test 3: Visualization of Web sites in connection mode, with hoarded and transcoding resources. The expected benefits at the conclusion of this investigation work are: 1) Visualization of Web sites without mattering if the devices are connected or not. 2) Reduction of latency in the access to the information, if the resource is hoarded locally. 3) Energy Saving by the fact to work in disconnection mode. 4) Saving money if the user decides not to connect to a network that receives the service and generates expenses by the access time. 5) Facility of administration of Web sites when not having different versions to each device. 6. Acknowledgments We want to give thanks to Rocío Vargas Arroyo for her contribution in correct this paper. 7. References [1] SG magazine, http://www.softwareguru.com.mx [visited march 2006] [2] Purushottam Kuikarni, et al., “Handling Client Mobility and Intermittent Connectivity in Mobile Web Accesses”, Department of Computer Science, University of Massachussets. [3] Blackberry’s push technology, http://www.blackberry.com/products/software/integrations/p ush_email.shtml [visited march 2006]. [4] UPnP Forum, http://www.upnp.org/, [visited march 2006] [5] David Valenzuela, “Mecanismos para predicción de acaparamiento de datos en sistemas clientes/servidor móviles”, masther thesis, cenidet, august 2002. [6] Gabriel González. “Plataforma middleware reflexiva para aplicaciones de cómputo móvil en Internet (Movirware)”, cenidet. [7] J. Carlos Olivares, et al, “Control de desconexiones en la visualización de páginas Web en dispositivos móviles Windows CE”, for appear in XVI CIECE’06, april 5,6 and 7 2006, Cd. Obregón, Sonora, México. [8] Gabriel González, Azucena Montes, J. Carlos Olivares, “Comparativa y evaluación de las herramientas de programación para desarrollar aplicaciones en plataforma Pocket PC”. VI CICC’05, Colima, Colima, México, september 2005.
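The GDL's disconnection control described in this article (probe the network for about three seconds and declare the client connected when at least 30% of the probe connections succeed) can be sketched as follows. This is an illustrative Python version, not the original C#/.NET Compact Framework code, and the probe host, port and timing details are assumptions.

```python
# Illustrative sketch of the 3-second / 30%-threshold connectivity check.
# Probe target, port and per-attempt timeout are assumptions.
import socket
import time

def is_connected(host="www.cenidet.edu.mx", port=80,
                 duration=3.0, threshold=0.30):
    attempts = successes = 0
    deadline = time.time() + duration
    while time.time() < deadline:
        attempts += 1
        try:
            with socket.create_connection((host, port), timeout=0.5):
                successes += 1
        except OSError:
            pass
    # Connected if at least `threshold` of the probe connections succeeded.
    return attempts > 0 and (successes / attempts) >= threshold

print("connected" if is_connected() else "disconnected")
```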
  • 23. 21 Evaluation of tools for business process modeling at three levels of abstraction Erika M. Nieto Ariza1, Javier Ortiz Hernández1, Guillermo Rodríguez Ortiz2 1 Centro Nacional de Investigación y Desarrollo Tecnológico, Interior internado Palmira s/n, Cuernavaca, Morelos, 62490 México {erika, ortiz}@cenidet.edu.mx, 2 Instituto de Investigaciones Eléctricas, Reforma 113, Palmira, Cuernavaca, Morelos, 62490 México gro@iie.org.mx Abstract Organizations are increasingly choosing the Web to provide services to their clients. Services are the systematization of the business processes of the organization. Due to the great number of existing modeling methods and the increasing use of the Internet, it is necessary to identify the information that each modeling method allows one to specify. In this paper, a set of concepts is proposed to evaluate modeling methods for business modeling at three levels of abstraction: organizational, integration and web. 1. Introduction Organizations must decide how technology systems support the business and how, increasingly, these information systems become an integral part of the business processes [1, 2]. Models are commonly used to flexibly represent complex systems and to observe the behavior of a business process when a technology system is integrated [3, 4, 5]. A business model is an abstraction of how a business performs; it provides a simplified view of the business structure which acts as the basis for communication, improvement or innovation, and it defines the information system requirements that are necessary to support the business. A model has to capture the domain without reference to a particular system implementation or technology. One of the problems with modeling the early representations of business processes, conceptual views of information systems and Web interactions is the great number of techniques available to build and specify these models; additionally, since each one has its own elements, it is complex and laborious to compare and select the appropriate technique to model a system at a specific level of representation. Three modeling levels of abstraction are proposed, which integrate a set of concepts to build early web application models: a) Organizational, which describes how the organization works and the business process that is going to be systematized with a web information system; b) Integration, which describes the role of the software system and its integration into a particular organizational environment; c) Web, which describes the semantics of a web application [5, 6]. The basis of our contribution is the identification and classification of a set of concepts which are used to know what to model at each level of abstraction and to provide a modeling method evaluation framework that distinguishes the capabilities of each method to model at the three levels of abstraction. There are some methods and methodologies to evaluate business process modeling; however, they do not evaluate modeling capabilities but rather the functionality of the application or of the modeling methods. Rosemann proposes an ontology to evaluate organizational modeling grammars, identifying their strengths and weaknesses [7]. Luis Olsina [8] and Devanshu Dhyani [9] propose methodologies to evaluate the characteristics of a web application in its operational phases.
The structure of this paper is as follows: in section 2 the modeling concepts that comprise our approach are briefly presented; in section 3 the modeling concepts are enhanced with a set of aspects found to be useful in building models, and a method evaluation methodology is presented; in section 4 the results of the evaluation are shown; in section 5 the conclusions about the benefits of the methodology are discussed; finally, the references are presented. 2. Modeling concepts A business process model can be viewed at many levels of abstraction, and complementary model views can be combined to give a more intelligible, accurate
  • 24. 22 view of a system to be developed than a single model alone [3]. This approach establishes three levels of abstraction, and each one includes certain modeling concepts, as shown in Table 1. At each of these levels, concepts are properties or characteristics that structurally describe types of requirements at a specific level of abstraction; they define the key elements in a business process. The concepts at each level of abstraction were selected based on the analysis of several techniques and methods for business process modeling at the three levels.
Table 1: Modeling concepts at each level of abstraction
Organizational level | Integration level | Web level (business process) | Web level (pure navigation)
--- | --- | --- | Navigation page - Relationship
--- | --- | User profile (Rol) | User profile (Rol)
Actor | Actor | Class (object) | ---
Resource | Artifact | Artifact | Artifact
Goal | Goal | --- | Goal
Task | Function | Service | Service
Activity | Event | Event | ---
Business rule | Constraint | Precondition and postcondition | ---
Quality | No functional requirement | No functional requirement | ---
The organizational modeling concepts are as follows.
- Actor. It describes an entity that has a specific goal in the business process.
- Resource. It describes an informational or physical entity that is transferred between actors.
- Goal. It describes a desired state of the business process that an organization imposes on itself.
- Task. It describes a series of activities oriented to reaching a goal.
- Activity. It describes a set of actions to carry out one task.
- Quality. It describes the desired characteristics of the business process.
- Business rule. It describes the actions and criteria that govern the execution of the business process.
The integration modeling concepts are as follows.
- Actor. It describes an entity that interacts with the information system and that might play different roles.
- Artifact. It describes an entity that is transferred between an actor and the information system.
- Goal. It describes the information system purpose, limitations and responsibilities.
- Function. It describes a service that must be provided by the information system.
- Event. It describes a change in the business process at one specific moment in time.
- Constraint. It describes a condition for the execution of a service supplied by the information system.
- Non functional requirement. It describes the desired quality features or constraints for the information system.
The Web modeling concepts are as follows.
- Navigation relationship. It describes a global vision of the Web application according to a user profile.
- User profile. It describes the user's particular use of the Web application.
- Class. It describes an object type to model the entities that integrate the application.
- Artifact. It describes an abstract object to be transferred between the Web application and a user.
- Goal. It describes the purpose of the Web application.
- Service. It describes an activity or an action that the web application offers.
- Event. It describes the trigger of an activity or action that might be carried out to obtain a result or artifact.
- Pre and post condition. It describes the conditions that hold before and after the execution of an event.
- Non functional requirement. It describes the desired quality features or constraints for the Web application.
The concepts used for business process modeling are related to each other. 3. The concepts and the evaluation of methods approach The last section introduced a set of modeling concepts used to model business processes and systems at different levels of abstraction.
Here the concepts are enhanced with aspects that make them more powerful for modeling a particular view. These aspects are also used as scales to evaluate modeling methods. The aspects are capabilities sorted by the concepts presented before, and a scale is defined for each concept using the capabilities related to that concept. Also, a desired capability mentioned in the literature may be used in the definition of a scale. Following a well-known approach from the economics and management disciplines, each aspect is assigned a scale between 0 and 5, which is used to evaluate one of the modeling capabilities. As in statistical methods, the concepts in this paper are qualitative variables with a nominal scale [10]. The evaluation scale is obtained by first taking a list of the capabilities of one method, then a list of capabilities from a second method, from a third, and so on until all selected methods are analyzed. The concept evaluation scales facilitate the comparison of the capabilities of different modeling methods (see Tables 2, 3 and 4). The order assigned to the scales is intuitive and relatively arbitrary; however, it can be changed easily.
  • 25. 23 Then each information method is evaluated for all the aspects at each level of abstraction.
Table 2: Aspects and evaluation scales for the organizational level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Actor | Actor | --- | Role | Type | Responsibility
Resource | Resource | Type | Actor using it | --- | Actor supplying it
Goal | Goal | Priority | Problem | Opportunity | Verification
Task | Task | Who requests | Who executes | Hierarchy | Associated goal
Activity | Activity | Tasks supported | Hierarchy | How it is activated | When it is concluded
Business rule | Business rule | Associated concept | Origin | Type | Hierarchy
Quality | Quality | Associated concept | --- | Origin | Measure
Table 3: Aspects and evaluation scales for the integration level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Actor | Actor | --- | Role | Type | Responsibility
Artifact | Artifact | Actor or function supplying | --- | Actor or function requiring | Artifact state
Goal | Goal | Who establishes it, Associated to a function | Assigned priority | Measure, Failure cause | Opportunity to solve a problem
Function | Function | Who starts it | Who uses it | Hierarchy | The product
Event | Event | Who fires it, What is the start state | What is produced, Hierarchy | Who receives the product, Owner function | Final state
Constraint | Constraint | Type | Who defines it | To whom or what it applies | Who or what enforces it
Non functional requirement | Constraint | Who proposes it, To what it is applied | Type of requirement | Measure to verify compliance | What happens if it is not fulfilled
Table 4: Aspects and evaluation scales for the Web level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Navigation page - Relationship | Navigation page | Nav. page - Relationship | User profile | Navigation help | Access constraints
User profile (Role) | User profile | Role | Role changes allowed | Services per user | Business process state
Class (object) | Class (object) | Attributes | Relationships | Methods | Type of relationships
Artifact | Artifact | --- | Type | Supplier | User
Goal | Goal | Who defines it | Associated service, Priority | Measure | Failure cause, Opportunity to solve it
Service | Service | Related events | Hierarchy, Requesting user | Executing agent, Result | Result final user, Owner page
Event | Event | Service owner | Hierarchy, Implementing class | Who requests | Shared or not
Pre and post condition | Post condition | Pre condition | --- | --- | Associated event
Non functional requirement | Non functional requirement | Who proposes it, To what it is applied | Type of requirement | Measure to verify compliance | What happens if it is not fulfilled
The evaluation consists in assigning a value to each concept of the method. Take, for example, the non functional requirement concept at the web level: if the method has the non functional requirement concept, it gets 1 point. If, in addition, it indicates who proposes the requirement and to what it is applied, it gets 2 points. If it also covers the kind of requirement, it gets 3 points. If it further covers the measure to verify compliance, it gets 4 points. Finally, the method gets 5 points if it covers the concept, who proposes it, to what it is applied, the type of the requirement, the measure to verify compliance and what happens if the requirement is not fulfilled. 3.1. Evaluation methods The evaluators have to evaluate the three levels of abstraction for all concepts. For each modeling method and for each aspect ai, a corresponding evaluation ei is obtained. The results are displayed in a table for easy comparison, and a total score is obtained for each method and for each level of abstraction as Σei. A method that scores higher than another possibly has more capabilities to model requirements at the corresponding level of abstraction. 4. Results of the methods evaluations As an exercise, the methods i*, Tropos, EKD, BPM-UML, OO-Method/OOWS and OOWS [5, 7, 4, 8, 9, 11, 12] were evaluated using the scales presented (Tables 5, 6 and 7). The methods evaluated at each level are not the same, since some methods do not offer the modeling concepts for the levels where they are not shown.
Table 5: Organizational level evaluation of the methods
Organizational level | Max. value | i* | Tropos | EKD | BPM-UML
Actor | 5 | 5 | 5 | 5 | 5
Resource | 5 | 5 | 5 | 2 | 5
Goal | 5 | 1 | 3 | 4 | 3
Task | 5 | 2 | 4 | 3 | 2
Activity | 5 | 0 | 2 | 0 | 4
Business rule | 5 | 2 | 0 | 5 | 4
Quality | 5 | 3 | 4 | 4 | 4
Total | 35 | 18 | 23 | 23 | 27
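To make the scoring arithmetic concrete, the following minimal C++ sketch totals the organizational-level evaluations of Table 5 as Σei per method. It only reproduces the published numbers; the vector order of concepts is Actor, Resource, Goal, Task, Activity, Business rule, Quality, and nothing else is assumed about the evaluation procedure.
#include <iostream>
#include <map>
#include <numeric>
#include <string>
#include <vector>

int main() {
    // Evaluations e_i per organizational-level concept, copied from Table 5.
    // Concept order: Actor, Resource, Goal, Task, Activity, Business rule, Quality.
    std::map<std::string, std::vector<int>> scores = {
        {"i*",      {5, 5, 1, 2, 0, 2, 3}},
        {"Tropos",  {5, 5, 3, 4, 2, 0, 4}},
        {"EKD",     {5, 2, 4, 3, 0, 5, 4}},
        {"BPM-UML", {5, 5, 3, 2, 4, 4, 4}},
    };
    // Total score per method and level is the sum of its evaluations (Σ e_i).
    for (const auto& [method, e] : scores) {
        int total = std::accumulate(e.begin(), e.end(), 0);
        std::cout << method << ": " << total << " / 35" << std::endl;
    }
}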
  • 26. 24 Table 6: Integration level evaluation of the methods
Integration level | Max. value | i* | Tropos | EKD | BPM-UML | OO-Method
Actor | 5 | 5 | 5 | 5 | 5 | 1
Artifact | 5 | 5 | 5 | 4 | 5 | 4
Goal | 5 | 1 | 3 | 4 | 3 | 1
Function | 5 | 2 | 2 | 5 | 5 | 2
Event | 5 | 0 | 1 | 0 | 4 | 3
Constraint | 5 | 2 | 0 | 5 | 4 | 5
Non functional | 5 | 3 | 4 | 4 | 4 | 0
Total | 35 | 17 | 20 | 27 | 30 | 16
Table 7 (a): Web level evaluation of the methods (business process)
Web level | Max. value | Tropos | OO-Method/OOWS | OOWS
User profile | 5 | 3 | 4 | 4
Class | 5 | 0 | 5 | 5
Artifact | 5 | 4 | 4 | 4
Service | 5 | 3 | 3 | 3
Event | 5 | 1 | 3 | 2
Precondition and post condition | 5 | 2 | 5 | 3
Non functional | 5 | 3 | 0 | 0
Total | 35 | 16 | 24 | 21
Table 7 (b): Web level evaluation of the methods (pure navigation)
Web level | Max. value | Tropos | OO-Method/OOWS | OOWS
Navigational page - relationship | 5 | 1 | 5 | 5
User profile | 5 | 3 | 4 | 4
Goal | 5 | 3 | 0 | 0
Artifact | 5 | 4 | 4 | 4
Service | 5 | 3 | 3 | 3
Total | 25 | 14 | 16 | 16
At the organizational level, BPM-UML obtains good scores, while i* has the lowest score. The methods were evaluated with respect to the parameters defined for the approach presented here. During the evaluation of the methods, their own characteristics show up; for example, the quality aspects of a business process are modeled as qualitative goals in BPM-UML. At the integration level, the results show the capabilities of each method; for example, BPM-UML obtains good scores for this level, but OO-Method has the lowest score. 5. Conclusions There are many proposals to model organizational, integration and web requirements, and each one has its own elements. Some use the same concepts under different names, which makes it complex and laborious to compare the methods. The approach presented here unifies the various terminologies, increases the knowledge about modeling concepts, and proposes an approach to evaluate the modeling capabilities of methods and techniques. This helps to select the method that is most appropriate for the needs of a problem domain. The approach has been used to evaluate e-learning systems [13]. Additionally, it has been applied in the development of various case studies to evaluate virtual reality methods and to clearly appreciate the concepts that the methods allow one to model. 6. References [1] James Pasley, “How BPEL and SOA are changing web services development”, IEEE Internet Computing, May-June 2005. [2] Peter F. Green, Michael Rosemann and Marta Indulska, “Ontological Evaluation of Enterprise Systems Interoperability Using ebXML”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005. [3] Meservy T. and Fenstermacher K., “Transforming software development: an MDA road map”, IEEE Computer Society, September 2005. [4] H. E. Eriksson and M. Penker, Business Process Modeling with UML, Chichester, UK, Wiley, 2000. [5] E. Yu, Modelling Strategic Relationships for Process Reengineering, University of Toronto, Canada, 1995. Thesis submitted for the degree of Doctor of Philosophy. [6] A. Ginige and S. Murugesan, “Web Engineering: An Introduction”, IEEE Multimedia, pp. 1-5, Jan-Mar 2001. [7] Peter F. Green, Michael Rosemann and Marta Indulska, “Ontological Evaluation of Enterprise Systems Interoperability Using ebXML”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005. [8] Olsina, Luis A., Metodología cuantitativa para la evaluación y comparación de la calidad de sitios web, tesis doctoral, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, noviembre de 1999. [9] Devanshu Dhyani, Wee Keong Ng and Sourav S.
Bhowmick, “A survey of web metrics”, ACM Computing Surveys, Vol. 34, No. 4, December 2002, pp. 469-503. [10] William L. Carlson and Betty Thorne, Applied Statistical Methods for Business, Economics, and the Social Sciences, Prentice Hall, 1997. [11] Bubenko J., Brash D. and Stirna J., EKD User Guide, Royal Institute of Technology (KTH) and Stockholm University, Stockholm, Sweden, Dept. of Computer and Systems Sciences, 1998. [12] E. Insfrán, O. Pastor and R. Wieringa, “Requirements Engineering-Based Conceptual Modelling”, Requirements Engineering, Springer-Verlag, vol. 7, no. 2, pp. 61-72, 2002. [13] Eduardo Islas P., Eric Zabre B. and Miguel Pérez R., “Evaluación de herramientas de software y hardware para el desarrollo de aplicaciones de realidad virtual”, http://www.iie.org.mx/boletin022004/tenden2.pdf (2005).
  • 29. 27 Image Segmentation Script Language Francisco Cervantes Álvarez, Raúl Pinto Elías Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet) Interior Internado Palmira s/n, Cuernavaca, Morelos, México. {cervantes04c, rpinto}@cenidet.edu.mx Abstract In this article we propose the use of a script language for the image segmentation stage of artificial vision. The proposed language, the system architecture used to interpret scripts and the general structure of the programs that make up the operator library are described. Finally, some tests and results of the use of the proposed script language are shown. 1. Introduction In this paper we propose the use of a script language for image segmentation. Nowadays, the use of script languages in graphics programming is increasing, because they allow ideas to be tested in an easy way [1]. Script languages also facilitate code reuse [2]. However, in the artificial vision area few works focus on digital image processing through script languages; one example is shown in [3], where the user builds a script with graphic objects and then executes it to process a given image. An example of a commercial script language is MATLAB [4]. The proposed script language allows image segmentation ideas to be tried out in an easy way, and the user does not need to know how the segmentation algorithms carry out the process. The language also allows code reuse through the operator library (the operators are independent of each other) and the implementation of a script interpreter. This library allows the language to grow in a modular way, without the need to modify the existing code. This paper is structured in the following way. In the second section the basic elements of the proposed script language are described. The third section shows the basic structure that the library operators should have. In the fourth section the general scheme of the script language interpreter is described. In the fifth section some tests and results are shown. Finally, the conclusions are presented and some future work is discussed. 2. Basic elements of the script language The language is composed of the following data types: Entero, Real, Cadena, ImagenGris, Mascara and Contorno. The basic arithmetic operations are also defined (addition, subtraction, multiplication and division). The language has the following basic structures: • Declaration. • Assignment. • Operator call. The corresponding syntax for these structures is the following. Declaration: data_type variable_name (, variable_name)* Assignment: variable_name = variable_nameX variable_name = arithmetic_expression variable_name = operator_name (arguments) Operator call: operator_name (arguments) The syntax mentioned above provides a general structure; everything else depends on the operators registered in the operator library. This structure is named the language core. The script language interpreter is very important because it can recognize new operators. This way of defining the language, where only the structures are established and the language elements are not fixed, gives the advantage of adding new elements or commands dynamically, without modifying the core code. Each operator is independent of the others. The single restriction to add an operator to the library is to follow a basic structure specification. This restriction must be followed in order to let the core and the operators interact.
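As an illustration of how the three structures above might be recognized, the following self-contained C++ sketch classifies script lines as declarations, assignments or operator calls. It is not the article's interpreter, only an assumed example: the operator names BinarizarImagen and VerImagenGris come from the tests reported later, while the threshold value 128 and the regular expressions are hypothetical.
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Classification of one script line into the three structures the language
// core defines: declaration, assignment and operator call. The data type
// names match the ones listed in the article.
enum class Kind { Declaration, Assignment, OperatorCall, Unknown };

Kind classify(const std::string& line) {
    static const std::regex decl(
        R"(^\s*(Entero|Real|Cadena|ImagenGris|Mascara|Contorno)\s+\w+(\s*,\s*\w+)*\s*$)");
    static const std::regex assign(R"(^\s*\w+\s*=\s*.+$)");
    static const std::regex call(R"(^\s*\w+\s*\(.*\)\s*$)");
    if (std::regex_match(line, decl))   return Kind::Declaration;
    if (std::regex_match(line, assign)) return Kind::Assignment;
    if (std::regex_match(line, call))   return Kind::OperatorCall;
    return Kind::Unknown;
}

int main() {
    std::vector<std::string> script = {
        "ImagenGris img, salida",                 // declaration
        "salida = BinarizarImagen(img, 128)",     // assignment from an operator call
        "VerImagenGris(salida)"                   // operator call
    };
    for (const auto& l : script)
        std::cout << l << " -> " << static_cast<int>(classify(l)) << "\n";
}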
  • 30. 28 3. Operator basic structure An operator can be used as part of the library if it has the general structure shown in Figure 1. Figure 1. Operator general structure: core libraries for the data type managers (numbers, strings, images and templates); request of the arguments (file paths containing the content of the parameters); operator body (the digital image processing algorithm); return of the result (the result is saved through a file given as the last argument). Figure 1 shows the structure that the operators should have. Some structure elements can be omitted; for example, in the header only the required core libraries must be included. An operator can return a value, but this is not absolutely necessary; for example, the operator that shows an image only displays it on the screen and does not need to return anything to the core. By default it is necessary that the operators receive at least one argument. In order for the core and the operators to interact, the data types defined in the core must be used. Also, the parameters that an operator needs for its execution have to be received as file path specifications; these files hold the content of the operator parameters. Finally, it is also necessary to save the output data in a file, which is specified by the last parameter received by an operator. Below, an example of an operator to extract the negative of an image is shown.
#include "CImagGris.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[]) {
    // The operator expects at least two file paths: input image and output image.
    if (argc < 3) exit(ERR_NUM_PARAM);
    char *a_entrada = argv[1];
    char *a_salida  = argv[2];
    AnsiString entrada, salida;
    CImagGris Imagen;
    entrada = CharToAnsiString(a_entrada);
    salida  = CharToAnsiString(a_salida);
    // Load the grayscale image from the input file.
    if (Imagen.leerArchivo(entrada) == false) exit(ERR_IMAG_EN);
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();
    w = Imagen.Ancho();
    // Invert every pixel to obtain the negative image.
    for (x = 0; x < w; x++) {
        for (y = 0; y < h; y++) {
            pixel = Imagen.getPixel(x, y);
            Imagen.setPixel(x, y, 255 - pixel);
        }
    }
    // Write the result to the file indicated by the last argument.
    if (Imagen.escribirArchivo(salida) == false) exit(ERR_NO_MEMO);
    exit(BIEN);
}
At the moment the interpreter core only supports 24-bit BMP images. 4. General scheme of the script interpreter The general structure of the operators has been shown; now the general scheme of the script interpreter is presented. Figure 2 shows each component of the interpreter and the relationship with the operators. Figure 2. Script interpreter general scheme (components: operator library, interpreter core with process module, memory module and data types, input script and result). With this structure, the system first explores the operator library to generate the structure of each operator (syntax and semantics), then it analyzes the input script and executes the operators. 5. Tests and results Several tests have been done to show the advantages provided by the use of the script language for image segmentation. For example, the user can use the language without interacting directly with the algorithms. The first test consists of showing how to make a new operator that converts an image to a binary image.
  • 31. 29 To create an operator, it is necessary to have a C++ Builder compiler. Figure 3 shows the operator code.
Figure 3. Operator to convert an image to a binary image:
#include "CBMP24.h"
#include "CNumero.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[]) {
    // The operator expects three file paths: input image, threshold value and output image.
    if (argc < 4) exit(ERR_NUM_PARAM);
    char *a_imag   = argv[1];
    char *a_umbral = argv[2];
    char *a_salida = argv[3];
    int valorUmbral;
    AnsiString imag, umbral, salida;
    CIMAGEN_BMP24 Imagen;
    CNumeroMemoria UmbralBinario;
    imag   = CharToAnsiString(a_imag);
    umbral = CharToAnsiString(a_umbral);
    salida = CharToAnsiString(a_salida);
    // Load the image in grayscale mode and the binarization threshold.
    if (Imagen.leerArchivo(imag, GRIS) == false) exit(ERR_IMG_EN);
    if (UmbralBinario.leerArchivo(umbral) == false) exit(ERR_UMB_EN);
    valorUmbral = (int)UmbralBinario.Valor();
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();
    w = Imagen.Ancho();
    // Set each pixel to white or black according to the threshold.
    for (x = 0; x < w; x++) {
        for (y = 0; y < h; y++) {
            pixel = Imagen.getPixelGrey(x, y);
            if (pixel >= (Byte)valorUmbral)
                Imagen.setPixelGrey(x, y, 255);
            else
                Imagen.setPixelGrey(x, y, 0);
        }
    }
    // Write the binary image to the file indicated by the last argument.
    if (Imagen.escribirArchivo(salida) == false) exit(ERR_NO_MEMO);
    exit(BIEN);
}
As can be seen in the figure above, making a new operator is very easy; it is only necessary to follow the general structure that has been specified. Now the user only has to generate the executable file by compiling the source code. At this moment the new operator has been created and must be registered in the library. The second test consists of registering the new operator. For this, the Métodos option of the Herramientas menu is used; Figure 4 shows this menu. Figure 4. Script interpreter interface. When the user clicks on this menu, the registration screen is displayed, where the new operator has to be registered; this is shown in Figure 5. The user has to indicate the executable file path and assign an alias. The alias is used by the interpreter language; then the input and output parameters of the operator are specified. Figure 5. Screen to register an operator. Once the operator is registered, it can be used as part of the language; Figure 6 shows this process. Figure 6. Script to convert an image to a binary image. Here, a script to convert images to binary images is built, and the initial and final images are also shown. With these tests the advantages of the script language are shown. The capability of registering new operators provides an open language that grows based on the needs of the user. Nowadays the language has 25 segmentation operators and 5 image description operators. The third test shows how the operators BinarizarImagen and VerImagenGris interact with each other. These operators can be saved to be reused later. Finally, the management of the library is simple, because the interface provides screens to modify, remove and add operators to the library. 6. Conclusions With this paper we can conclude that the use of a script language for image segmentation is practical. In this work we saw that the segmentation algorithm functionality can be abstracted away by using scripts, so the user does not need to know the algorithms. Finally, we can say that script languages might be used in other stages of artificial vision. The operator library used by the interpreter is built to grow in a modular way; the library grows with each operator that is registered.
  • 32. 30 7. References [1] M. Villar, “Guía de lenguajes de script para prototipado rápido”, http://www.codepixel.com/tutoriales/prototipado/, 2006. [2] K. Muehler, “Adaptive script based animations for medical education and intervention planning”, Department of Simulation and Graphics, University of Magdeburg, Germany. [3] “Sistema interactivo para la enseñanza de la visión artificial”, Depto. de Sistemas Inteligentes Aplicados, Escuela Universitaria de Informática, Universidad Politécnica de Madrid, 2006. [4] “MATLAB”, http://www.mathworks.com/, 2006.
  • 33. 31 Web Page Retrieval Using an Ontology that is Populated by Automatic Text Classification Ismael R. Ponce M., José A. Zárate M., Juan C. Olivares R. Centro Nacional de Investigación y Desarrollo Tecnológico {rafaxzero04c, jazarate, jcolivares04c}@cenidet.edu.mx Abstract This article describes a proposal to help users in the arduous task of recovering information from the Web, especially when queries are about a specific subject or approach. For this, we suggest the use of an ontology whose instances are links to Web pages about the domain on which the ontology was constructed, taking advantage of the order and categorization that the ontology offers to guide the user through the concepts that integrate it and to find information related to them. The creation of an ontology about a particular domain and the activities needed to obtain an automatic classification of Web pages as instances in the ontology are described. Keywords: Ontology, automatic classification methods, vector space model. 1. Introduction Nowadays, in line with technological evolution, the amount of information generated every second is immense; moreover, having that information has become so important that we live in an era where information governs the world and its decisions. The Internet has become a great source of information, but the larger it grows, the more difficult it is to find the desired content. Diverse ways of recovering information have been tried, for example the Web search engines, which use different techniques (searchers like Google, Yahoo, Ask, Vivisimo and many others), some considering the popularity of the pages, the use of clustering, etc.; nevertheless, although they somehow help at the time of making queries, users still face results that are not at all what they wished for. Therefore, diverse ways of helping to search the Internet are being explored, ranging from the matching of words to techniques based on the popularity of the sites; unfortunately for many users, this type of result is not enough, so they require more specific solutions. The alternative proposed for this problem is to use the ontology paradigm for Web page search on a particular subject. When working on a concrete domain, a specialized search is expected; in addition, thanks to the use of ontologies and the order they provide over the concepts that conform them, they are a great help for users to find the information they wish. In this document we focus on the way to populate an ontology with Web page links, using automatic classification techniques. In our experiment we obtained 86% of correctly classified elements. The article briefly describes the steps followed to put the proposed idea into practice. First, a brief panorama of the way some search engines recover information is given; next, the development of an ontology compatible with the intended use is presented, concluding with the steps for the use of an automatic classification method that will be used to populate the ontology, considering the Naive Bayes, k nearest neighbors and support vector machines methods. 2. Search Machines In a traditional search machine the queries are generally made from key words, obtaining as a result a list of Web links related to the queried words. Some of the best-known search machines are Google, Yahoo and MSN Search, among others.
The case of Google stands out because of the use of its PageRank technology [1], in which a formula calculates the weight of each Web page stored in its database, considering the number of links from other pages that reference it. The greater the number of links to a page, the greater its score, thus becoming a popularity contest [2]. Unfortunately, it has been demonstrated that the results
  • 34. 32 can be manipulated by the well-known Google bombing [3]. Another way to recover information is through Web directories, which consist of a manual organization and classification of Web pages by subjects or categories. One of the most representative directories is the Open Directory Project [4], in which a set of volunteer editors is in charge of listing the Web links inside an ontology, where the links are grouped into categories by similar subjects. The disadvantage that can be appreciated is that it requires too much human intervention to register the links of the pages. A special type of search machine is the one that incorporates clustering; clustering consists of partitioning a set of similar objects into subgroups, where the elements that conform each subgroup share common characteristics. This type of search machine returns the results found for a query arranged in groups; examples of this are Clusty.com and Vivisimo.com. Finally, we find specialized search machines, which are centered on recovering links to technical and scientific documents. An example is Citeseer, a document search machine focused on computing that uses the bibliographical references to estimate the importance of the documents that are queried. Although these and other techniques have been developed to recover information from the Web, this area still has much to offer, which is why new alternatives to help the user continue to be looked for, as is our case. 3. Phase of Ontology Development Gruber [5] defines an ontology as the explicit specification of a conceptualization, which means identifying the concepts that integrate a domain and the interrelations that exist among them, in a formal representation, in a way that makes it possible to share and reuse it. The standard language established by the W3C to make this type of formalization is OWL (Web Ontology Language). The advantage of using this standard, in addition to the reusability it allows, is that many tools related to ontology design and use are being developed to support it, such as editors, reasoners, etc. We developed an ontology considering these points, following the methodology proposed by Uschold and King [6]. The domain on which the ontology was developed was natural language processing (NLP). In order to develop it, we used the ontology editor Protégé 3.1.1, the Protégé-OWL 2.1 plug-in along with the OWL Wizards plug-in, in addition to the FaCT++ 0.99.6 and RacerPro 1.9.0 reasoners, used to verify the ontology consistency. The classes are made up of concepts related to the NLP area, mainly including researchers, schools, tools and application areas. 4. Supervised Learning for Automatic Text Classification Once the ontology is developed, the phase of populating it with instances follows. Given the enormous number of pages that exist on the Web, a manual classification of them into the classes established in the ontology would be an expensive task, and this is also already done to a great extent in the Web directories. Therefore, a way to automate this process was looked for, resorting to supervised learning, in which, by means of statistical and mathematical techniques, an automatic text classification can be done. This approach is centered on having a previously classified training set of documents that will be used to learn to classify new documents.
For this, it is necessary to transform the initial state of the documents into a representation that can be used by a learning algorithm for the classification. For test purposes we only worked with HTML pages. The necessary steps for this process are mentioned next. The training collection used consisted of 1624 documents, previously classified into 26 classes taken from the developed ontology, in addition to a class denominated null, in which the documents not wanted for the ontology domain are classified. The 26 considered classes are only a representative sample of the existing classes in the ontology and were taken only for test purposes. 4.1. Document Preprocessing Not all the elements (more precisely, the words) that appear in documents are useful for their classification; that is, there are words that by themselves say nothing about the content of the document in which they appear and therefore can be eliminated; among these elements are the punctuation marks and the HTML tags. There are also words of very frequent use, words that appear in a great number of documents, which causes their discriminatory power to be very low; this type of word is known as a stopword, and examples of them are the articles, pronouns, prepositions and conjunctions, among others.
  • 35. 33 In order to define the stopwords to eliminate, we resorted to the lists available in DOCUM [7], in SMART [8] and in the Institut interfacultaire d'informatique of the University of Neuchâtel [9], as well as other words identified during the testing process. Because we are working on a specific domain, a certain control exists over the terms that belong to it, which is why a matching between the different terms that refer to the same concept was proposed, turning them into a single representation; in other words, if a concept can be named in different ways, we unify those forms and consider them under a unique form inside the classification process. For this, we took the concepts that integrate the ontology as a starting point. Finally, many words share the same lexical root; a basic stemming process based on Porter's algorithm [10] was followed, with which we sought to reduce words to their stem. All the steps mentioned above have the aim of diminishing the size of the training document collection to make it more manageable, eliminating the irrelevant parts before continuing the automatic classification process. In our exercise, on average we managed to reduce the original size of the document collection by up to 70%. 4.2. Vector Space Model The vector space model (VSM) was proposed by Salton in 1975 [11]. The basic idea behind this model is to build a matrix that represents the documents and the words contained in them, assigning a weight to each word. Each vector of the matrix represents a document and the distribution of the words that appear in it. It is an m x n matrix, where m represents documents and n represents the registered words. Different types of weighting for words in the VSM exist; in our tests we considered boolean weighting (the weight of a word is 0 if it does not appear in the document and 1 if it appears), weighting by frequency of appearance (the weight of the term depends on the number of occurrences of the word in the document) and, finally, tf-idf weighting (calculated from the term frequency combined with its inverse document frequency [11]). 4.3. Dimensionality Reduction in the VSM Not all the words that integrate the training collection can be considered in the VSM, since the dimension it would have would be enormous. Different techniques to reduce the dimensionality exist, like the document frequency, which considers a minimum number of appearances that each word must have within the total set of documents, in order to discard those words whose appearance is very small and to keep those that present a greater document frequency. Another technique that was considered was the information gain (IG), which calculates the difference between the entropy of the system and the entropy given each word. This difference, measured in bits, indicates how relevant a word is and how much information it contributes to the whole collection, as a determining factor to carry out the classification. The total number of words in the already processed training collection is 2552196, with only 125891 different words among them.
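To make the weightings of section 4.2 concrete, here is a minimal, self-contained C++ sketch that computes the boolean, term-frequency and tf-idf weights of one term over a toy collection. The three tiny documents and the chosen term are invented for illustration; the sketch uses the common tf x log(N/df) form of tf-idf and is not the code used in the reported experiments.
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Toy collection: three tiny "documents" already reduced to word lists.
    std::vector<std::vector<std::string>> docs = {
        {"knowledge", "nlp", "ontology"},
        {"nlp", "nlp", "classification"},
        {"image", "segmentation"}
    };
    const std::string term = "nlp";
    const double N = static_cast<double>(docs.size());

    // Document frequency: number of documents containing the term at least once.
    int df = 0;
    for (const auto& d : docs)
        for (const auto& w : d)
            if (w == term) { ++df; break; }

    const double idf = df > 0 ? std::log(N / df) : 0.0;

    for (std::size_t i = 0; i < docs.size(); ++i) {
        int tf = 0;                                   // term frequency in document i
        for (const auto& w : docs[i])
            if (w == term) ++tf;
        double boolean = tf > 0 ? 1.0 : 0.0;          // boolean weighting
        double tfidf = tf * idf;                      // tf-idf weighting (tf * log(N/df))
        std::cout << "doc " << i << ": boolean=" << boolean
                  << " tf=" << tf << " tf-idf=" << tfidf << "\n";
    }
}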
As can be appreciated from these figures, the number of different words is too large to be handled in the VSM, so only those words with a document frequency greater than or equal to 15 were considered; 8966 words passed this filter, which represents 7.12% of the original total of words. Nevertheless, this is still a very large amount, so the IG was applied to these words. The calculated entropy of the whole collection was 3.97; the words considered were those with an IG greater than or equal to 0.1, resulting in 527 different words, 0.42% of the original size. The words (already stemmed and standardized) that presented the greatest IG were: nlp (0.552), natural_language (0.479), knowledge (0.424) and data_min (0.335). 4.4. Automatic Classification Algorithms Once the VSM representation of the training document collection is obtained, a method for automatic classification can be applied to classify new elements. The automatic classification methods we considered were Naive Bayes, k-nearest neighbors (kNN) and support vector machines (SVM), in order to carry out a series of tests to find the method that gives the best results, considering in addition the weightings mentioned in section 4.2. WEKA was used to carry out the tests; the results shown in Table 1 correspond to the use of 10-fold cross validation, showing the percentage of correctly classified elements.
Table 1. Percentage of correctly classified elements.
 | boolean | tf | tf-idf
Naive Bayes | 62.7463 | 55.8498 | 81.2192
kNN | 84.5443 | 85.0369 | 84.4212
SVM | 86.2685 | 66.7488 | 82.0813
  • 36. 34 Figure 1. Classification test results (percentage of correctly classified elements per classification algorithm and weighting). The best result was obtained with the boolean weighting using the support vector machine algorithm; the details of this classification are in Table 2. By using the training that offered the best result, it is possible to classify new documents obtained from the Web, following the steps for the preprocessing and the VSM representation; once the class to which a document belongs is obtained, the link of the page can be saved as an instance of the ontology.
Table 2. VSM results with boolean weighting
Correctly classified elements | 1401 | 86.27%
Incorrectly classified elements | 223 | 13.73%
Root mean squared error | 0.1829
5. Conclusions From the obtained results, it was decided to use the boolean weighting along with the support vector machine method; the use of the boolean weighting is justified because the mere appearance of a word in a document can be considered a good indicator of its discrimination value, especially when considering words that are not so common in ordinary speech but are relevant for the domain being worked on, as demonstrated by the fact that a great part of the words with the greatest information gain belonged to the NLP domain, on which this work was done. The words obtained with the greatest IG and the accuracy of 86% obtained in the best case by the selected classification method are a good incentive to consider the use of automatic classification to populate the ontology. With respect to future work, once the ontology is populated, the next step is to guide users through the ontology so that they can find links related to the queries made on the NLP domain. At the conclusion of this work, among the desired benefits are guiding the user through the ontology classification and its different relations, making use of the bounded natural language technique, to verify the hypothesis that if the elements are ordered, it is simpler to find the desired ones and to save work for users by showing them only documents thematically related to the concepts in their queries. 6. References [1] S. Brin, L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Computer Science Department, Stanford University, Stanford, available online: http://www.db.stanford.edu/~backrub/google.html, visited on December 2005. [2] M. Miller, 501 Web Site Secrets: Unleash the Power of Google, Amazon, eBay and More, Wiley Publishing, Inc., USA, 2004. [3] Google bomb, Wikipedia, the free encyclopedia, available online: http://en.wikipedia.org/wiki/Google_bomb, visited on March 2006. [4] ODP - Open Directory Project, http://dmoz.org/, visited on April 2006. [5] T. Gruber, A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 1993. [6] M. Uschold and M. King, Towards a Methodology for Building Ontologies, Workshop on Basic Ontological Issues in Knowledge Sharing, 1995. [7] DOCUM, a multilingual stopword file for CDS-ISIS, http://library.wur.nl/isis/docum.html, visited on November 2005. [8] ftp://ftp.cs.cornell.edu/pub/smart/, visited on November 2005. [9] J. Savoy, CLEF and multilingual information retrieval, Institut interfacultaire d'informatique, University of Neuchâtel, Switzerland, 2005, http://www.unine.ch/info/clef/, visited on November 2005. [10] The English (Porter2) stemming algorithm, http://snowball.tartarus.org/algorithms/english/stemmer.html, visited on December 2005. [11] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval,
McGraw-Hill, 1983, USA.
  • 37. 35 Neuro-Symbolic Hybrid Systems Vianey Guadalupe Cruz Sánchez, Gerardo Reyes Salgado, Osslan Osiris Vergara Villegas Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet) Interior Internado Palmira S/N, Col. Palmira. C.P. 62490. Cuernavaca, Morelos, México. {vianey,osslan}@cenidet.edu.mx Abstract Currently, the Hybrid Systems (HS) approach is widely used to solve problems where different knowledge representations are involved in one system. This integration has the advantage of compensating the weaknesses of one or another system while complementing their strengths. The Neuro-Symbolic Hybrid Systems (NSHS) arise from the HS as an approach that offers the possibility of implementing robust systems where both the connectionist and the symbolic nature are present in the environment. The reason for studying the NSHS is to apply them in the artificial vision system process, so that we can propose different solution strategies among the different representation types involved in this process; for that, the Hybrid Systems development cycle and the NSHS classification criteria play a very important role in the definition of this strategy. 1. Introduction In the last decade, it was very common to work with only one type of knowledge representation. There was even competition to demonstrate that the representation used was better than another that solved the same problem. However, as time passed, researchers observed the weaknesses of each representation and the complementary properties that existed among them, and the scientific community decided to test their capacities by integrating them into one (hybrid) system; with this integration, better results were obtained than those obtained in an individual way. The hybrid approach is inspired by a natural mechanism: according to [1], we are hybrid information processing machines; our actions are governed by the combination of genetic information and information acquired through learning. Thanks to this combination, we are able to successfully use different processing methods in complex and changing environments. Under this natural scheme, hybrid systems have arisen as a new way to solve complex problems where several knowledge representations are necessary to use the information coming from the environment; this environment determines the strategies that should be used to increase the knowledge and to develop more robust systems [2]. However, the design and development of these systems is difficult due to the large number of pieces or components that are involved and the different interactions among them [3]. The tendency is the study and construction of hybrid systems whose solution strategy is the best one to solve the problem. In this paper we present a study of the process involved in the design and development of a hybrid system, as well as the criteria for the classification of one particular type of hybrid system (HS), the Neuro-Symbolic Hybrid System (NSHS), which has been used in applications such as object recognition. 2. Design and development of a hybrid system In [1] a Hybrid Systems development cycle is proposed, which presents the process for the construction of these systems. The importance of this model lies in the bases on which the design of an NSHS may be supported. 2.1 Hybrid System development cycle A structured approach like the one in [1] can reduce the development time and cost of an HS.
The stages for the construction of intelligent hybrid systems are: problem analysis, matching, selection of the hybrid category,
  • 38. 36 implementation, validation and maintenance (see Figure 1). Figure 1. Hybrid system development cycle. a) Problem analysis. This stage involves the following steps: 1. Identify any sub-task existing in the problem. 2. Identify the properties of the problem. If the problem has sub-tasks, this involves identifying their properties as well. b) Property matching. It involves matching the properties of the available techniques with the requirements of the identified task. c) Hybrid category selection. In this phase the type of hybrid system required to solve the problem is selected. This phase uses the results of the previous stages of problem analysis and property matching. d) Implementation. In this stage the developer is in a position to select the programming tool and the environment necessary to implement the hybrid system. e) Validation. This phase is used to test and verify the performance of the individual components of the application and of the whole hybrid system. f) Maintenance. The performance of the hybrid system should be periodically assessed and refined as necessary. Maintenance is very important for adaptive systems (e.g., neural networks). 3. Neuro-Symbolic Hybrid Systems The NSHS are systems formed by two or more types of knowledge representations, one in a connectionist form and another in a symbolic form. Each representation has its own group of qualities, and integrating them in one system is extremely powerful for solving complex problems. Artificial neural networks are a type of connectionist knowledge representation inspired by the functionality of the biological neuron. This representation type has been used for its learning capability and generalization of knowledge, being a very powerful tool to solve complex pattern recognition problems. On the other hand, a symbolic representation in the form of logical rules is based on the capability humans have to express knowledge in a natural way. The latter is very powerful for inserting the knowledge of a human expert into a system, as well as for explaining the problem. This representation type has been widely used in areas such as pattern recognition, natural language processing, robotics, computer vision and expert systems. Both types of knowledge representation are combined in one system to suppress the disadvantages of one or the other representation and to take advantage of their integration. For the integration of the NSHS and its future application in the design of an artificial vision process, it is very important to know the criteria used for the NSHS classification, since from this classification a wide view of the different behaviors that an NSHS may have is obtained.
  • 39. 37 4. NSHS Classification Many criteria are considered in order to classify the NSHS [4]. Next, we briefly explain each one. Table 1. Criteria to classify NSHS. a) Integration type Neuro-symbolic integration can be classified into mainly three groups, according to the “hybridization type” of the approach used. • Unified approach. It attempts to integrate the properties of symbolic systems into connectionist systems and vice versa. • Semi-hybrid approach. This approach is used to achieve translations, for example, the compilation of a rule base into a network (knowledge insertion) and the explicitation of rules starting from a network (knowledge extraction). • Hybrid approach. In this type of approach many symbolic and connectionist modules may exist, integrated with each other. b) Coupling degree It defines the strength of the interaction between two modules. The classification of the different degrees is carried out through a progressive level that goes from one extreme to the other. This classification consists of three levels: • Weak coupling. In this type of architecture, the different modules are connected by a simple input/output relation, and the communications are unidirectional. • Medium coupling. In this category, the interactions among modules are more flexible, since they are bidirectional; it is not simply an input/output relationship, but rather each module can influence the operation of the other. • Strong coupling. In these systems, knowledge and data are not only transferred but also shared among modules through common internal structures. c) Integration mode It represents the way in which the neural module and the symbolic module are configured in relation to each other and to the full system. • Chain. Two modules operate in sequence. One is the main processor and it is assisted by the other module, acting as a pre- or post-processor. The relationship between the modules is input/output. • Sub-treatment. In this integration mode, one module is subordinate to the other to achieve some function. The main module decides at what moment to call it and how to use its output. • Meta-treatment. One module solves the problem and the other plays a meta-level role, such as carrying out the control or improving the results. • Co-treatment. Both modules are equal participants in the problem-solving process. For example, one module solves one specific part of the problem and the other module solves the rest of the problem. d) Knowledge transfer The knowledge transfer may be classified according to the direction of the exchange. • From symbolic to connectionist. The symbolic knowledge is transferred from a symbolic module and is integrated into a connectionist module (S→C). • From connectionist to symbolic. The knowledge acquired by learning in a connectionist network may be made explicit in the form of symbolic rules (C→S). • Bilateral transfer. The knowledge can be transferred in both directions, symbolic and connectionist (S↔C). This usually includes compilation mechanisms and rule extraction starting from the networks.