SlideShare uma empresa Scribd logo
1 de 70
Baixar para ler offline
cenidet
V O L . 1 3S U M M E R 2 0 0 6
Editorial
Dear Colleagues,
The Computer Society with approximately
100,000 members, is the leader organization for
the professionals in computer sciences. It was
founded in 1946, being the biggest of the societies
organized by the IEEE.
The Computer Society foments
international communication, cooperation and
exchange of information. It is dedicated to the
promotion of the computer theory, practice and
application to the data processing technology.
"Looking Forward" is the student written
and for The Computer Society students magazine.
After a great effort, we finally present the number
13 edition of the "Looking Forward" electronic
magazine.
Before continuing we want to appreciate
the help of the members of the Computer Student
Chapter of the CENIDET, who are very honored
of being elects to produce this edition.
The articles contained in this magazine
embrace diverse topics, all related with the
Master degree and Doctorate degree thesis
projects being developed at the Computer
Sciences Department. Particularly, in the
following lines of investigation: Software
Engineering, Distributed Systems and Artificial
Intelligence.
We appreciate the cooperation of M.S.
Andrea Magadán Salazar for coordinating all their
members.
This edition has been possible thanks to
the help of Prof. Ken Christensen, we
acknowledge his support and trust to the
Computer Student Chapter of the CENIDET.
We know that participating in this kind
of projects exalts the prestige of our institution
and our country.
We hope in a future this magazine can
be added to the IEEExplore as part of the
literature of the Computer Society.
Long life Looking Forward!!!
Jonathan Villanueva Tavira
Jonathan.villanueva@ieee.org
Director
Ken Christensen,
Associate Profesor,
Department of Computer Science
and Engineering ,
University of South Florida
Editor in Chief
M.C Andrea Magadán Salazar
Vicepresidenta del Capítulo de
Computación Sección Morelos.
Editors
Jonathan Villanueva Tavira
Rocio Vargas Arroyo
Jorge A. Saldaña García
Collaborators
Edgar Colorado Soto
Erika M. Nieto Ariza
Francisco Cervantes Álvarez
Hilda Solano Lira
Jorge A. Saldaña García
Juan C. Olivares Rojas
Luis E. Morán López
Maricela C. Bravo Contreras
Michelle Arandine Barrón Vivanco
Osslan O. Vergara Villegas
Rafael I. Ponce Medellín
Rocío Vargas Arroyo
Salvador Cervantes Álvarez
Vianey G. Cruz Sánchez
Student Branch Chapter CENIDET
The Center offers the possibility to
carry out investigation works in agreement
with other institutions like the Institute of
Electric Investigations and the Center of
Investigation in Energy; besides, there are
agreements with important universities and
foreign institutes.
CENIDET has the departments of
Electronic, Mechanics, Mechatronic and
Computer Sciences, headquarters of the
Computer Student Chapter of the IEEE.
The National Center of Research and
Technological Development, CENIDET, is
located in Cuernavaca, Morelos, city that has
been transformed into headquarters of several
scientific institutions, each dedicated to
research and education, allowing a profitable
exchange among them.
Cuernavaca benefits from its
proximity to the Mexico city, since the
researching professors and the students can
easily move to participate or to attend events,
to exchange information, to consult libraries,
to receive consultantships and,in general, to
be related with all the activities that
propitiate and enlarge the knowledge and
the creativity.
The Center, naturally, participates of
this valuable synergy. The CENIDET offers its
postgraduate programs, for related graduate
people that are interested in getting ready for
the applied investigation and the technological
development.
Welcome to CENIDET
Computer Sciences Department
Web Page Retrieval Using an Ontology
that is Populated by Automatic Text
Classification
21.
Summary
1. Editorial.
2.Welcome to CENIDET.
Web Page Classification:
a Semantic Análisis.7.
Digital Image Processing in
Wavelet Domain.13.
Evaluation of tools for business
process in three levels of
abstraction.
27.
Image Segmentation
Script Language.
GAP: A Tool to Solve the Problem of the
Web Contents Visualization in Pocket PC
Devices.
17. 31.
Summary
Methodology for the generation
of 3D models.41.
Neuro-Symbolic Hybrid Systems.
35.
Vanishing points detection using
Thales's theorem.45.
Segmentation by color to
characterize human gait.55.
An Ontology-based Translator
for Communicating Negotiation
Agents over Internet.
49.
59.
Description of two Statistical Models Applied
To The Extraction Of Facial Features: Integral
projections And Active Shape Model.
63. Authors.
7
Web Page Classification: a Semantic Analysis
Rocío Vargas Arroyo, Azucena Montes Rendón
Centro Nacional de Investigación y Desarrollo Tecnológico
{rvargas04c,amr}@cenidet.edu.mx
Abstract
In this paper, a semantic analysis for Web page
classification is presented. A set of Web pages,
resulting from a simple query to a Web browser, is
categorized by disambiguating the meaning of the term
used for the search. The disambiguation process
begins with the isolation of some outstanding
paragraphs; linguistic markers are used to accomplish
this task. The search term is located within the
paragraphs and the Contextual Exploration Method is
used to identify words that lead to the discovery of
relationships within an Ontology. Finally, the
discovered relationships are used for assigning the
web page to a category.
1. Introduction
Natural Language Processing is a challenging task
of Artificial Intelligence, because dealing with
language is not a simple task. With the immeasurable
growing of the Web, the problem of getting the proper
and desired information has grown too. Several
research groups have got significant and slightly
sufficient results, but not good enough to solve the
general needs. These obtained results are sensitive to
the ambiguity problem caused, mainly, by the used
techniques thus most of the developed projects [1-9]
are based on statistical methods and ignore linguistic
techniques.
In this work our main intention is the creation of a
classification tool. This tool will be able to classify a
set of HTML web pages written in Spanish. Linguistic
markers, Ontology and Contextual Exploration Method
[10] are integrated to accomplish the task.
2. Linguistic markers
In order to emphasize certain ideas contained in a
text, the author uses discourse markers. These markers
are linguistic units that set the order of the discourse.
Martín Zorraquino and Portolés [11] define these
markers as:
“Unidades lingüísticas invariables que no ejercen
una función sintáctica en el marco de la predicación
oracional – son pues, elementos marginales – y poseen
un cometido coincidente en el discurso: el de guiar, de
acuerdo con sus propiedades morfosintácticas,
semánticas y pragmáticas, las inferencias que se
realizan en la comunicación” [11].
Prada [12] extracted, from Martín Zorraquino and
Portolés [11] extensive analysis of these markers, five
categories (see table1).
Table1. Discourse markers
Category Sub-category
Comentadores
Ordenadores
Marcadores
estructuradores de la
información Disgresores
Aditivos
Consecutivos
Marcadores
conectivos
Contraargumentativos
Explicativos
De rectificación
De distanciamiento
Reformuladores
Recapitulativos
De refuerzo argumentativoOperadores
argumentativos De concreción
De modalidad epistémica
De modalidad deóntica
Enfocadotes de la
alteridad
Marcadores
conversacionales
Metadiscursivos
Each type of marker has especial use in Spanish
discourse, but for this project, the attention is focused
on recapitulative markers because they introduce a
recapitulation or conclusion of what was said, they
concentrate a general view of the text intention and let
the reader, reach the final thought of the author.
8
Examples of these recapitulative markers are: En
suma, en conclusión, en definitiva, en fin, al fin y al
cabo, resumiendo, finalmente among others.
3. Contextual Exploration Method
The Contextual Exploration Method (CEM) [10]
was developed by LaLIC team of the Paris-Sorbonne
University and directed by Jean Pierre Desclés. This
method provides a frame for the identification of
semantic information in a text and a set of mechanisms
that help in the resolution of semantic indeterminations
or ambiguity cases. It also assumes that the texts
include linguistic units that help in the task of sense
construction.
Is required, by the method, the description of
indices and indicators. Indicators are linguistic units
associated to a set of contextual exploration rules.
When an indicator is located within the text a rule is
triggered; the context of the indicator is explored in
order to identify indices or words that lead to the real
meaning of the indicators. Indicators, indices and rules
integrate the linguistic knowledge base.
4. Ontology interaction
For this project, a Spanish version of EuroWordNet
[13] is used. EuroWordNet is a lexical-semantical
ontology of many languages such as Spanish, Dutch,
Italian, Frech, German and Czech. It follows the
WordNet model but integrates some improvements as
better expressivity of the knowledge base; adds new
relationship definitions and transcategorial
relationships.
The version used in this project was transformed
into a database and accessed through SQL queries.
The result of the queries leads to the construction of
context exploration rules.
Figure 1. Graphical result of a query to the Ontology.
Search term capa.
5. Semantic analyzer
The process begins with a query to a well known
and widely used web browser. Simple search terms are
used (simple terms are more sensitive to ambiguity
problem) and a set of hyperlinks is retrieved. This set
is filtered to get only HTML pages written in Spanish.
Once the hyperlinks are extracted, each page is loaded
and its content is analyzed to get the most important
paragraphs; this paragraph selection is made by
locating linguistic markers within the text. The search
term is located within the isolated paragraphs the left
and right context are evaluated in order to get
surrounding indices that lead to the discovery of
relationships within an ontology. The extracted
relationships allow the association of the HTML web
page to a category.
The full process is graphically represented in figure
2.
Figure 2. HTML web page classification scheme.
5.1. Web browser query and hyperlink
extraction
The classification process begins with a query to a
web browser. The search term is a simple term, a
single word. The web browser returns a set of
hyperlinks linked to web pages where the search term
is contained.
Hyperlinks are extracted and its associated page is
analyzed later.
Several query strings where analyzed in order to
simplify the query to the web browser. Language and
file format is specified. Example:
9
http://www.google.com.mx/search?num=<number of
resulting links>&as_epq=<search term>&lr=
<language> &as_filetype=html
Two sets of hyperlinks are extracted, the first set
corresponds to the direct link to the web page, the
second, corresponds to the cache version of the page.
See figure 3.
Figure 3. Simple interface for hyperlink extraction.
5.2. Localization of linguistic markers and
paragraph selection
Each web page is loaded and its body content and
some other representative information as metadata are
extracted. See figure 4.
Figure 4. Simple interface for web page content
extraction.
Some linguistic markers, mostly recapitulative
markers, are located within the body of the page.
Paragraphs where these markers are present are
selected for a more extensive analysis. Consider the
next text segment and the search term “capa”:
“…El segundo sistema permite una elección mas
amplia del copulador y del revelador, pero implica
mayores dificultades en el tratamiento. El revelador
que tiene el copulador debe actuar en la capa
correcta, y esto se logra controlando la velocidad de
difusión o aplicando exposiciones controladas. El
método de la difusión controlada es anticuado. El
método de exposición selectiva utiliza una película de
varias capas que tiene la siguiente construcción: sobre
la base se aplica la emulsión sensible al rojo y encima
una emulsión sensible al verde; sobre esta, una capa
de gelatina que contiene un colorante filtro amarillo, y
finalmente sobre la capa filtro se aplica una capa de
emulsión sensible al azul…” [14]
From this text segment, a representative paragraph is
extracted:
“El método de exposición selectiva utiliza una película
de varias capas que tiene la siguiente construcción:
sobre la base se aplica la emulsión sensible al rojo y
encima una emulsión sensible al verde; sobre esta, una
capa de gelatina que contiene un colorante filtro
amarillo, y finalmente sobre la capa filtro se aplica
una capa de emulsión sensible al azul”. [14]
5.3. Search term localization and context
exploration
Once the paragraphs are selected, the search term is
located within each paragraph an its left and right
context are explored looking for key terms that lead to
the discovery of a relationship within an ontology.
The Contextual Exploration Method [10] is applied
and indicators, indices and rules are defined.
The indicators sets are composed by the search term
and its synonyms, the set of indices is populated by a
query to an Ontology and the rules are dynamically
created. Example:
Indicators = {capa, mano, baño}
Indices = {pintura, emulsión, impermeabilizante,
esmalte, barniz, …}
Rule:
If In the context of the I1 set appears any
indice belonging to set I2
Then
Assing the “Cubrimiento aplicado a una
superficie” meaning to the search term in the
selected statement
End if
“El método de exposición selectiva utiliza una
película de varias capas que tiene la siguiente
construcción: sobre la base se aplica la emulsión
sensible al rojo y encima una emulsión sensible al
verde; sobre esta, una capa de gelatina que contiene
un colorante filtro amarillo, y finalmente sobre la
capa filtro se aplica una capa de emulsión sensible al
azul” .
10
5.4. Relationships extraction and final
categorization
For the final categorization, a set of categories must
be defined. A first approach for solving this task is to
extract a group or relationships, from the ontology,
where the search term and the given sense are involved.
Example:
Figure 5. Homonym relationships of the search term
capa.
All nodes presented in figure 5 are homonyms of the
term capa, each node has a different meaning. Nodes 0
means “Acción y efecto de cubrir la superficie de un
cuerpo con un material” and node 3 means
“Cubrimiento aplicado a una superficie”. These
relationships where discovered in the Ontology, but
these are not sufficient for creating a category, so
second level relationships must be discovered.
Figure 6. Second level relationships of the search term
capa.
In figure 6, the hyponym relationships (for nodes 0
and 3) of the term capa are shown. Here is where
indice terms appear and let us create the categories for
the classification.
Therefore, a first set of categories names might be
formed by first level relationships and its meaning.
Here is where a big problem arises, the ontology data is
not complete, some of its meanings are missing.
6. Conclusions
The easiest way to get information from the web is
trough web browsers and directories; however, the
obtained results not always are sufficient enough
because the search techniques do not care about the
semantic content of the pages. So is necessary the
implementation of a tool able to make a proper
classification of the web pages according the real
meaning of the query.
With a tool like this, the search process made by the
user will be improved cause documents out of the
interest might be discriminated and as a consequence,
the number of pages to visit and the time inverted in
exploring not relevant ones will be diminished.
Moreover, the introduction of linguistic techniques
to classification methods might improve the way web
spiders indexes the information.
7. References
[1] A. Gulli and P. Ferragina, “The anatomy of a hierarchical
clustering engine for web-page, news and book snippets”,
Fourth IEEE International Conference on Data Mining,
ICDM’04, Brighton, UK, 2004.
[2] A. Gulli, “SnakeT”, http:// www.snaket.com, Pisa
University, Italy.
[3] Vivísimo, Inc., “How the Vivísimo clustering engine
works, http://www.vivisimo.com , 2003.
[4] Vivísimo, Inc.,”Clusty”, http://www.clusty.com , 2005.
[5] A. Téllez-Valero, M. Montes-y-Gómez and L. Villaseñor-
Pineda, “Aplicando la clasificación de texto en la extracción
de información”, Encuentro Internacional de Ciencias
de la Computación, ENC, Colima, México, September 2004.
[6] J. M. Gómez, E. Puertas, G. Carrero, M. de Buenaga,
“Categorización de texto sensible al coste para filtrado en
Internet”, Procesamiento del Lenguaje Natural, SEPLN,
Magazine nº 31, September 2003.
[7] L. Golub, Automated subject classification of textual Web
pages, for browsing, Thesis for the degree of Licenciate in
Philosophy, Lund University, Switzerland, August 2005.
[8] F. Sebastiani, “Machine learning in automated text
categorization”, ACM computing surveys, 2002.
[9] F. Sebastiani, “Text categorization”, in Zanasi A., Text
Mining and its Applications, WIT Press, Southampton, UK,
2005.
11
[10] J. P. Desclés, E. Cartier, A. Jackiewicz and J. L. Minel,
“Textual Processing and Contextual Exploration Method”,
Context 97, Rio de Janeiro, February 1997.
[11] Martín Zorraquino, Mª Antonia y José Portolés Lázaro.
Los marcadores del discurso. Bosque y Demonte. Vol. 3.
4051-4213.
[12] J. Prada, G. Moncecchi, “Reconocimiento eficiente de
marcadores del discurso en español”, VIII Simposio
Internacional de Comunicación Social, Santiago de Cuba,
Cuba, January 2003.
[13] Amsterdam University, “EuroWordNet: Building a
multilingual database with wordnets for several European
languages”, March 2006,
http://www.illc.uva.nl/EuroWordNet/
[14] “Fotografía / Fotografía en colores”, March 2006,
http://www.punksunidos.com.ar/punksunidas/fotografia/foto
grafia6.html
13
Digital Image Processing in Wavelet Domain
Osslan Osiris Vergara Villegas and Raúl Pinto Elías
Centro Nacional de Investigación y DesarrolloTecnológico (cenidet)
Interior Internado Palmira S/N, Col. Palmira. C.P. 62490.
Cuernavaca Morelos México.
{osslan, rpinto}@cenidet.edu.mx
Abstract
In this paper we present some explanation about
digital image processing in the wavelet domain. First
an image is transformed using a Discrete Wavelet
Transform (DWT), then several mathematical
operations are applied in order to observe some
features presented in the image. The transformation
can reveal some features that are not clear or difficult
to detect in the original domain. We obtain wavelet
directionality and edge detection, image smoothing,
scale changing, image denoising and compression, and
finally, feature extraction in wavelet domain.
1. Introduction
Wavelet transform was used in geophysics in early
1980s for the analysis of seismic signals [1]. A wavelet
transform decomposes a signal f into its components on
different scales or frequency bands. This is made by a
convolution process on f using the translated and
dilated wavelet ψ (wavelet mother). Based on the
selection of ψ, the transformed function allows, for
example, the extraction of the discontinuities or edges
in f, performing a pattern recognition task or storing a
compressed version of f.
Wavelets are signals which are local in time and
generally have an irregular shape. A signal can be
decomposed into many shifted and scaled
representations of the original mother wavelet [2].
Wavelets have the advantage of being able to
separate the fine details in a signal, very small wavelets
can be used to isolate very fine details in a signal, while
very large wavelets can identify coarse details. In
addition, there are many different wavelets to choose
from. One particular wavelet may generate a more
sparse representation of a signal than another, so
different kinds of wavelets must be examined to see
which is most suited for the application you need in
digital image processing for example image
compression or denoising, pattern recognition, etc.
2. Multiresolution Analysis
Multiresoltion analysis is concerned with the study
of signals or processes represented at different
resolutions and developing an efficient mechanism to
change from to one resolution to another [3]. The
discrete Wavelet Transform (DWT) is a mathematical
tool for the analysis and synthesis of signals that can be
used when digital images need to be viewed or
processed at multiple resolutions.
The localization of signal characteristics in spatial
(or time) and frequency domains can be accomplished
very efficiently using wavelets. This allows us to
simultaneously determine sharp transitions in the
spectrum of the signal and in the position (or time) of
their occurrence.
The principle behind the use of wavelets for
decomposing an image is to use a wavelet function Ψ to
represent the higher frequencies corresponding to the
detailed parts of the image, and a scaling function Φ to
represent the lower frequencies corresponding to
smooth parts of the image. Figure 1 shows the process
to decompose an image using filters to obtain the
subband decomposition.
Figure 1. Subband decomposition of an image.
14
Multiresolution analysis plays an important role in
the perception and decision mechanism of human
beings.
3. Wavelet Directionality and Edge
Detection
With the wavelet transform we can obtain some
information (details) about the directionality and the
edges presented in an image. In this section we use the
image shown in figure 2 to explain how to obtain
directionality and edge detection.
Figure 2. Image “Star” for edge and
directionality analysis.
The first thing to do is to transform the original
image using some DWT, for this example we
decompose the image only one level using the symlet 4.
Figure 3 shows the resulting image from wavelet
transform.
Figure 3. “Star” after wavelet decomposition.
From figure 3 we can see that the horizontal edge of
the original image is presented in the horizontal detail
coefficient matrix of the upper-right quadrant.
Similarly the vertical edge is shown in the vertical
detail coefficients of the lower-left quadrant. Finally
you can see that the diagonal borders are shown in the
diagonal detail coefficients of the lower-right quadrant.
From the wavelet subband image we can combine
the edge information into a single image. The only
thing that we need to do is changing to zero all the
approximation coefficients (upper-left quadrant), and
then compute de Inverse Discrete Wavelet Transform
(IDWT). The resulting edge image is shown in figure
4.
Figure 4. “Star” edge resulting image.
We can use a similar procedure to isolate the
vertical or horizontal edges. It is important to remark
that the information of diagonal edges is always
preserved although we cut the diagonal coefficients.
4. Image Smoothing or Blurring
Wavelets can be used as a tool for blurring or
smoothing images. For this example we use the image
shown in figure 5.
Figure 5. “Tools” image.
In order to smoothing, we need to compute the
wavelet transform in more than one decomposition
level; for example we use the Symlet 4 with four
decomposition levels as shown in figure 6.
Figure 6. Four decomposition level of “Tools”.
Converting to zero one detail level allows us to
obtain increasingly smoothed versions of the original
image.
15
For example if you zeroed the first three details
level, we can obtain an image as the one shown in
figure 7a). If we change to zero all levels (four) we can
obtain an increase of blurring in the image as shown in
figure 7b.
Figure 7. “Tools” images. a) Result of zeroing
first three levels, b) result of zeroing all levels.
5. Change the Scale
Changing the scale of an image implies a
mathematical operation called interpolation. With the
DWT we can downscale an image with a factor of 2n
or
upscale an image with a factor of 2n
. This process is
made by removing or adding subbands in the wavelet
domain.
This scale changing provides an application to
progressive reconstruction of the image. Lets suppose
that we need to send an image to two users in different
sites and those users need different resolution images.
The image can be reconstructed with gradually
higher resolution approximations of the final high
resolution image, and we can send the adequate image
for each user at an exact subband reconstruction.
Figure 8 shows an upscaling example of the camman
image.
Figure 8. Upscaled Camman image. a) Original
image, b) Incise a doubled image, c) Incise b
doubled image.
Performing of the upscaling or downscaling process
is better than the same process made by interpolation.
6. Image Denoising
Image denoising is the process of separating the
noise out of the image components from a single
observation of a degraded image. The image can be
corrupted with noise because of either data acquisition
process or naturally occurring phenomena.
The simplest technique for denoising is wavelet
thresholding (shrinkage). We use as input a noise signal
like the image that we shown in figure 9a. We perform
the wavelet transform using for example four
multiresolution levels, and then we use a denoising
method called soft thresholding through all subbands.
The technique sets coefficients with values under
the threshold (T) to 0, then substracts T from the non-
zero coefficients. After soft thresholding, we compute
the inverse wavelet transform. Figure 9b shows the
image obtained from the denoising process.
Figure 9. “Goldhill” image. a) Gaussian noise
image, b) denoised image
The main problem in denoising is the selection of
the best value of T for thresholding.
7. Image Compression
One of the main popular applications of wavelets is
image compression. Data compression goal is to reduce
the volume of necessary data to represent a certain
information amount. One of the advantages obtained
with the use of DWT is that the image is decorrelated,
thus the image can be easily compressed.
Some of the wavelet coefficients obtained from
DWT correspond to details in the data set. If there are
few details, they might be omitted without substantially
affecting the main features of the data set.
The first step is to transform the original image into
the wavelet domain using the DWT, one of the
important decisions is what family of wavelet to use
and what multiresolution level to apply. The selection
of a wavelet family depends a lot on the subsequent use
of the image, but it is necessary to take into account
some wavelet properties as orthogonality, number of
16
vanishing moments, compact support, symmetry, etc.
By the other hand, the multiresoltion level can not be
larger than Log2(N).
The quantization stage is made after the process of
DWT; here, we can use two different strategies. One is
to set all high frequency sub-band coefficients that are
under a particular threshold to zero. The other is to
change to zero, for example, the coefficients behind the
matrix diagonal or some decomposition level.
There are two highly used techniques for
quantization: the Embedded zerotree wavelet coder
(EZW) and the Set Partitioning in Hierachical trees
(SPIHT) which are very efficient for several
applications.
The final stage corresponds to the entropy coder
which is a lossless stage. Figure 10 shows the lena
image and from left to right different images obtained
from compression process at different quality and
storage space.
Figure 10. “Lena” image at different quality
and storage space.
8. Image Feature Extraction
Image classification is maybe the most important
application when using digital images. In order to
perform it, a feature vector is used to describe an
image.
The statistical properties of the wavelet coefficient
characterize an image, which can be used to lead us to
the better image classification. Some measures
obtained from wavelet coefficients are:
Norm-2 energy:
∑=
=
N
k
kC
N
E
1
2
21
1
(1)
Norm-1 energy:
∑=
=
N
k
kC
N
E
1
12
1
(2)
Standard deviation:
∑=
−=
N
k
kC
N
E
1
2
23 )(
1
µ (3)
Average residual:
∑=
−=
N
k
kCE
1
2
4 )( µ (4)
Entropy:
∑=
−=
N
k
kk CC
N
E
1
22
25 log
1
(5)
Where
∑=
=
N
k
kC
N 1
2
1
µ (6)
µ is the mean and N the size of the image.
9. Conclusions
In this paper we show some applications of the
wavelet transform for digital image processing, with
the goal of demonstrating that an image can be
manipulated even in the wavelet domain.
Research in wavelets keeps looking for some more
complex families fitting with a particular application,
for example, trying to describe an important feature of
the image known as image geometry.
10. References
[1] Morlet, J., G. Arens, E. Fourgeau, and D. Giard, “Wave
propogation and sampling theory part 1: Complex signal and
scattering in multilayered media”, Geophysics, Vol. 47, No.
2, pp. 203 -221, February 1982.
[2] Maryhelen S., “Image compression using wavelets”,
Thesis proposal, Department of electrical and computer
engineering, University of New Brunswick, Canada, 1997.
[3] Mallat, S., “A theory for multiresolution signal
decomposition: The wavelet representation”, IEEE
Transactions on Pattern Analysis and Machine Intelligence
(PAMI), Vol. 11, No. 7, pp. 674 – 693, July 1989.
[4] Gonzalez Rafael C., Woods Richard E. and Eddins
Steven L., Digital image processing using Matlab,
Pearson Prentice Hall, 2004.
17
GAP: A Tool to Solve the Problem of the Web Contents Visualization in
Pocket PC Devices.
J. Carlos Olivares R., J. Gabriel González S., Azucena Montes R., Víctor J. Sosa S. e I. Rafael
Ponce M.
Centro Nacional de Investigación y Desarrollo Tecnológico(cenidet)
Cuernavaca, Morelos, México
{jcolivares04c, gabriel, amr, vjsosa, rafaxzero4c}@cenidet.edu.mx
Abstract
This tool intends to fill the existing ‘GAP’ in the
Web sites visualization in mobile devices, such as
Pocket PC. In order to guarantee that the users can
correctly visualize the Web resources, two things are
needed: a mechanism for controlling disconnections,
and allowing visualization of Web content despite of
the device connection state (hoarding), and a
mechanism that can adapt the Web content to the
specific mobile device features (transcoding). GAP is a
tool that integrates these two mechanisms and allows
improving of the user’s navigation experience in the
Mobile Web.
Keywords: Pocket PC, Visualization, Web Resources,
Hoarding, Transcoding.
1. Introduction
Mobile devices are each time closer in time,
according with [1]: "By 2009, more than a half of the
microprocessors made in the world will be intended for
mobile devices." "The software that will really make
mobile devices useful isn’t developed yet." These
statistics reflect that the use of mobile devices is
increasing due to their tiny size and that its power of
processing and versatility is growing day by day.
The problem of Web resources visualization in
mobile devices is the fact that the great majority of
Web sites in Internet have not been designed for this
type of devices. The mobile devices have limited
resources like small screens, little memory, low
processing speeds, etc; in comparison with traditional
computers equipment.
On other hand, the Web and the protocol that
manages it: HTTP are connection oriented (they are
based on TCP) what causes the transaction to fail if
the user, by any reason, becomes disconnected from
the network . In this case, it might not be possible to
visualize the Web resources in the mobile client.
Disconnections are frequent in this type of devices,
mainly because of their main advantage: mobility.
In this work a system which development is in
progress is described. It focuses in attacking the
problem of Web resources visualization on mobile
devices. The main characteristic of this work is that
great part of the system is executed in this kind of
devices, in comparison to the great majority of the
existing solutions that are executed in traditional
platforms.
2. Alternatives of solution
In order to solve this problem several alternatives
are presented: to design a new protocol, to modify and
existed protocol or to implement intermediary services
that solve the problem.
2.1 New protocols
In this scheme is possible to mention the WAP
protocol and the WML language, they work in an
analogous way as HTTP-HTML in the traditional Web.
The problem strives in that WAP only works with
mobile equipment and this would bring the same
fragmentation that today has the Web (special pages
for all class of devices). In addition, WAP was
originally designed for devices with limited resources
capacities (monochrome screens, lower bandwidth, etc)
which is actually solving day by bay through
bandwidth wireless connection (WCDMA, UTMS,
802.11g, WiMax, etc) and with more and more
powerful equipment.
The best solution would be to create a new protocol.
The problem is that this one must be totally compatible
with the existing ones, because if not, it would let
unusable thousands of existing resources (it would be
necessary to modify as much Web servers as Web
clients).
2.2 Modification of protocols
Within this alternative exits the case of having a
new request scheme of Web resources. This new
18
scheme receives the name of Push, whereas traditional
scheme receives the name of Pull [2].
The Pull scheme receives the name of “over
demand’. Under this scheme, the client (user) is who
visualizes a resource in an explicit way. In our case, if
a user wants to see the page of cenidet, must write in
the Web browser the next URL:
http://www.cenidet.edu.mx/.
The Push scheme also receives the name of
'subscription-notification'. In this scheme, the user
subscribes itself to a service and when some event of
interest happens a notification is sent for alerting the
user about the event.
Generally these two schemes do not live on isolated
way. Hybrid schemes (Pull&Push) have been applied
in diverse existing services, so is the case of the
reception of SMS/MMS messages, where the send of
messages is Pull and the reception is Push, since it
notifies to users about the existence of new messages.
Another service that has made famous devices like
the Blackberry to become successful is the Push-mail
[3]. This service comes to solve the problem of email
visualization in mobile environments. Under the
traditional scheme of the electronic mail, for consulting
the email, a user must be connected all the time to
receive it. This originates great costs if the network
connection generates costs per time. With this new
scheme, the user is not connected to the mail server.
When a new mail in the server is received, it notifies
the client of the existence of the new mail and sends it
to the mobile client.
For this type of schemes, protocols like HTTPU
(HTTP over UDP) or HTTPMU (HTTP over multicast
UDP) have been proposed, and basically works similar
to the HTTP but using datagrams, which are not in an
oriented connection way. With these protocols are
possible to offer a better quality in the mobile Web [4].
2.3 Intermediary services
This is the more extended solution to solve the
problem of Web resources visualization and many
other problems present on Web, like the case of
firewalls that solve some of the Web security problems
like the access control, or proxies’ caches that tries to
reduce the access latency to the information.
The scheme of intermediaries is widely used
because it doesn’t need to modify neither the clients
nor the servers; in fact, the client and server processes
do not notice the existence of these intermediary
services. These services are in charge of the hard work
and are transparent to the users.
The tool that is described in this article, works
under the scheme of intermediary services.
3. Proposal of solution
The hoarding process solves the problem of Web
resources visualization without concerning the state of
the connection of the mobile device. For this, it
becomes necessary that the user has already stored, in
local way, in his device the resources that he o she will
use.
As can be observed, the amount of resources to
occupy can be immense, whereas the capacity of
storage of the devices is limited. In order to give
solution against this new problem is necessary to have
an effective way to know the resources that a user
could use. With hoarding is possible to reduce this,
through algorithms of association rules applied on Web
logs, is determined the optimal set of resources that
will be replicated to the mobile clients [5].
A mechanism which tries to solve the adaptation
problem of Web resources to the displaying capacities
on mobile devices is transcoding. It consists of
transformation of resources, distilling and processing
of all those characteristics that are not available in the
device is needed. The used mechanism of transcoding
uses HTML to a subgroup of HTML transformer,
using XML.
The system is based on client-server architecture
with an intermediate tier on the server side as on the
client side. The system is shown in Figure 1.
Figure 1. General architecture proposed.
The general system has been denominated GASWT
(Gestor de Acaparamiento de Sitios Web
Transcodificados: Hoarding Manager of Transcoding
Web Sites). The intermediary in the client side is
denominated GAP (Gestor de Acaparamiento para
Pocket PC: Hoarding Manager for Pocket PC),
whereas the server side is denominated GAT (Gestor
de Acaparamiento y Transcodificación, Hoarding
Manager and Transcoding). The GAT is composed by
MA (Mecanismo Acaparador: Hoarding Mechanism)
and by MT (Mecanismo Transformador: Transcoding
19
Mechanism). The communication between the
processes is made through a HTTP request-response
scheme.
As much the MA as TM are taken from other
projects that together with this one, comprise the
Moviware project [6], whose main function is to offer
a set of services to mobile clients that have frequent
disconnections.
The general operation of the system is described in
the next lines. The user introduces an URL from the
Web browser (which has been previously configured to
redirect his exit towards the GAP). The GAP receives
the request and determines if it is in the local cache of
the device, if found, the hoarded resource is sends to
the Web browser.
When the resource is not hoarded, the system
validates the connection existence in order to obtain
the resource on line. If for some reason the resource
cannot be shown, (because it doesn’t exist or has
detected an error in the connection) the system notifies
the user by sending an error message.
On the other hand, if the Web resource is not
hoarded and a pattern of the site in the local device
doesn’t exist, the MA sends the Web resources if a
pattern for this site exists. If the pattern exists but the
hoarded resources in the MA aren’t present, it obtains
them by requesting them to MT and soon compresses
the resources in zip format to optimize the process.
Once the MA has sent the hoarded Web site, the
mobile device must decompress the Web site and
update its list of patterns. This process happens in
transparent way, in a way that the user never notices.
MT is responsible of collecting documents and if
they are HTML, it transforms them if the configuration
parameters indicate that. The transcoding is made on
line, because the process is slowed down if the
document is too large.
The actions that the user can make on the system
consist in visualizing Web sites on line, visualizing
Web sites on disconnection mode, visualizing error
messages, visualization of the requests states and
finally, set up the system.
The GAP is basically conformed of three main
modules which are: Observer, GAL (Gestor de
Acaparamiento Local: Local Hoarding Manager) and
GDL (Gestor de Desconexión Local: Manager of Local
Disconnection).
The Observer is responsible of processing each
request and to give back the result to the navigator.
The GAL is responsible of the manipulation and
control of the cache in the device. The users decide
which resources are susceptible of hoarding, as well as
limiting the storage space.
The GDL is responsible of determining the state of
the connection. The control of the disconnections has
been used drilling the network during three seconds.
Observing the quality of the results, a threshold of 30%
of accepted connections determines if the client is
connected (if the threshold is surpassed or equaled) or
is on disconnection mode (if it is below the threshold)
[7].
For the implementation of this tool, we used .NET
Compact Framework 1.0 with C # language, because it
is the best option to program in Pocket PC platform
[8].
The modifications of the MA and MT are being
made in Java so that it is language in which these
modules are programmed.
4. Results
The tool described in the present document has been
proven in diverse equipment like Pocket PC 2000
(Compaq iPAQ H3630), Pocket PC 2002 (HP Jornada
5500), Pocket PC 2003 (HP rx3115), emulators of
Windows CE, desktop PC (Compaq Presario with
Pentium 4 1.4 Ghz. processor, 512 Mb of RAM
memory).
The first test scenario consisted of acceding to the
Web resources in on line mode. We obtained
satisfactory results (see Figure 2).
In the number two test scenario, the GAP was
executed without being connected to the network.
Additionally we had a pattern of a hoarded Web site
(http://www.cenidet.edu.mx/) and resources. In this
case not existing images in the original site were used,
because it was possible to verify that the hoarded
resources are correctly displayed.
The number three test scenario (see Figure 3),
demonstrates that it is possible to transcoding the
resources in the device as well as showing them in a
local way if they are hoarded and without transcoding.
It is Also possible to execute the GAP in other
platforms like Smartphones (SmartGAP) and a desktop
PC (WinGAP). GAP, WinGAP and SmartGAP are the
same program but with different name, to differentiate
the platforms in which they’re running.
5 Conclusions
With the presented tool is being demonstrated that it
is possible to execute complex services in Pocket PC
devices, so is the case of an intermediary service that it
allows to visualize Web resources when it exists or not
a network connection.
At this time we have verified in an isolated way
most of the functions of the system (it lacks the
methods of decompression of the hoarded site), it
20
would be necessary the respective integration of
components and testing to the system in its totality.
Figure 2. Case of test 1: Visualization of Web
resources with network connection.
Figure 3. Visualization of Web sites in
disconnection mode with hoarded Web resources
and without transcoding.
Figure 4. Case of test 3: Visualization of Web sites
in connection mode, with hoarded and transcoding
resources.
The expected benefits at the conclusion of this
investigation work are: 1) Visualization of Web sites
without mattering if the devices are connected or not.
2) Reduction of latency in the access to the
information, if the resource is hoarded locally. 3)
Energy Saving by the fact to work in disconnection
mode. 4) Saving money if the user decides not to
connect to a network that receives the service and
generates expenses by the access time. 5) Facility of
administration of Web sites when not having different
versions to each device.
6. Acknowledgments
We want to give thanks to Rocío Vargas Arroyo for
her contribution in correct this paper.
7. References
[1] SG magazine, http://www.softwareguru.com.mx [visited
march 2006]
[2] Purushottam Kuikarni, et al., “Handling Client Mobility
and Intermittent Connectivity in Mobile Web Accesses”,
Department of Computer Science, University of
Massachussets.
[3] Blackberry’s push technology,
http://www.blackberry.com/products/software/integrations/p
ush_email.shtml [visited march 2006].
[4] UPnP Forum, http://www.upnp.org/, [visited march
2006]
[5] David Valenzuela, “Mecanismos para predicción de
acaparamiento de datos en sistemas clientes/servidor
móviles”, masther thesis, cenidet, august 2002.
[6] Gabriel González. “Plataforma middleware reflexiva para
aplicaciones de cómputo móvil en Internet (Movirware)”,
cenidet.
[7] J. Carlos Olivares, et al, “Control de desconexiones en la
visualización de páginas Web en dispositivos móviles
Windows CE”, for appear in XVI CIECE’06, april 5,6 and 7
2006, Cd. Obregón, Sonora, México.
[8] Gabriel González, Azucena Montes, J. Carlos Olivares,
“Comparativa y evaluación de las herramientas de
programación para desarrollar aplicaciones en plataforma
Pocket PC”. VI CICC’05, Colima, Colima, México,
september 2005.
21
Evaluation of tools for business process in three levels of abstraction
Erika M. Nieto Ariza1
, Javier Ortiz Hernández1
, Guillermo Rodríguez Ortiz2
1
Centro Nacional de Investigación y Desarrollo Tecnológico
Interior internado Palmira s/n, Cuernavaca, Morelos, 62490 México
{erika, ortiz}@cenidet.edu.mx,
2
Instituto de Investigaciones Eléctricas
Reforma 113. Palmira, Cuernavaca, Morelos, 62490 México
gro@iie.org.mx
Abstract
Organizations are increasingly choosing the use of
the web to provide their services to their clients.
Services are the systemization of the business
processes in the organization. Due to the great number
of existing modeling methods and the increasing use of
internet, it is necessary to identify the information that
modeling methods allow to specify. In this paper, a set
of concepts is proposed to evaluate modeling methods
for business modeling using three levels of abstraction
–organizational, integration and web.
1. Introduction
Organizations should decide how the technology
systems support business and how increasingly these
information systems become an integral part of the
business processes [1, 2]. Models are commonly used
to flexibly represent complex systems and to observe
the performance of a business process when a
technology system is integrated [3, 4, 5]. A business
model is an abstraction of how a business performs, it
provides a simplified view of the business structure
which acts as the basis for communication,
improvement, or innovation, and defines the
information systems requirements that are necessary to
support the business. A model has to capture the
domain without reference to a particular system
implementation or technology. One of the problems
with modeling the early representations of business
processes, conceptual views of information systems
and Web interactions is the great number of techniques
to model and specify these models, and, additionally,
since each one has its own elements, this makes it
complex and laborious to compare and select the
appropriate technique to model a system in an specific
level of representation.
Three modeling levels of abstraction are proposed
which integrate a set of concepts to build early web
application models: a) Organizational, it describes how
the organization works and the business process that is
going to be systematized with a web information
system; b) Integration, it describes the role of the
software system and its integration with a particular
organizational environment; c) Web, it describes the
semantics of a web application [5,6]. The basis of our
contribution is in the identification and classification of
a set of concepts which are used to know what to model
at each level of abstraction and, to have a modeling
method evaluation framework to distinguish the
capabilities of each method in order to model at the
three levels of abstraction.
There are some methods and methodologies to
evaluate business process modeling; however, they do
not evaluate capabilities but rather the functionality of
the application or the modeling methods. Rosemman
proposes ontology to evaluate organizational modeling
grammars identifying their strength and weaknesses
[7]. Luis Olsina [8] and Devanshu Dhyani [9], propose
a methodology to evaluate the characteristics of a web
application in operational phases.
The structure of this paper is as follows: in section 2
the modeling concepts that comprise our approach are
briefly presented, in section 3 the modeling concepts
are enhanced with a set of aspects found to be useful in
building models and a method evaluation methodology
is presented, in section 4 the results of the evaluation
are shown, in section 5 the conclusions about the
benefits of the methodology are discussed, finally the
references are presented.
2. Modeling concepts
A business process model can be viewed at many
levels of abstraction, and complementary model views
can be combined to give a more intelligible, accurate
22
view of a system to develop than a single model alone
[3]. This approach establishes three levels of
abstraction and each one includes certain modeling
concepts of features as shown in table 1. At each of
these levels, concepts are properties or characteristics
that structurally describe types of requirements in a
specific level of abstraction; they define the key
elements in a business process. Concepts in each level
of abstraction were selected based on the analysis of
several techniques and methods for business process
modeling at the three levels.
Table 1: Modeling concepts at each level of abstraction
Organizationa
l level
Integration
level
Web level
Business process Pure navigation
--- Navigation page
- Relationship
User profile (Rol) User profile (Rol)
Actor Actor
Class (object) ---
Resource Artifact
Artifact Artifact
Goal Goal --- Goal
Task Function Service Service
Activity Event
Event ---
Business rule Constraint Precondition and
postcondition
---
Quality No functional
requirement
No functional
requirement
---
The organizational modeling concepts are as
follows.
- Actor. It describes an entity that has a specific goal in
the business process.
- Resource. It describes an informational or physical
entity that is transferred between actors.
- Goal. It describes a business process desired state that
an organization imposes to itself.
- Task. It describes a series of activities oriented to
reach a goal.
- Activity. It describes a set of actions to carry out one
task.
- Quality. It describes the desired characteristics in the
business process.
- Business rule. It describes the actions and criteria that
govern the execution of the business process.
The integration modeling concepts are as follows.
- Actor. It describes an entity that interacts with the
information system and that might play different roles.
- Artifact. It describes an entity that is transferred
between an actor and the information system.
- Goal. It describes the information system purpose,
limitations and responsibilities.
- Function. It describes a service that must be provided
by the information system.
- Event. It describes a change in the business process in
one specific moment of time.
- Constraint. It describes a condition for a service
execution supplied by the information system.
- Non functional. It describes the desired quality
features or constraints for the information system.
The Web modeling concepts are as follows.
- Navigation relationship. It describes a global vision of
the Web application according to a user profile.
- User profile. It describes the user unique use of the
Web application.
- Class. It describes an object type to model the entities
that integrate the application.
- Artifact. It describes an abstract object to be
transferred between the Web application and a user.
- Goal. It describes the purpose of the Web application.
- Service. It describes an activity or an action that the
web application has.
- Event. It describes the trigger of an activity or action
that might be carried out to obtain a result or artifact.
- Pre and pos condition. It describes the performance of
an event execution.
- Non functional requirement. It describes the desired
quality features or constraints for the Web application.
Each concept used for business process modeling is
related to each other.
3. The concepts and the evaluation of
methods approach
The last section introduced a set of modeling
concepts used to model business processes and systems
at different levels of abstraction. Here the concepts are
enhanced with aspects that make them more powerful
to model a particular view. These aspects are also used
as scales to evaluate modeling methods. These aspects
are capabilities sorted by the concepts presented before
and a scale is defined for each concept using the
capabilities related to the concept. Also, a desired
capability mentioned in the literature may be used in
the definition of a scale.
Following a well-known approach from the
economics and management disciplines, to each aspect
a scale between 0 and 5 is assigned which is going to
be used to evaluate one of the modeling capabilities. As
in the statistics methods, the concepts in this paper are
qualitative variable with a nominal scale [10]. The
evaluation scale is obtained by first taking a list of the
capabilities of one method, and then a list of
capabilities from a second method, from a third, until
all selected methods are analyzed. The concepts
evaluation scales facilitate the comparison of different
modeling methods capabilities (see Tables 2, 3 and 4).
The order assigned to the scales is intuitive and
relatively arbitrary; however, it can be changed easily.
23
Then each one information method is evaluated for all
the aspects in each level of abstraction.
Table 2: Aspects and evaluation scales for the
organizational level of abstraction
Table 3: Aspects and evaluation scales for the integration
level of abstraction
The evaluation consists in assign a value to each
concept of the method. For example, the concept non
functional requirement at the web level; if the method
has the non functional requirement concept; the method
should have 1 point. If the method in the non functional
requirement concept says who proposes it and to what
is applied, the method should have 2. If the method has
the concept of non functional requirement, who
proposes it and to what is applied, and also, the kind of
requirement, the method should have 3 points. If the
method has the concept of non functional requirement,
who proposes it and to what is applied, the type of the
requirement, and also, the measure to verify
compliance; the method should have 4 points. The
method should have 5 points if it has the concept of
non functional requirement, who proposes it, to what is
applied, the type of the requirement, the measure to
verify compliance and what happens if it is not
fulfilled.
Table 4: Aspects and evaluation scales for the Web level
of abstraction
3.1. Evaluation methods
The evaluators have to evaluate the three levels of
abstraction for all concepts. For each modeling method
and for each aspect ai, a corresponding evaluation ei is
obtained. The results are displayed in a table for easy
comparison and a total score is obtained for each
method and for each level of abstraction as Σei. A
method that scores better than other, possibly has more
capabilities to model requirements at the corresponding
level of abstraction than the first.
4. Results of the methods evaluations
As an exercise, the following methods i*, Tropos,
EKD, BPM-UML, OO-Method/OOWS, OOWS [5, 7,
4, 8, 9, 11, and 12] were evaluated using the scales
presented (tables 5, 6 and 7). The methods evaluated at
each level are not the same since some methods do not
offer the modeling concepts for the level where they
are not shown.
Table 5: Organizational level evaluation of the methods
Organizational
level
Max.
Value
I* Tropos EKD BPM-
UML
Actor 5 5 5 5 5
Resource 5 5 5 2 5
Goal 5 1 3 4 3
Task 5 2 4 3 2
Activity 5 0 2 0 4
Business rule 5 2 0 5 4
Quality 5 3 4 4 4
Total 35 18 23 23 27
Table 6: Integration level evaluation of the methods
Integration level | Max. value | i* | Tropos | EKD | BPM-UML | OO-Method
Actor             |  5 | 5 | 5 | 5 | 5 | 1
Artifact          |  5 | 5 | 5 | 4 | 5 | 4
Goal              |  5 | 1 | 3 | 4 | 3 | 1
Function          |  5 | 2 | 2 | 5 | 5 | 2
Event             |  5 | 0 | 1 | 0 | 4 | 3
Constraint        |  5 | 2 | 0 | 5 | 4 | 5
Non functional    |  5 | 3 | 4 | 4 | 4 | 0
Total             | 35 | 17 | 20 | 27 | 30 | 16
Table 7 (a): Web level evaluation of the methods (business process)
Web level                      | Max. value | Tropos | OO-Method/OOWS | OOWS
User profile                   |  5 | 3 | 4 | 4
Class                          |  5 | 0 | 5 | 5
Artifact                       |  5 | 4 | 4 | 4
Service                        |  5 | 3 | 3 | 3
Event                          |  5 | 1 | 3 | 2
Precondition and postcondition |  5 | 2 | 5 | 3
Non functional                 |  5 | 3 | 0 | 0
Total                          | 35 | 16 | 24 | 21
Table 7 (b): Web level evaluation of the methods (pure navigation)
Web level                        | Max. value | Tropos | OO-Method/OOWS | OOWS
Navigational page - relationship |  5 | 1 | 5 | 5
User profile                     |  5 | 3 | 4 | 4
Goal                             |  5 | 3 | 0 | 0
Artifact                         |  5 | 4 | 4 | 4
Service                          |  5 | 3 | 3 | 3
Total                            | 25 | 14 | 16 | 16
At the organizational level, BPM-UML obtains the best score for this level of abstraction, and i* obtains the lowest. The methods were evaluated with respect to the parameters defined for the approach presented here. During the evaluation, the particular characteristics of each method become apparent; for example, the quality aspects of a business process are modeled as qualitative goals in BPM-UML. At the integration level, the results show the capabilities of each method: BPM-UML again obtains the best score, while OO-Method obtains the lowest.
5. Conclusions
There are many proposals for modeling organizational, integration and web requirements, and each one has its own elements. Some use the same concepts under different names, which makes comparing the methods complex and laborious. The approach presented here unifies the various terminologies, increases the knowledge about the modeling concepts, and proposes a way to evaluate the modeling capabilities and techniques of the methods. This helps to select the method that is most appropriate for the needs of a problem domain. The approach has been used to evaluate e-learning systems [13]. Additionally, it has been applied in the development of several case studies to evaluate virtual reality methods and to appreciate clearly the concepts that each method allows to be modeled.
6. References
[1] J. Pasley, "How BPEL and SOA are changing Web services development", IEEE Internet Computing, May-June 2005.
[2] P. F. Green, M. Rosemann and M. Indulska, "Ontological Evaluation of Enterprise Systems Interoperability Using ebXML", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005.
[3] T. Meservy and K. Fenstermacher, "Transforming software development: an MDA road map", IEEE Computer Society, September 2005.
[4] H. E. Eriksson and M. Penker, Business Process Modeling with UML, Wiley, Chichester, UK, 2000.
[5] E. Yu, Modelling Strategic Relationships for Process Reengineering, Ph.D. thesis, University of Toronto, Canada, 1995.
[6] A. Ginige and S. Murugesan, "Web Engineering: An Introduction", IEEE Multimedia, pp. 1-5, Jan-Mar 2001.
[7] P. F. Green, M. Rosemann and M. Indulska, "Ontological Evaluation of Enterprise Systems Interoperability Using ebXML", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005.
[8] L. A. Olsina, Metodología cuantitativa para la evaluación y comparación de la calidad de sitios web, Ph.D. thesis, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, November 1999.
[9] D. Dhyani, W. K. Ng and S. S. Bhowmick, "A survey of Web metrics", ACM Computing Surveys, Vol. 34, No. 4, December 2002, pp. 469-503.
[10] W. L. Carlson and B. Thorne, Applied Statistical Methods for Business, Economics, and the Social Sciences, Prentice Hall, 1997.
[11] J. Bubenko, D. Brash and J. Stirna, EKD User Guide, Royal Institute of Technology (KTH) and Stockholm University, Dept. of Computer and Systems Sciences, Stockholm, Sweden, 1998.
[12] E. Insfrán, O. Pastor and R. Wieringa, "Requirements Engineering-Based Conceptual Modelling", Requirements Engineering, Springer-Verlag, Vol. 7, No. 2, pp. 61-72, 2002.
[13] E. Islas P., E. Zabre B. and M. Pérez R., "Evaluación de herramientas de software y hardware para el desarrollo de aplicaciones de realidad virtual", http://www.iie.org.mx/boletin022004/tenden2.pdf (2005).
Image Segmentation Script Language
Francisco Cervantes Álvarez, Raúl Pinto Elías
Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet)
Interior Internado Palmira s/n, Cuernavaca, Morelos, México.
{cervantes04c, rpinto}@cenidet.edu.mx
Abstract
In this article we propose the use of a script language for the image segmentation stage in artificial vision. The proposed language, the system architecture used to interpret scripts, and the general structure of the programs that make up the operator library are described. Finally, some tests and results of the use of the proposed script language are shown.
1. Introduction
In this paper we propose the use of a script language for image segmentation. Nowadays, the use of script languages in graphics programming is increasing, because they allow ideas to be tested in an easy way [1]. Script languages also make code reuse easy [2]. However, in the artificial vision area few works focus on digital image processing through script languages; an example is shown in [3], where the user builds a script with graphic objects and then executes it to process a given image. An example of a commercial script language is MATLAB [4].
The proposed script language allows ideas about image segmentation to be tested in an easy way, and the user does not need to know how the segmentation algorithms carry out the processing. The language also allows code reuse through the operator library (the operators are independent of each other) and the implementation of a script interpreter. This library allows the language to grow in a modular way, without the need to modify existing code.
This paper is structured in the following way. In the second section the basic elements of the proposed script language are described. The third section shows the basic structure that the library operators should have. In the fourth section the general scheme of the script language interpreter is described. In the fifth section some tests and results are shown. Lastly, the conclusions are presented and some future work is discussed.
2. Basic elements of script language
The language is composed of the definition of the following data types: Entero, Real, Cadena, ImagenGris, Mascara and Contorno. The basic arithmetic operations are also defined (addition, subtraction, multiplication and division). The language has the following basic structures:
• Declaration.
• Assignment.
• Operator call.
The syntax corresponding to these structures is the following.
Declaration:
data_type (variable_name) (, variable_name)*
Assignment:
variable_name = variable_nameX
variable_name = arithmetic_expression
variable_name = operator_name (arguments)
Operator Call:
operator_name (arguments)
The syntax shown above only provides a general structure; everything else depends on the operators registered in the operator library. This structure is named the language core. The script language interpreter is very important because it lets the core recognize new operators. This way of defining the language, in which only the structures are established and the language elements are not fixed, gives the advantage of adding new elements or commands dynamically, without modifying the core code. Each operator is independent of the others; the single restriction for adding an operator to the library is to follow a basic structure specification. This restriction must be followed so that the core and the operators can interact.
3. Operator basic structure
An operator can be used as part of the library if it has a general structure like the one shown in figure 1.
Figure 1. Operator general structure:
• Core libraries for data type management (numbers, strings, images and templates).
• Request of the arguments (file paths that contain the parameter values).
• Operator body (the digital image processing algorithm).
• Return of the result (the result is saved through a file given as the last argument).
Figure 1 shows the structure that the operators should have. Some elements of the structure can be omitted; for example, in the header only the required core libraries must be included. An operator can return a value, but this is not strictly necessary; for example, the operator that shows an image only displays it on the screen and does not need to return anything to the core. By default, an operator must receive at least one argument.
In order for the core and the operators to interact, the data types defined in the core must be used. Also, the parameters that an operator needs for its execution are received as file paths; these files contain the values of the operator parameters. Finally, the output data must be saved in a file specified by the last parameter received by the operator. Below, an example of an operator that extracts the negative of an image is shown.
#include "CImagGris.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[])
{
if ( argc < 3 ) exit ( ERR_NUM_PARAM ) ;
char * a_entrada = argv [ 1 ];
char * a_salida = argv [ 2 ];
AnsiString entrada,salida;
CImagGris Imagen;
entrada=CharToAnsiString(a_entrada);
salida=CharToAnsiString(a_salida);
if(Imagen.leerArchivo(entrada)==false)
exit(ERR_IMAG_EN);
int x,y,h,w;
Byte pixel;
h=Imagen.Alto();
w=Imagen.Ancho();
for(x=0;x<w;x++)
{for(y=0;y<h;y++)
{ pixel=Imagen.getPixel(x,y);
Imagen.setPixel(x,y,255-pixel);
}
}
if(Imagen.escribirArchivo(salida)==false)
exit(ERR_NO_MEMO);
exit ( BIEN ) ;
}
At present the interpreter core only supports 24-bit BMP images.
4. General scheme of script interpreter
The general structure of the operators has been shown; now the general scheme of the script interpreter is presented. Figure 2 shows each component of the interpreter and its relationship with the operators.
Figure 2. Script interpreter general scheme (components: the interpreter core with its process module, memory module and data types; the operator library; the input script and the result).
With this structure, the system first explores the operator library to generate the structure (syntax and semantics) of each operator; later it analyzes the input script and executes the operators.
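The following is a minimal sketch (our own illustration, not the actual interpreter) of the dispatch idea described above: the interpreter knows each registered operator only through an alias and an executable path, and passes the parameters as file paths. The aliases, executable paths and script lines shown here are made up.

// Minimal sketch of operator dispatch: look up the alias in a registry and
// run the corresponding executable, forwarding the file-path arguments.
#include <cstdlib>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // Hypothetical registry: alias -> executable path (normally built from the library).
    std::map<std::string, std::string> registry = {
        {"BinarizarImagen", "./BinarizarImagen.exe"},
        {"VerImagenGris",   "./VerImagenGris.exe"}
    };
    // A toy "script": one operator call per line, arguments are file paths.
    std::vector<std::string> script = {
        "BinarizarImagen in.bmp umbral.txt out.bmp",
        "VerImagenGris out.bmp"
    };
    for (const std::string& line : script) {
        std::istringstream tokens(line);
        std::string alias;
        tokens >> alias;                          // the first token is the operator alias
        auto it = registry.find(alias);
        if (it == registry.end()) {
            std::cerr << "Unknown operator: " << alias << "\n";
            return 1;
        }
        std::string command = it->second;         // build the external command line
        std::string arg;
        while (tokens >> arg) command += " " + arg;
        int status = std::system(command.c_str()); // run the operator executable
        if (status != 0) {
            std::cerr << alias << " failed with status " << status << "\n";
            return status;
        }
    }
    return 0;
}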
5. Tests and results
Several tests have been carried out to show the advantages that the script language provides for image segmentation; for example, the user can use the language without interacting directly with the algorithms.
The first test consists of showing how to make a new operator that converts an image to a binary image. To
create an operator, a C++ Builder compiler is necessary. The operator code is shown in figure 3.
Figure 3. Operator to convert an image to binary image:

#include "CBMP24.h"
#include "CNumero.h"

int main(int argc, char* argv[])
{
    // The operator receives three arguments: the input image path,
    // the threshold file path and the output image path.
    if (argc < 4) exit(ERR_NUM_PARAM);
    char *a_imag   = argv[1];
    char *a_umbral = argv[2];
    char *a_salida = argv[3];
    int valorUmbral;
    AnsiString imag, umbral, salida;
    CIMAGEN_BMP24 Imagen;
    CNumeroMemoria UmbralBinario;
    imag   = CharToAnsiString(a_imag);
    umbral = CharToAnsiString(a_umbral);
    salida = CharToAnsiString(a_salida);
    // Load the gray-level image and the threshold value from their files.
    if (Imagen.leerArchivo(imag, GRIS) == false)
        exit(ERR_IMG_EN);
    if (UmbralBinario.leerArchivo(umbral) == false)
        exit(ERR_UMB_EN);
    valorUmbral = (int)UmbralBinario.Valor();
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();    // image height
    w = Imagen.Ancho();   // image width
    // Threshold every pixel: white if >= threshold, black otherwise.
    for (x = 0; x < w; x++)
    {
        for (y = 0; y < h; y++)
        {
            pixel = Imagen.getPixelGrey(x, y);
            if (pixel >= (Byte)valorUmbral)
                Imagen.setPixelGrey(x, y, 255);
            else
                Imagen.setPixelGrey(x, y, 0);
        }
    }
    // Save the binary image to the file given as the last argument.
    if (Imagen.escribirArchivo(salida) == false)
        exit(ERR_NO_MEMO);
    exit(BIEN);
}
As can be seen in the figure above, making a new operator is very easy; it is only necessary to follow the general structure that has been specified. The user then generates the executable file by compiling the source code. At this moment the new operator has been created and must be registered in the library.
The second test consists of registering the new operator. For this, the Métodos option of the Herramientas menu is used; figure 4 shows this menu.
Figure 4. Script interpreter interface
When the user clicks on this menu option, the registration screen shown in figure 5 is displayed, where the new operator has to be registered. The user indicates the executable file path and assigns an alias; the alias is the name used by the language interpreter. Later, the input and output parameters of the operator are specified.
Figure 5. Screen for operator registration
Once the operator is registered, it can be used as part of the language; figure 6 shows this process. Here, a script that converts images to binary images is written, and the initial and final images are also shown.
Figure 6. Script to convert an image to binary image
These tests show the advantages of the script language. The capability of registering new operators provides an open language that grows according to the needs of the user. Nowadays the language has 25 segmentation operators and 5 image description operators. The third test shows how the operators BinarizarImagen and VerImagenGris interact with each other. These operators can be saved to be reused later. Finally, managing the library is simple, because the interface provides screens to modify, remove and add operators.
6. Conclusions
From this paper we can conclude that the use of a script language for image segmentation is practical. In this work we saw that the functionality of the segmentation algorithms can be abstracted away by using scripts, so the user does not need to know the algorithms. Finally, we can say that script languages might also be used in other stages of artificial vision.
The operator library used by the interpreter is built to grow in a modular way; the library grows with each operator that is registered.
#include "CBMP24.h"
#include "CNumero.h"
int main(int argc, char* argv[])
{
if ( argc < 4 ) exit ( ERR_NUM_PARAM ) ;
char * a_imag = argv [ 1 ];
char * a_umbral = argv [ 2 ];
char * a_salida = argv [ 3 ];
int valorUmbral;
AnsiString imag,umbral,salida;
CIMAGEN_BMP24 Imagen;
CNumeroMemoria UmbralBinario;
imag=CharToAnsiString(a_imag);
umbral=CharToAnsiString(a_umbral);
salida=CharToAnsiString(a_salida);
if(Imagen.leerArchivo(imag,GRIS)==false)
exit(ERR_IMG_EN);
if(UmbralBinario.leerArchivo(umbral)==false)
exit(ERR_UMB_EN);
valorUmbral=(int)UmbralBinario.Valor();
int x,y,h,w;
Byte pixel;
h=Imagen.Alto();
w=Imagen.Ancho();
for(x=0;x<w;x++)
{
for(y=0;y<h;y++)
{
pixel=Imagen.getPixelGrey(x,y);
if(pixel>=(Byte)valorUmbral)
Imagen.setPixelGrey(x,y,255);
else
Imagen.setPixelGrey(x,y,0);
}
}
if(Imagen.escribirArchivo(salida)==false)
exit(ERR_NO_MEMO);
exit ( BIEN ) ;
}
Figure 6 Script to convert an image to binary image
30
7. References
[1] M. Villar, "Guía de lenguajes de script para prototipado rápido", http://www.codepixel.com/tutoriales/prototipado/, 2006.
[2] K. Muehler, "Adaptive script based animations for medical education and intervention planning", Department of Simulation and Graphics, University of Magdeburg, Germany.
[3] "Sistema interactivo para la enseñanza de la visión artificial", Depto. de Sistemas Inteligentes Aplicados, Escuela Universitaria de Informática, Universidad Politécnica de Madrid, 2006.
[4] "MATLAB", http://www.mathworks.com/, 2006.
Web Page Retrieval Using an Ontology that is Populated by Automatic Text
Classification
Ismael R. Ponce M., José A. Zárate M., Juan C. Olivares R.
Centro Nacional de Investigación y Desarrollo Tecnológico
{rafaxzero04c, jazarate, jcolivares04c}@cenidet.edu.mx
Abstract
In this article we describe a proposal to help users in the arduous task of recovering information from the Web, especially when queries are about a specific subject or approach. We suggest the use of an ontology whose instances are links to Web pages about the domain for which the ontology was constructed, taking advantage of the order and categorization that it offers to guide the user through the concepts that integrate it and to find information related to them. The creation of an ontology about a particular domain and the activities necessary to obtain an automatic classification of Web pages as instances of the ontology are described.
Keywords: ontology, automatic classification methods, vector space model.
1. Introduction
Nowadays, in line with the technological evolution, the amount of information generated every second is immense; moreover, having that information has become so important that we now live in an era where information governs the world and its decisions.
The Internet has become a great source of information, but the larger it is, the harder it becomes to find the desired content. Diverse ways of recovering information have been tried, for example, Web search engines, which use different techniques (searchers like Google, Yahoo, Ask, Vivisimo, and many others), some of them considering the popularity of the pages, the use of clustering, etc. Nevertheless, although they help at the time of making queries, users still face results that are far from what they wished for.
Therefore, diverse approaches are being tried to help searching on the Internet, ranging from word matching to techniques based on the popularity of the sites; unfortunately, for many users this type of result is not enough, and they require more specific solutions.
The alternative proposed for this problem is to use the ontology paradigm for Web page search on a particular subject. When working on a concrete domain a specialized search is expected; in addition, thanks to the use of ontologies and the order they provide over the concepts that conform them, they are a great help for users to find the information they wish.
In this document we focus on the way to populate an ontology with Web page links, using automatic classification techniques. In our experiment we obtained 86% of well classified elements.
The article briefly describes the steps followed to put the proposed idea into practice. First, a brief overview of the way some search engines recover information is given; next, the development of an ontology suitable for the intended use is described, concluding with the steps for using an automatic classification method to populate the ontology, considering the Naive Bayes, k-nearest neighbors and support vector machine methods.
2. Search Machines
In a traditional search engine, queries are generally made from keywords, obtaining as a result a list of Web links related to the requested words. Some of the best known search engines are Google, Yahoo and MSN Search, among others.
The case of Google stands out for the use of its PageRank technology [1], in which a formula calculates the weight of each Web page stored in its database, considering the number of links from other pages that reference it. The greater the number of links to a page, the greater its score, thus becoming a popularity contest [2]. Unfortunately, it has been demonstrated that the results can be manipulated by the well-known Google bombing [3].
Another way to recover information is through Web directories, which consist of a manual organization and classification of Web pages by subjects or categories. One of the most representative directories is the Open Directory Project [4], in which a set of voluntary editors are in charge of listing the Web links inside an ontology, where the links are grouped by similar subjects into categories. The disadvantage is that too much human intervention is required to register the page links.
A special type of search engine incorporates clustering, which consists of partitioning a set of similar objects into subgroups, where the elements of each subgroup share common characteristics. This type of search engine returns the results of a query arranged in groups; examples of this are Clusty.com and Vivisimo.com.
Finally, we find specialized search engines, which focus on recovering links to technical and scientific documents. An example is Citeseer, a search engine for documents focused on computing, which uses bibliographical references to estimate the importance of the documents that are queried.
Although these and other techniques have been developed to recover information from the Web, this area still has much to offer, which is why new alternatives to help the user continue to be explored, as is our case.
3. Phase of Ontology Development
Gruber [5] defines an ontology as the explicit specification of a conceptualization, which means identifying the concepts that integrate a domain and the interrelations that exist among them, in a formal representation, in such a way that it can be shared and reused.
The standard language established by the W3C to make this type of formalization is OWL (Web Ontology Language). An advantage of using this standard, in addition to the reusability it allows, is that many tools related to ontology design and use are being developed to support it, such as editors, reasoners, etc.
We developed an ontology considering these points and following the methodology proposed by Uschold and King [6]. The domain for which the ontology was developed was natural language processing (NLP). To develop it we used the ontology editor Protégé 3.1.1, the Protégé-OWL 2.1 plug-in along with the OWL Wizards plug-in, and the FaCT++ 0.99.6 and RacerPro 1.9.0 reasoners, which were used to verify the consistency of the ontology.
The classes are made up of concepts related to the NLP area, mainly including researchers, schools, tools and application areas.
4. Supervised Learning for Automatic Text
Classification
Once the ontology has been developed, the next phase is to populate it with instances. Given the vast number of pages that exist on the Web, manually classifying them into the classes established in the ontology would be an expensive task, and this is already done to a great extent in Web directories. Therefore, a way to automate this process was sought, resorting to supervised learning, in which automatic text classification is performed by means of statistical and mathematical techniques.
This approach is centered on having a training set of previously classified documents that is used to learn how to classify new documents. For this, it is necessary to transform the initial state of the documents into a representation that can be used by a learning algorithm for the classification.
For test purposes we only worked with HTML pages. The necessary steps of this process are described next. The training collection used consisted of 1624 documents, previously classified into 26 classes taken from the developed ontology, plus a class denominated null, into which the documents that do not belong to the ontology domain are classified. The 26 considered classes are only a representative sample of the existing classes in the ontology and were taken only for test purposes.
4.1. Document Preprocessing
Not all the elements (more precisely, the words) that appear in documents are useful for their classification; that is, there are words that by themselves say nothing about the content of the document in which they appear and can therefore be eliminated. Among these elements are punctuation marks and HTML tags. There are also words of very frequent use that appear in a great number of documents, which makes their discriminatory power very low; these words are known as stopwords, examples of which are articles, pronouns, prepositions and conjunctions, among others.
In order to define the stopwords to eliminate, we resorted to the lists available in DOCUM [7], in SMART [8] and at the Institut interfacultaire d'informatique of the University of Neuchâtel [9], as well as other words identified during the testing process.
Because we work on a specific domain, a certain control exists over the terms that belong to it, which is why a matching between different terms that refer to the same concept was suggested, turning them into a single representation; in other words, if a concept can be named in different forms, we unify them and consider them under a unique form inside the classification process. For this, we took as a starting point the concepts that integrate the ontology.
Finally, many words share the same lexical root; a basic stemming process based on Porter's algorithm [10] was followed, which seeks to reduce words to their stem.
All the steps mentioned above aim to diminish the size of the training document collection to make it more manageable, eliminating the irrelevant parts before continuing with the automatic classification process. In our exercise, on average we managed to reduce the original size of the document collection by up to 70%.
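As an illustration only (not the authors' implementation; the stopword list and example text below are made up), a minimal sketch of this kind of preprocessing is:

// Minimal preprocessing sketch: lowercase, drop punctuation/markup leftovers,
// and remove stopwords, keeping only the remaining tokens.
#include <cctype>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> preprocess(const std::string& text,
                                    const std::set<std::string>& stopwords) {
    std::string clean;
    for (char c : text) {                       // keep letters, turn everything else into spaces
        if (std::isalpha(static_cast<unsigned char>(c)))
            clean += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        else
            clean += ' ';
    }
    std::vector<std::string> tokens;
    std::istringstream in(clean);
    std::string word;
    while (in >> word)                          // discard stopwords
        if (stopwords.count(word) == 0)
            tokens.push_back(word);
    return tokens;
}

int main() {
    std::set<std::string> stopwords = {"the", "of", "and", "a", "in"};  // toy list
    for (const std::string& t : preprocess("The analysis of natural language.", stopwords))
        std::cout << t << "\n";                 // prints: analysis, natural, language
    return 0;
}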
4.2. Vector Space Model
The vector space model (VSM) was proposed by Salton in 1975 [11]. The basic idea behind this model is to build a matrix that represents the documents and the words contained in them, assigning a weight to each word. Each vector of the matrix represents a document and the distribution of the words that appear in it. It is an m x n matrix, where m is the number of documents and n the number of registered words.
Different types of word weighting exist for the VSM; in our tests we considered boolean weighting (the weight of a word is 0 if it does not appear in the document and 1 if it does), weighting by frequency of appearance (the weight of the term depends on the number of occurrences of the word in the document) and, finally, tf-idf weighting (calculated as the product of the term frequency and its inverse document frequency [11]).
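For reference, a common textbook formulation of the tf-idf weight (variants exist; the exact one used here may differ) is:

w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}

where \mathrm{tf}_{t,d} is the frequency of term t in document d, N is the total number of documents and \mathrm{df}_t is the number of documents that contain t.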
4.3. Dimensionality Reduction in the VSM
Not all the words that integrate the training collection can be considered in the VSM, since its dimension would be enormous. Different techniques exist to reduce the dimensionality, such as the document frequency, which sets a minimum number of appearances that each word must have within the whole set of documents, in order to discard those words whose appearance is very small and to keep those that present a greater document frequency.
Another technique that was considered was the information gain (IG), which calculates the difference between the entropy of the system and the entropy given each word. This difference, measured in bits, indicates how relevant a word is and how much information it contributes over the whole collection, as a determining factor to carry out the classification.
The total number of words in the already processed training collection is 2,552,196, of which only 125,891 are different words. As can be appreciated, the number of different words is too large to be handled in the VSM, so only those words with a document frequency greater than or equal to 15 were considered; 8,966 words passed this filter, which represents 7.12% of the original distinct words. Nevertheless, this is still a very large amount, so the IG was applied to these words.
The calculated entropy of the whole collection was 3.97; the words considered were those with an IG equal to or greater than 0.1, resulting in 527 different words, 0.42% of the original size. The words (already stemmed and standardized) that presented the greatest IG were: nlp (0.552), natural_language (0.479), knowledge (0.424) and data_min (0.335).
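For reference, a standard way of computing these quantities for a word w over the set of classes C (the textbook formulation, given only as an illustration) is:

H(C) = -\sum_{c \in C} P(c)\,\log_2 P(c)

IG(w) = H(C) - \big[\, P(w)\,H(C \mid w) + P(\bar{w})\,H(C \mid \bar{w}) \,\big]

where P(w) is the fraction of documents that contain w and H(C \mid w) is the class entropy computed over those documents.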
4.4. Automatic Classification Algorithms
Once the VSM representation of the training document collection has been obtained, an automatic classification method can be applied to classify new elements. The automatic classification methods we considered were Naive Bayes, k-nearest neighbors (kNN) and support vector machines (SVM), in order to carry out a series of tests to find the method that gives the best results, also considering the weightings mentioned in section 4.2. WEKA was used to carry out the tests; the results shown in Table 1 correspond to the use of 10-fold cross validation and show the percentage of well classified elements.
Table 1. Percentage of well classified elements.
            boolean   tf        tf-idf
Naive Bayes 62.7463   55.8498   81.2192
kNN         84.5443   85.0369   84.4212
SVM         86.2685   66.7488   82.0813
Figure 1. Classification test results
The best result was obtained with the boolean weighting using the support vector machine algorithm; the details of this classification are given in Table 2.
By using the training configuration that offered the best result, it is possible to classify new documents obtained from the Web, following the same preprocessing and VSM representation steps; once the class to which a document belongs is obtained, the link of the page can be saved as an instance of the ontology.
Table 2. VSM results with boolean weighting
Well classified elements     1401   86.27%
Wrongly classified elements   223   13.73%
Root mean squared error     0.1829
5. Conclusions
From the results obtained, it was decided to use the boolean weighting along with the support vector machine method. The use of the boolean weighting is justified because the mere appearance of a word in a document can be considered a good indicator of its discrimination value, especially when considering words that are not common in ordinary speech but are relevant to the domain being worked on, as demonstrated by the fact that a great part of the words with the greatest information gain belonged to the NLP domain, on which this work was carried out.
The words obtained with the greatest IG, and the accuracy of 86% obtained in the best case by the selected classification method, are a good incentive to consider the use of automatic classification to populate the ontology.
With respect to future work, once the ontology is populated, the next step is the process of guiding users through the ontology so that they can find links related to queries made on the NLP domain. Among the expected benefits at the conclusion of this work are letting the user be guided through the ontology classification and its different relations, making use of the bounded natural language technique, in order to verify the hypothesis that if the elements are ordered it is simpler to find the desired ones, and to save work for users by showing them only documents thematically related to the concepts in their queries.
6. References
[1] S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Computer Science Department, Stanford University, Stanford, available online: http://www.db.stanford.edu/~backrub/google.html, visited December 2005.
[2] M. Miller, 501 Web Site Secrets: Unleash the Power of Google, Amazon, eBay and More, Wiley Publishing, Inc., USA, 2004.
[3] "Google bomb", Wikipedia, the free encyclopedia, available online: http://en.wikipedia.org/wiki/Google_bomb, visited March 2006.
[4] ODP - Open Directory Project, http://dmoz.org/, visited April 2006.
[5] T. Gruber, "A Translation Approach to Portable Ontology Specifications", Knowledge Acquisition, 1993.
[6] M. Uschold and M. King, "Towards a Methodology for Building Ontologies", Workshop on Basic Ontological Issues in Knowledge Sharing, 1995.
[7] DOCUM, a multilingual stopword file for CDS-ISIS, http://library.wur.nl/isis/docum.html, visited November 2005.
[8] ftp://ftp.cs.cornell.edu/pub/smart/, visited November 2005.
[9] J. Savoy, CLEF and multilingual information retrieval, Institut interfacultaire d'informatique, University of Neuchâtel, Switzerland, 2005, http://www.unine.ch/info/clef/, visited November 2005.
[10] The English (Porter2) stemming algorithm, http://snowball.tartarus.org/algorithms/english/stemmer.html, visited December 2005.
[11] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, USA, 1983.
Neuro-Symbolic Hybrid Systems
Vianey Guadalupe Cruz Sánchez, Gerardo Reyes Salgado, Osslan Osiris Vergara Villegas
Centro Nacional de Investigación y Desarrollo tecnológico (cenidet)
Interior Internado Palmira S/N, Col. Palmira. C.P. 62490.
Cuernavaca Morelos México.
{vianey,osslan}@cenidet.edu.mx
Abstract
Currently, the Hybrid Systems (HS) approach is widely used to solve problems where different knowledge representations are involved in one system. This integration has the advantage of compensating the weaknesses of one system or another by complementing their strengths. Neuro-Symbolic Hybrid Systems (NSHS) arise from HS as an approach that offers the possibility of implementing robust systems where both connectionist and symbolic representations are present in the environment. The reason for studying NSHS is to implement them in the artificial vision system process, so that we can propose different strategies of solution among the different representation types involved in this process; for this, the hybrid system development cycle and the NSHS classification criteria play a very important role in the definition of these strategies.
1. Introduction
In the last decade it was very common to work with only one type of knowledge representation. There was even competition to demonstrate that the representation used was better than another that solved the same problem. However, as time passed, researchers observed the weaknesses of each representation and the complementary properties that existed among them, and the scientific community decided to prove their capacities by integrating them into one (hybrid) system; with this integration, better results were obtained than those obtained using each representation individually.
The hybrid approach is inspired by a natural mechanism: according to [1], we are hybrid information processing machines, and our actions are governed by the combination of genetic information and information acquired through learning. Thanks to this combination we are able to use different processing methods successfully in complex and changing environments.
Under this natural scheme, hybrid systems have arisen as a new way to solve complex problems where several knowledge representations are necessary to use the information coming from the environment; this environment determines the strategies that should be used to increase the knowledge and to develop more robust systems [2].
However, the design and development of these systems is difficult due to the large number of pieces or components involved and the different interactions among them [3].
The tendency is the study and construction of hybrid systems whose strategy of solution is the best one to solve the problem. In this paper we present a study of the process involved in the design and development of a hybrid system, as well as the criteria for the classification of one particular type of hybrid system (HS), the Neuro-Symbolic Hybrid System (NSHS), which has been used in applications such as object recognition.
2. Design and development of a hybrid
system
A hybrid system development cycle is proposed in [1], which presents the process for the construction of this kind of system. The importance of this model lies in the bases on which the design of an NSHS can be supported.
2.1 Hybrid System development cycle
A structured approach like the one in [1] can reduce the development time and cost of a HS. The stages for the construction of intelligent hybrid systems are: problem analysis, matching, selection of the hybrid category, implementation, validation and maintenance (see figure 1).
Figure 1. Hybrid system development cycle.
a) Problem analysis.
This stage involves the following steps:
1. Identify any sub-tasks existing in the problem.
2. Identify the properties of the problem. If the problem has sub-tasks, this involves identifying their properties as well.
b) Property matching.
This involves matching the properties of the available techniques with the requirements of the identified tasks.
c) Hybrid category selection.
In this phase the type of hybrid system required to solve the problem is selected. This phase uses the results of the previous stages of problem analysis and property matching.
d) Implementation.
In this stage the developer is in a position to select the programming tool and the environment necessary to implement the hybrid system.
e) Validation.
This phase is used to test and verify the performance of the individual components of the application and of the whole hybrid system.
f) Maintenance.
The performance of the hybrid system should be periodically assessed and refined as necessary. Maintenance is very important for adaptive systems (e.g., neural networks).
3. Neuro Symbolic Hybrid Systems
NSHS are systems formed by two or more types of knowledge representation, one connectionist and the other symbolic. Both representations have their own set of qualities, and integrated into one system they are extremely powerful for solving complex problems.
Artificial neural networks are a type of connectionist knowledge representation inspired by the functionality of the biological neuron. This type of representation has been used for its learning capability and generalization of knowledge, making it a very powerful tool to solve complex pattern recognition problems.
On the other hand, a symbolic representation in the form of logical rules is based on the capability that humans have to express knowledge in a natural way. This is very useful for inserting the knowledge of a human expert into a system, as well as for explaining the problem. This type of representation has been widely used in areas such as pattern recognition, natural language processing, robotics, computer vision and expert systems.
Both types of knowledge representation are combined in one system to suppress the disadvantages of each individual representation and to take advantage of their integration.
For the integration of an NSHS and its future application in the design of an artificial vision process, it is very important to know the criteria used for the classification of NSHS, because from this classification a broad view of the different behaviours that an NSHS may have is obtained.
4. NSHS Classification
Many criteria are considered in order to classify NSHS [4]. Next, we explain each one briefly.
Table 1. Criteria to classify NSHS.
a) Integration type
Neuro-symbolic integration can be classified into mainly three groups, according to the "hybridation type" of the approach used.
• Unified approach. It attempts to integrate the properties of symbolic systems into connectionist systems and vice versa.
• Semi-hybrid approach. This approach is used to achieve translations, for example, the compilation of a rule base into a network (knowledge insertion) and the explicitation of rules starting from a network (knowledge extraction).
• Hybrid approach. In this type of approach many symbolic and connectionist modules may exist, integrated with each other.
b) Coupling grade
This defines the strength of the interaction between two modules. The classification of the different grades is carried out through a progressive scale that goes from one extreme to the other. This classification consists of three levels:
• Weak coupling. In this type of architecture, the different modules are connected by a simple input/output relation, and the communications are unidirectional.
• Medium coupling. In this category, the interactions among modules are more flexible, because they are bidirectional; it is not simply an input/output relationship, but rather each module can influence the operation of the other.
• Strong coupling. In these systems the knowledge and data are not only transferred, but also shared among modules through common internal structures.
c) Integration mode
This represents the way in which the neural module and the symbolic module are configured in relation to each other and to the full system.
• Chain. Two modules operate in sequence: one is the main processor and it is assisted by the other module, which acts as a pre- or post-processor. The relationship between the modules is input/output.
• Sub-treatment. In this integration mode, one module is subordinate to another to achieve some function. The main module decides at what moment to call it and how to use its output.
• Meta-treatment. One module solves the problem and the other plays a meta-level role, such as carrying out the control or improving the results.
• Co-treatment. Both modules are equal partners in the problem-solving process; for example, one module solves one specific part of the problem and the other module solves the rest of the problem.
d) Knowledge transference
The knowledge transference may be classified according to the direction of the interchange.
• From symbolic to connectionist. The symbolic knowledge is transferred from a symbolic module and is integrated into a connectionist module (S→C).
• From connectionist to symbolic. The knowledge acquired by learning in a connectionist network may be explained in the form of symbolic rules (C→S).
• Bilateral transfer. The knowledge can be transferred in both senses, symbolic and connectionist (S↔C). This usually includes compilation mechanisms and rule extraction starting from the networks.
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de MéxicoJuan Carlos Olivares Rojas
 
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteAnalítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteJuan Carlos Olivares Rojas
 
Propuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoPropuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoJuan Carlos Olivares Rojas
 
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainCyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainJuan Carlos Olivares Rojas
 
A Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityA Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityJuan Carlos Olivares Rojas
 
Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Juan Carlos Olivares Rojas
 
A Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionA Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionJuan Carlos Olivares Rojas
 
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Juan Carlos Olivares Rojas
 
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Juan Carlos Olivares Rojas
 
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Juan Carlos Olivares Rojas
 
Internet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesInternet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesJuan Carlos Olivares Rojas
 
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Juan Carlos Olivares Rojas
 
Ciber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesCiber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesJuan Carlos Olivares Rojas
 
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Juan Carlos Olivares Rojas
 

Mais de Juan Carlos Olivares Rojas (20)

Ieee itmsb20
Ieee itmsb20Ieee itmsb20
Ieee itmsb20
 
Ropec20neural stick
Ropec20neural stickRopec20neural stick
Ropec20neural stick
 
Analítica de Datos usando Single Board Computers
Analítica de Datos usando Single Board ComputersAnalítica de Datos usando Single Board Computers
Analítica de Datos usando Single Board Computers
 
Analitica de Datos en Dispositivos de Internet de las Cosas
Analitica de Datos en Dispositivos de Internet de las CosasAnalitica de Datos en Dispositivos de Internet de las Cosas
Analitica de Datos en Dispositivos de Internet de las Cosas
 
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
A Comparative Assessment of Cryptography Algorithms for Data Analytic Applica...
 
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de MéxicoPropuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
Propuesta de Flexibilidad Curricular en el Tecnológico Nacional de México
 
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición InteligenteAnalítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
Analítica de Datos en Simulador de Redes para Sistemas de Medición Inteligente
 
Propuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en MéxicoPropuesta de Mercado Eléctrico Minorista Transactivo en México
Propuesta de Mercado Eléctrico Minorista Transactivo en México
 
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign BlockchainCyber Security on Transactions in Smart Metering Systems usign Blockchain
Cyber Security on Transactions in Smart Metering Systems usign Blockchain
 
A Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobilityA Survey on Smart Metering Systems using Blockchain for E-mobility
A Survey on Smart Metering Systems using Blockchain for E-mobility
 
Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes Detección de Movimiento usando Medidores Inteligentes
Detección de Movimiento usando Medidores Inteligentes
 
A Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer InteractionA Survey on Smart Metering Systems using Human-Computer Interaction
A Survey on Smart Metering Systems using Human-Computer Interaction
 
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
Machine Learnign Model for the Detection of Electricity Energy Fraud Using an...
 
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
Forecasting Electricity Consumption Using Weather Data in Edge-Fog-Cloud Data...
 
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
Aplicacion de Tecnicas de UX en el Desarrollo de un Portal de un Sistema de M...
 
Internet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas InteligentesInternet de las Cosas en Redes Eléctricas Inteligentes
Internet de las Cosas en Redes Eléctricas Inteligentes
 
Estrategias didacticas
Estrategias didacticasEstrategias didacticas
Estrategias didacticas
 
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
Optimización de la Eficiencia Energética en los Hogares utilizando una Arquit...
 
Ciber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas InteligentesCiber Seguridad en Redes Eléctricas Inteligentes
Ciber Seguridad en Redes Eléctricas Inteligentes
 
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
Estudio de Vulnerabilidad de Protocolos y Redes de Comunicación para Medidore...
 

Lookingforwardenglish

  • 4. The Center, naturally, participates in this valuable synergy. CENIDET offers its postgraduate programs to graduates interested in preparing for applied research and technological development. Welcome to CENIDET, Computer Sciences Department.
  • 5. Summary: Editorial, 1; Welcome to CENIDET, 2; Web Page Classification: a Semantic Analysis, 7; Digital Image Processing in Wavelet Domain, 13; GAP: A Tool to Solve the Problem of the Web Contents Visualization in Pocket PC Devices, 17; Evaluation of tools for business process in three levels of abstraction, 21; Web Page Retrieval Using an Ontology that is Populated by Automatic Text Classification, 27; Image Segmentation Script Language, 31.
  • 6. Summary (continued): Neuro-Symbolic Hybrid Systems, 35; Methodology for the generation of 3D models, 41; Vanishing points detection using Thales's theorem, 45; An Ontology-based Translator for Communicating Negotiation Agents over Internet, 49; Segmentation by color to characterize human gait, 55; Description of two Statistical Models Applied To The Extraction Of Facial Features: Integral Projections and Active Shape Model, 59; Authors, 63.
  • 9. 7 Web Page Classification: a Semantic Analysis Rocío Vargas Arroyo, Azucena Montes Rendón Centro Nacional de Investigación y Desarrollo Tecnológico {rvargas04c,amr}@cenidet.edu.mx Abstract In this paper, a semantic analysis for Web page classification is presented. A set of Web pages, resulting from a simple query to a Web browser, is categorized by disambiguating the meaning of the term used for the search. The disambiguation process begins with the isolation of some outstanding paragraphs; linguistic markers are used to accomplish this task. The search term is located within the paragraphs and the Contextual Exploration Method is used to identify words that lead to the discovery of relationships within an ontology. Finally, the discovered relationships are used to assign the web page to a category. 1. Introduction Natural Language Processing is a challenging task in Artificial Intelligence, because dealing with language is not simple. With the immense growth of the Web, the problem of retrieving the proper and desired information has grown too. Several research groups have obtained significant results, but not good enough to meet the general need. These results are sensitive to the ambiguity problem caused mainly by the techniques used: most of the developed projects [1-9] are based on statistical methods and ignore linguistic techniques. The main intention of this work is the creation of a classification tool able to classify a set of HTML web pages written in Spanish. Linguistic markers, an ontology and the Contextual Exploration Method [10] are integrated to accomplish the task. 2. Linguistic markers In order to emphasize certain ideas contained in a text, the author uses discourse markers. These markers are linguistic units that set the order of the discourse. Martín Zorraquino and Portolés [11] define them as: “Unidades lingüísticas invariables que no ejercen una función sintáctica en el marco de la predicación oracional – son pues, elementos marginales – y poseen un cometido coincidente en el discurso: el de guiar, de acuerdo con sus propiedades morfosintácticas, semánticas y pragmáticas, las inferencias que se realizan en la comunicación” [11]. From Martín Zorraquino and Portolés's [11] extensive analysis of these markers, Prada [12] extracted five categories (see Table 1). Table 1. Discourse markers. Marcadores estructuradores de la información: comentadores, ordenadores, digresores. Marcadores conectivos: aditivos, consecutivos, contraargumentativos. Reformuladores: explicativos, de rectificación, de distanciamiento, recapitulativos. Operadores argumentativos: de refuerzo argumentativo, de concreción. Marcadores conversacionales: de modalidad epistémica, de modalidad deóntica, enfocadores de la alteridad, metadiscursivos. Each type of marker has a special use in Spanish discourse, but for this project the attention is focused on recapitulative markers because they introduce a recapitulation or conclusion of what was said, they concentrate a general view of the text's intention, and they let the reader reach the final thought of the author.
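To make the paragraph-selection step concrete, here is a minimal Python sketch (not the authors' implementation) that keeps only the paragraphs containing one of the recapitulative markers the article cites; splitting the page body on blank lines and the exact marker list are assumptions made for illustration.

```python
import re

# Recapitulative markers taken from the examples given in the article.
# The list is illustrative, not exhaustive.
RECAPITULATIVE_MARKERS = [
    "en suma", "en conclusión", "en definitiva", "en fin",
    "al fin y al cabo", "resumiendo", "finalmente",
]

def select_paragraphs(text: str) -> list[str]:
    """Return the paragraphs that contain at least one recapitulative marker.

    Splitting on blank lines is an assumption about how the page body is segmented.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs
            if any(marker in p.lower() for marker in RECAPITULATIVE_MARKERS)]

if __name__ == "__main__":
    sample = ("Primer párrafo sin marcador.\n\n"
              "En conclusión, este párrafo resume la intención del autor.")
    print(select_paragraphs(sample))
```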
  • 10. 8 Examples of these recapitulative markers are: En suma, en conclusión, en definitiva, en fin, al fin y al cabo, resumiendo, finalmente among others. 3. Contextual Exploration Method The Contextual Exploration Method (CEM) [10] was developed by LaLIC team of the Paris-Sorbonne University and directed by Jean Pierre Desclés. This method provides a frame for the identification of semantic information in a text and a set of mechanisms that help in the resolution of semantic indeterminations or ambiguity cases. It also assumes that the texts include linguistic units that help in the task of sense construction. Is required, by the method, the description of indices and indicators. Indicators are linguistic units associated to a set of contextual exploration rules. When an indicator is located within the text a rule is triggered; the context of the indicator is explored in order to identify indices or words that lead to the real meaning of the indicators. Indicators, indices and rules integrate the linguistic knowledge base. 4. Ontology interaction For this project, a Spanish version of EuroWordNet [13] is used. EuroWordNet is a lexical-semantical ontology of many languages such as Spanish, Dutch, Italian, Frech, German and Czech. It follows the WordNet model but integrates some improvements as better expressivity of the knowledge base; adds new relationship definitions and transcategorial relationships. The version used in this project was transformed into a database and accessed through SQL queries. The result of the queries leads to the construction of context exploration rules. Figure 1. Graphical result of a query to the Ontology. Search term capa. 5. Semantic analyzer The process begins with a query to a well known and widely used web browser. Simple search terms are used (simple terms are more sensitive to ambiguity problem) and a set of hyperlinks is retrieved. This set is filtered to get only HTML pages written in Spanish. Once the hyperlinks are extracted, each page is loaded and its content is analyzed to get the most important paragraphs; this paragraph selection is made by locating linguistic markers within the text. The search term is located within the isolated paragraphs the left and right context are evaluated in order to get surrounding indices that lead to the discovery of relationships within an ontology. The extracted relationships allow the association of the HTML web page to a category. The full process is graphically represented in figure 2. Figure 2. HTML web page classification scheme. 5.1. Web browser query and hyperlink extraction The classification process begins with a query to a web browser. The search term is a simple term, a single word. The web browser returns a set of hyperlinks linked to web pages where the search term is contained. Hyperlinks are extracted and its associated page is analyzed later. Several query strings where analyzed in order to simplify the query to the web browser. Language and file format is specified. Example:
  • 11. 9 http://www.google.com.mx/search?num=<number of resulting links>&as_epq=<search term>&lr= <language> &as_filetype=html Two sets of hyperlinks are extracted, the first set corresponds to the direct link to the web page, the second, corresponds to the cache version of the page. See figure 3. Figure 3. Simple interface for hyperlink extraction. 5.2. Localization of linguistic markers and paragraph selection Each web page is loaded and its body content and some other representative information as metadata are extracted. See figure 4. Figure 4. Simple interface for web page content extraction. Some linguistic markers, mostly recapitulative markers, are located within the body of the page. Paragraphs where these markers are present are selected for a more extensive analysis. Consider the next text segment and the search term “capa”: “…El segundo sistema permite una elección mas amplia del copulador y del revelador, pero implica mayores dificultades en el tratamiento. El revelador que tiene el copulador debe actuar en la capa correcta, y esto se logra controlando la velocidad de difusión o aplicando exposiciones controladas. El método de la difusión controlada es anticuado. El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul…” [14] From this text segment, a representative paragraph is extracted: “El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul”. [14] 5.3. Search term localization and context exploration Once the paragraphs are selected, the search term is located within each paragraph an its left and right context are explored looking for key terms that lead to the discovery of a relationship within an ontology. The Contextual Exploration Method [10] is applied and indicators, indices and rules are defined. The indicators sets are composed by the search term and its synonyms, the set of indices is populated by a query to an Ontology and the rules are dynamically created. Example: Indicators = {capa, mano, baño} Indices = {pintura, emulsión, impermeabilizante, esmalte, barniz, …} Rule: If In the context of the I1 set appears any indice belonging to set I2 Then Assing the “Cubrimiento aplicado a una superficie” meaning to the search term in the selected statement End if “El método de exposición selectiva utiliza una película de varias capas que tiene la siguiente construcción: sobre la base se aplica la emulsión sensible al rojo y encima una emulsión sensible al verde; sobre esta, una capa de gelatina que contiene un colorante filtro amarillo, y finalmente sobre la capa filtro se aplica una capa de emulsión sensible al azul” .
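The contextual exploration rule above can be read as a small matching procedure. The following hedged Python sketch reproduces it for the running example; the context window size and the hard-coded index set (standing in for a query to the EuroWordNet-derived database) are assumptions, not part of the original system.

```python
# Sketch of the contextual exploration rule shown above. The window size and
# the hard-coded index set (a stand-in for the SQL query to the ontology
# database) are assumptions.
INDICATORS = {"capa", "mano", "baño"}
INDICES = {"pintura", "emulsión", "impermeabilizante", "esmalte", "barniz"}
MEANING = "Cubrimiento aplicado a una superficie"

def disambiguate(paragraph, window=8):
    """Return MEANING if an index term appears near an indicator, else None."""
    tokens = paragraph.lower().split()
    for i, token in enumerate(tokens):
        if token.strip(".,;:()") in INDICATORS:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            context = {t.strip(".,;:()") for t in left + right}
            if context & INDICES:
                return MEANING
    return None

print(disambiguate("sobre la base se aplica una capa de emulsión sensible al rojo"))
```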
  • 12. 10 5.4. Relationships extraction and final categorization For the final categorization, a set of categories must be defined. A first approach for solving this task is to extract a group or relationships, from the ontology, where the search term and the given sense are involved. Example: Figure 5. Homonym relationships of the search term capa. All nodes presented in figure 5 are homonyms of the term capa, each node has a different meaning. Nodes 0 means “Acción y efecto de cubrir la superficie de un cuerpo con un material” and node 3 means “Cubrimiento aplicado a una superficie”. These relationships where discovered in the Ontology, but these are not sufficient for creating a category, so second level relationships must be discovered. Figure 6. Second level relationships of the search term capa. In figure 6, the hyponym relationships (for nodes 0 and 3) of the term capa are shown. Here is where indice terms appear and let us create the categories for the classification. Therefore, a first set of categories names might be formed by first level relationships and its meaning. Here is where a big problem arises, the ontology data is not complete, some of its meanings are missing. 6. Conclusions The easiest way to get information from the web is trough web browsers and directories; however, the obtained results not always are sufficient enough because the search techniques do not care about the semantic content of the pages. So is necessary the implementation of a tool able to make a proper classification of the web pages according the real meaning of the query. With a tool like this, the search process made by the user will be improved cause documents out of the interest might be discriminated and as a consequence, the number of pages to visit and the time inverted in exploring not relevant ones will be diminished. Moreover, the introduction of linguistic techniques to classification methods might improve the way web spiders indexes the information. 7. References [1] A. Gulli and P. Ferragina, “The anatomy of a hierarchical clustering engine for web-page, news and book snippets”, Fourth IEEE International Conference on Data Mining, ICDM’04, Brighton, UK, 2004. [2] A. Gulli, “SnakeT”, http:// www.snaket.com, Pisa University, Italy. [3] Vivísimo, Inc., “How the Vivísimo clustering engine works, http://www.vivisimo.com , 2003. [4] Vivísimo, Inc.,”Clusty”, http://www.clusty.com , 2005. [5] A. Téllez-Valero, M. Montes-y-Gómez and L. Villaseñor- Pineda, “Aplicando la clasificación de texto en la extracción de información”, Encuentro Internacional de Ciencias de la Computación, ENC, Colima, México, September 2004. [6] J. M. Gómez, E. Puertas, G. Carrero, M. de Buenaga, “Categorización de texto sensible al coste para filtrado en Internet”, Procesamiento del Lenguaje Natural, SEPLN, Magazine nº 31, September 2003. [7] L. Golub, Automated subject classification of textual Web pages, for browsing, Thesis for the degree of Licenciate in Philosophy, Lund University, Switzerland, August 2005. [8] F. Sebastiani, “Machine learning in automated text categorization”, ACM computing surveys, 2002. [9] F. Sebastiani, “Text categorization”, in Zanasi A., Text Mining and its Applications, WIT Press, Southampton, UK, 2005.
  • 13. 11 [10] J. P. Desclés, E. Cartier, A. Jackiewicz and J. L. Minel, “Textual Processing and Contextual Exploration Method”, Context 97, Rio de Janeiro, February 1997. [11] Martín Zorraquino, Mª Antonia and José Portolés Lázaro, “Los marcadores del discurso”, in Bosque and Demonte, Vol. 3, pp. 4051-4213. [12] J. Prada and G. Moncecchi, “Reconocimiento eficiente de marcadores del discurso en español”, VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, Cuba, January 2003. [13] Amsterdam University, “EuroWordNet: Building a multilingual database with wordnets for several European languages”, March 2006, http://www.illc.uva.nl/EuroWordNet/ [14] “Fotografía / Fotografía en colores”, March 2006, http://www.punksunidos.com.ar/punksunidas/fotografia/fotografia6.html
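Before moving on, a brief sketch of the final categorization step (Section 5.4): each candidate sense of the search term is associated with the index terms reached through its second-level (hyponym) relationships, and the sense whose indices appear most often in the selected paragraphs names the category. The relation data below is a hypothetical stand-in for the ontology queries; only the idea is taken from the article.

```python
# Hypothetical sense-to-indices map standing in for the first- and
# second-level relationships discovered in the ontology.
SENSE_RELATIONS = {
    "Cubrimiento aplicado a una superficie": {"pintura", "emulsión", "esmalte", "barniz"},
    "Prenda de vestir": {"tela", "abrigo", "vestido"},
}

def categorize(paragraphs: list[str]) -> str:
    """Return the sense (category name) whose index terms appear most often."""
    scores = {sense: 0 for sense in SENSE_RELATIONS}
    for p in paragraphs:
        words = set(p.lower().split())
        for sense, indices in SENSE_RELATIONS.items():
            scores[sense] += len(words & indices)
    return max(scores, key=scores.get)

print(categorize(["sobre la capa filtro se aplica una capa de emulsión sensible al azul"]))
```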
  • 15. 13 Digital Image Processing in Wavelet Domain Osslan Osiris Vergara Villegas and Raúl Pinto Elías Centro Nacional de Investigación y DesarrolloTecnológico (cenidet) Interior Internado Palmira S/N, Col. Palmira. C.P. 62490. Cuernavaca Morelos México. {osslan, rpinto}@cenidet.edu.mx Abstract In this paper we present some explanation about digital image processing in the wavelet domain. First an image is transformed using a Discrete Wavelet Transform (DWT), then several mathematical operations are applied in order to observe some features presented in the image. The transformation can reveal some features that are not clear or difficult to detect in the original domain. We obtain wavelet directionality and edge detection, image smoothing, scale changing, image denoising and compression, and finally, feature extraction in wavelet domain. 1. Introduction Wavelet transform was used in geophysics in early 1980s for the analysis of seismic signals [1]. A wavelet transform decomposes a signal f into its components on different scales or frequency bands. This is made by a convolution process on f using the translated and dilated wavelet ψ (wavelet mother). Based on the selection of ψ, the transformed function allows, for example, the extraction of the discontinuities or edges in f, performing a pattern recognition task or storing a compressed version of f. Wavelets are signals which are local in time and generally have an irregular shape. A signal can be decomposed into many shifted and scaled representations of the original mother wavelet [2]. Wavelets have the advantage of being able to separate the fine details in a signal, very small wavelets can be used to isolate very fine details in a signal, while very large wavelets can identify coarse details. In addition, there are many different wavelets to choose from. One particular wavelet may generate a more sparse representation of a signal than another, so different kinds of wavelets must be examined to see which is most suited for the application you need in digital image processing for example image compression or denoising, pattern recognition, etc. 2. Multiresolution Analysis Multiresoltion analysis is concerned with the study of signals or processes represented at different resolutions and developing an efficient mechanism to change from to one resolution to another [3]. The discrete Wavelet Transform (DWT) is a mathematical tool for the analysis and synthesis of signals that can be used when digital images need to be viewed or processed at multiple resolutions. The localization of signal characteristics in spatial (or time) and frequency domains can be accomplished very efficiently using wavelets. This allows us to simultaneously determine sharp transitions in the spectrum of the signal and in the position (or time) of their occurrence. The principle behind the use of wavelets for decomposing an image is to use a wavelet function Ψ to represent the higher frequencies corresponding to the detailed parts of the image, and a scaling function Φ to represent the lower frequencies corresponding to smooth parts of the image. Figure 1 shows the process to decompose an image using filters to obtain the subband decomposition. Figure 1. Subband decomposition of an image.
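As an illustration of the subband decomposition in Figure 1, here is a minimal sketch using the PyWavelets library (an assumption; the article does not name a particular implementation). One level of the Symlet 4 transform yields the approximation subband and the horizontal, vertical and diagonal detail subbands discussed in the following sections.

```python
# Minimal sketch of a one-level subband decomposition, assuming PyWavelets
# (pywt) and NumPy are available; the article itself does not prescribe a library.
import numpy as np
import pywt

image = np.random.rand(256, 256)          # stand-in for a grayscale image
cA, (cH, cV, cD) = pywt.dwt2(image, "sym4")

# cA: approximation (upper-left quadrant), cH: horizontal details,
# cV: vertical details, cD: diagonal details.
print(cA.shape, cH.shape, cV.shape, cD.shape)
```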
  • 16. 14 Multiresolution analysis plays an important role in the perception and decision mechanism of human beings. 3. Wavelet Directionality and Edge Detection With the wavelet transform we can obtain some information (details) about the directionality and the edges presented in an image. In this section we use the image shown in figure 2 to explain how to obtain directionality and edge detection. Figure 2. Image “Star” for edge and directionality analysis. The first thing to do is to transform the original image using some DWT, for this example we decompose the image only one level using the symlet 4. Figure 3 shows the resulting image from wavelet transform. Figure 3. “Star” after wavelet decomposition. From figure 3 we can see that the horizontal edge of the original image is presented in the horizontal detail coefficient matrix of the upper-right quadrant. Similarly the vertical edge is shown in the vertical detail coefficients of the lower-left quadrant. Finally you can see that the diagonal borders are shown in the diagonal detail coefficients of the lower-right quadrant. From the wavelet subband image we can combine the edge information into a single image. The only thing that we need to do is changing to zero all the approximation coefficients (upper-left quadrant), and then compute de Inverse Discrete Wavelet Transform (IDWT). The resulting edge image is shown in figure 4. Figure 4. “Star” edge resulting image. We can use a similar procedure to isolate the vertical or horizontal edges. It is important to remark that the information of diagonal edges is always preserved although we cut the diagonal coefficients. 4. Image Smoothing or Blurring Wavelets can be used as a tool for blurring or smoothing images. For this example we use the image shown in figure 5. Figure 5. “Tools” image. In order to smoothing, we need to compute the wavelet transform in more than one decomposition level; for example we use the Symlet 4 with four decomposition levels as shown in figure 6. Figure 6. Four decomposition level of “Tools”. Converting to zero one detail level allows us to obtain increasingly smoothed versions of the original image.
  • 17. 15 For example if you zeroed the first three details level, we can obtain an image as the one shown in figure 7a). If we change to zero all levels (four) we can obtain an increase of blurring in the image as shown in figure 7b. Figure 7. “Tools” images. a) Result of zeroing first three levels, b) result of zeroing all levels. 5. Change the Scale Changing the scale of an image implies a mathematical operation called interpolation. With the DWT we can downscale an image with a factor of 2n or upscale an image with a factor of 2n . This process is made by removing or adding subbands in the wavelet domain. This scale changing provides an application to progressive reconstruction of the image. Lets suppose that we need to send an image to two users in different sites and those users need different resolution images. The image can be reconstructed with gradually higher resolution approximations of the final high resolution image, and we can send the adequate image for each user at an exact subband reconstruction. Figure 8 shows an upscaling example of the camman image. Figure 8. Upscaled Camman image. a) Original image, b) Incise a doubled image, c) Incise b doubled image. Performing of the upscaling or downscaling process is better than the same process made by interpolation. 6. Image Denoising Image denoising is the process of separating the noise out of the image components from a single observation of a degraded image. The image can be corrupted with noise because of either data acquisition process or naturally occurring phenomena. The simplest technique for denoising is wavelet thresholding (shrinkage). We use as input a noise signal like the image that we shown in figure 9a. We perform the wavelet transform using for example four multiresolution levels, and then we use a denoising method called soft thresholding through all subbands. The technique sets coefficients with values under the threshold (T) to 0, then substracts T from the non- zero coefficients. After soft thresholding, we compute the inverse wavelet transform. Figure 9b shows the image obtained from the denoising process. Figure 9. “Goldhill” image. a) Gaussian noise image, b) denoised image The main problem in denoising is the selection of the best value of T for thresholding. 7. Image Compression One of the main popular applications of wavelets is image compression. Data compression goal is to reduce the volume of necessary data to represent a certain information amount. One of the advantages obtained with the use of DWT is that the image is decorrelated, thus the image can be easily compressed. Some of the wavelet coefficients obtained from DWT correspond to details in the data set. If there are few details, they might be omitted without substantially affecting the main features of the data set. The first step is to transform the original image into the wavelet domain using the DWT, one of the important decisions is what family of wavelet to use and what multiresolution level to apply. The selection of a wavelet family depends a lot on the subsequent use of the image, but it is necessary to take into account some wavelet properties as orthogonality, number of
  • 18. 16 vanishing moments, compact support, symmetry, etc. On the other hand, the multiresolution level cannot be larger than log2(N). The quantization stage comes after the DWT; here, we can use two different strategies. One is to set to zero all high-frequency subband coefficients that are under a particular threshold. The other is to set to zero, for example, the coefficients behind the matrix diagonal or of some decomposition level. There are two widely used techniques for quantization: the Embedded Zerotree Wavelet coder (EZW) and the Set Partitioning in Hierarchical Trees (SPIHT), which are very efficient for several applications. The final stage corresponds to the entropy coder, which is a lossless stage. Figure 10 shows the Lena image and, from left to right, different images obtained from the compression process at different quality and storage space. Figure 10. “Lena” image at different quality and storage space. 8. Image Feature Extraction Image classification is perhaps the most important application when using digital images. In order to perform it, a feature vector is used to describe an image. The statistical properties of the wavelet coefficients characterize an image and can lead us to a better image classification. Some measures obtained from the wavelet coefficients $C_k$ are: Norm-2 energy: $E_1 = \frac{1}{N^2}\sum_{k=1}^{N} C_k^2$ (1); Norm-1 energy: $E_2 = \frac{1}{N^2}\sum_{k=1}^{N} |C_k|$ (2); Standard deviation: $E_3 = \frac{1}{N^2}\sum_{k=1}^{N} (C_k - \mu)^2$ (3); Average residual: $E_4 = \sum_{k=1}^{N} (C_k - \mu)^2$ (4); Entropy: $E_5 = -\frac{1}{N^2}\sum_{k=1}^{N} C_k^2 \log C_k^2$ (5); where $\mu = \frac{1}{N^2}\sum_{k=1}^{N} C_k$ (6), µ is the mean and N is the size of the image. 9. Conclusions In this paper we show some applications of the wavelet transform for digital image processing, with the goal of demonstrating that an image can be manipulated even in the wavelet domain. Research in wavelets keeps looking for more complex families that fit a particular application, for example, trying to describe an important feature of the image known as image geometry. 10. References [1] Morlet, J., G. Arens, E. Fourgeau, and D. Giard, “Wave propagation and sampling theory part 1: Complex signal and scattering in multilayered media”, Geophysics, Vol. 47, No. 2, pp. 203-221, February 1982. [2] Maryhelen S., “Image compression using wavelets”, Thesis proposal, Department of Electrical and Computer Engineering, University of New Brunswick, Canada, 1997. [3] Mallat, S., “A theory for multiresolution signal decomposition: The wavelet representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 11, No. 7, pp. 674-693, July 1989. [4] Gonzalez, Rafael C., Woods, Richard E. and Eddins, Steven L., Digital Image Processing Using MATLAB, Pearson Prentice Hall, 2004.
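To close the preceding article, here is a small NumPy sketch of the feature measures of Section 8, following the formulas as reconstructed there; treating N² as the total number of coefficients of the flattened subband and adding a small epsilon inside the logarithm are implementation assumptions, not part of the original definitions.

```python
# NumPy sketch of the wavelet-coefficient measures E1-E5 above. C is the
# coefficient matrix of an N x N subband, so 1/N^2 == 1/C.size. The epsilon
# in the entropy term is an implementation detail to avoid log(0).
import numpy as np

def wavelet_features(C: np.ndarray) -> dict:
    c = C.ravel().astype(float)
    n2 = c.size
    mu = c.sum() / n2
    eps = 1e-12
    return {
        "norm2_energy": (c ** 2).sum() / n2,                        # E1
        "norm1_energy": np.abs(c).sum() / n2,                       # E2
        "std_like": ((c - mu) ** 2).sum() / n2,                     # E3
        "avg_residual": ((c - mu) ** 2).sum(),                      # E4
        "entropy": -((c ** 2) * np.log(c ** 2 + eps)).sum() / n2,   # E5
    }

print(wavelet_features(np.random.rand(8, 8)))
```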
  • 19. 17 GAP: A Tool to Solve the Problem of the Web Contents Visualization in Pocket PC Devices. J. Carlos Olivares R., J. Gabriel González S., Azucena Montes R., Víctor J. Sosa S. e I. Rafael Ponce M. Centro Nacional de Investigación y Desarrollo Tecnológico(cenidet) Cuernavaca, Morelos, México {jcolivares04c, gabriel, amr, vjsosa, rafaxzero4c}@cenidet.edu.mx Abstract This tool intends to fill the existing ‘GAP’ in the Web sites visualization in mobile devices, such as Pocket PC. In order to guarantee that the users can correctly visualize the Web resources, two things are needed: a mechanism for controlling disconnections, and allowing visualization of Web content despite of the device connection state (hoarding), and a mechanism that can adapt the Web content to the specific mobile device features (transcoding). GAP is a tool that integrates these two mechanisms and allows improving of the user’s navigation experience in the Mobile Web. Keywords: Pocket PC, Visualization, Web Resources, Hoarding, Transcoding. 1. Introduction Mobile devices are each time closer in time, according with [1]: "By 2009, more than a half of the microprocessors made in the world will be intended for mobile devices." "The software that will really make mobile devices useful isn’t developed yet." These statistics reflect that the use of mobile devices is increasing due to their tiny size and that its power of processing and versatility is growing day by day. The problem of Web resources visualization in mobile devices is the fact that the great majority of Web sites in Internet have not been designed for this type of devices. The mobile devices have limited resources like small screens, little memory, low processing speeds, etc; in comparison with traditional computers equipment. On other hand, the Web and the protocol that manages it: HTTP are connection oriented (they are based on TCP) what causes the transaction to fail if the user, by any reason, becomes disconnected from the network . In this case, it might not be possible to visualize the Web resources in the mobile client. Disconnections are frequent in this type of devices, mainly because of their main advantage: mobility. In this work a system which development is in progress is described. It focuses in attacking the problem of Web resources visualization on mobile devices. The main characteristic of this work is that great part of the system is executed in this kind of devices, in comparison to the great majority of the existing solutions that are executed in traditional platforms. 2. Alternatives of solution In order to solve this problem several alternatives are presented: to design a new protocol, to modify and existed protocol or to implement intermediary services that solve the problem. 2.1 New protocols In this scheme is possible to mention the WAP protocol and the WML language, they work in an analogous way as HTTP-HTML in the traditional Web. The problem strives in that WAP only works with mobile equipment and this would bring the same fragmentation that today has the Web (special pages for all class of devices). In addition, WAP was originally designed for devices with limited resources capacities (monochrome screens, lower bandwidth, etc) which is actually solving day by bay through bandwidth wireless connection (WCDMA, UTMS, 802.11g, WiMax, etc) and with more and more powerful equipment. The best solution would be to create a new protocol. 
The problem is that it must be totally compatible with the existing ones; otherwise, thousands of existing resources would become unusable (it would be necessary to modify Web servers as well as Web clients). 2.2 Modification of protocols Within this alternative there exists the case of a new request scheme for Web resources. This new
  • 20. 18 scheme receives the name of Push, whereas traditional scheme receives the name of Pull [2]. The Pull scheme receives the name of “over demand’. Under this scheme, the client (user) is who visualizes a resource in an explicit way. In our case, if a user wants to see the page of cenidet, must write in the Web browser the next URL: http://www.cenidet.edu.mx/. The Push scheme also receives the name of 'subscription-notification'. In this scheme, the user subscribes itself to a service and when some event of interest happens a notification is sent for alerting the user about the event. Generally these two schemes do not live on isolated way. Hybrid schemes (Pull&Push) have been applied in diverse existing services, so is the case of the reception of SMS/MMS messages, where the send of messages is Pull and the reception is Push, since it notifies to users about the existence of new messages. Another service that has made famous devices like the Blackberry to become successful is the Push-mail [3]. This service comes to solve the problem of email visualization in mobile environments. Under the traditional scheme of the electronic mail, for consulting the email, a user must be connected all the time to receive it. This originates great costs if the network connection generates costs per time. With this new scheme, the user is not connected to the mail server. When a new mail in the server is received, it notifies the client of the existence of the new mail and sends it to the mobile client. For this type of schemes, protocols like HTTPU (HTTP over UDP) or HTTPMU (HTTP over multicast UDP) have been proposed, and basically works similar to the HTTP but using datagrams, which are not in an oriented connection way. With these protocols are possible to offer a better quality in the mobile Web [4]. 2.3 Intermediary services This is the more extended solution to solve the problem of Web resources visualization and many other problems present on Web, like the case of firewalls that solve some of the Web security problems like the access control, or proxies’ caches that tries to reduce the access latency to the information. The scheme of intermediaries is widely used because it doesn’t need to modify neither the clients nor the servers; in fact, the client and server processes do not notice the existence of these intermediary services. These services are in charge of the hard work and are transparent to the users. The tool that is described in this article, works under the scheme of intermediary services. 3. Proposal of solution The hoarding process solves the problem of Web resources visualization without concerning the state of the connection of the mobile device. For this, it becomes necessary that the user has already stored, in local way, in his device the resources that he o she will use. As can be observed, the amount of resources to occupy can be immense, whereas the capacity of storage of the devices is limited. In order to give solution against this new problem is necessary to have an effective way to know the resources that a user could use. With hoarding is possible to reduce this, through algorithms of association rules applied on Web logs, is determined the optimal set of resources that will be replicated to the mobile clients [5]. A mechanism which tries to solve the adaptation problem of Web resources to the displaying capacities on mobile devices is transcoding. 
It consists of the transformation of resources, distilling and processing all those characteristics that are not available on the device. The transcoding mechanism used transforms HTML into a subset of HTML, using XML. The system is based on a client-server architecture with an intermediate tier on the server side as well as on the client side. The system is shown in Figure 1. Figure 1. General architecture proposed. The overall system has been named GASWT (Gestor de Acaparamiento de Sitios Web Transcodificados: Hoarding Manager of Transcoded Web Sites). The intermediary on the client side is called GAP (Gestor de Acaparamiento para Pocket PC: Hoarding Manager for Pocket PC), whereas the one on the server side is called GAT (Gestor de Acaparamiento y Transcodificación: Hoarding and Transcoding Manager). The GAT is composed of the MA (Mecanismo Acaparador: Hoarding Mechanism) and the MT (Mecanismo Transformador: Transcoding
  • 21. 19 Mechanism). The communication between the processes is made through a HTTP request-response scheme. As much the MA as TM are taken from other projects that together with this one, comprise the Moviware project [6], whose main function is to offer a set of services to mobile clients that have frequent disconnections. The general operation of the system is described in the next lines. The user introduces an URL from the Web browser (which has been previously configured to redirect his exit towards the GAP). The GAP receives the request and determines if it is in the local cache of the device, if found, the hoarded resource is sends to the Web browser. When the resource is not hoarded, the system validates the connection existence in order to obtain the resource on line. If for some reason the resource cannot be shown, (because it doesn’t exist or has detected an error in the connection) the system notifies the user by sending an error message. On the other hand, if the Web resource is not hoarded and a pattern of the site in the local device doesn’t exist, the MA sends the Web resources if a pattern for this site exists. If the pattern exists but the hoarded resources in the MA aren’t present, it obtains them by requesting them to MT and soon compresses the resources in zip format to optimize the process. Once the MA has sent the hoarded Web site, the mobile device must decompress the Web site and update its list of patterns. This process happens in transparent way, in a way that the user never notices. MT is responsible of collecting documents and if they are HTML, it transforms them if the configuration parameters indicate that. The transcoding is made on line, because the process is slowed down if the document is too large. The actions that the user can make on the system consist in visualizing Web sites on line, visualizing Web sites on disconnection mode, visualizing error messages, visualization of the requests states and finally, set up the system. The GAP is basically conformed of three main modules which are: Observer, GAL (Gestor de Acaparamiento Local: Local Hoarding Manager) and GDL (Gestor de Desconexión Local: Manager of Local Disconnection). The Observer is responsible of processing each request and to give back the result to the navigator. The GAL is responsible of the manipulation and control of the cache in the device. The users decide which resources are susceptible of hoarding, as well as limiting the storage space. The GDL is responsible of determining the state of the connection. The control of the disconnections has been used drilling the network during three seconds. Observing the quality of the results, a threshold of 30% of accepted connections determines if the client is connected (if the threshold is surpassed or equaled) or is on disconnection mode (if it is below the threshold) [7]. For the implementation of this tool, we used .NET Compact Framework 1.0 with C # language, because it is the best option to program in Pocket PC platform [8]. The modifications of the MA and MT are being made in Java so that it is language in which these modules are programmed. 4. Results The tool described in the present document has been proven in diverse equipment like Pocket PC 2000 (Compaq iPAQ H3630), Pocket PC 2002 (HP Jornada 5500), Pocket PC 2003 (HP rx3115), emulators of Windows CE, desktop PC (Compaq Presario with Pentium 4 1.4 Ghz. processor, 512 Mb of RAM memory). The first test scenario consisted of acceding to the Web resources in on line mode. 
We obtained satisfactory results (see Figure 2). In the second test scenario, the GAP was executed without being connected to the network. Additionally, we had a pattern of a hoarded Web site (http://www.cenidet.edu.mx/) and its resources. In this case, images not present in the original site were used, so it was possible to verify that the hoarded resources were being displayed correctly. The third test scenario (see Figure 3) demonstrates that it is possible to transcode the resources on the device as well as to show them locally if they are hoarded and not transcoded. It is also possible to execute the GAP on other platforms such as Smartphones (SmartGAP) and a desktop PC (WinGAP). GAP, WinGAP and SmartGAP are the same program under different names, to differentiate the platforms on which they run. 5 Conclusions The presented tool demonstrates that it is possible to execute complex services on Pocket PC devices, as is the case of an intermediary service that allows Web resources to be visualized whether or not a network connection exists. At this time we have verified most of the functions of the system in an isolated way (the methods for decompressing the hoarded site are still missing); it
  • 22. 20 would be necessary the respective integration of components and testing to the system in its totality. Figure 2. Case of test 1: Visualization of Web resources with network connection. Figure 3. Visualization of Web sites in disconnection mode with hoarded Web resources and without transcoding. Figure 4. Case of test 3: Visualization of Web sites in connection mode, with hoarded and transcoding resources. The expected benefits at the conclusion of this investigation work are: 1) Visualization of Web sites without mattering if the devices are connected or not. 2) Reduction of latency in the access to the information, if the resource is hoarded locally. 3) Energy Saving by the fact to work in disconnection mode. 4) Saving money if the user decides not to connect to a network that receives the service and generates expenses by the access time. 5) Facility of administration of Web sites when not having different versions to each device. 6. Acknowledgments We want to give thanks to Rocío Vargas Arroyo for her contribution in correct this paper. 7. References [1] SG magazine, http://www.softwareguru.com.mx [visited march 2006] [2] Purushottam Kuikarni, et al., “Handling Client Mobility and Intermittent Connectivity in Mobile Web Accesses”, Department of Computer Science, University of Massachussets. [3] Blackberry’s push technology, http://www.blackberry.com/products/software/integrations/p ush_email.shtml [visited march 2006]. [4] UPnP Forum, http://www.upnp.org/, [visited march 2006] [5] David Valenzuela, “Mecanismos para predicción de acaparamiento de datos en sistemas clientes/servidor móviles”, masther thesis, cenidet, august 2002. [6] Gabriel González. “Plataforma middleware reflexiva para aplicaciones de cómputo móvil en Internet (Movirware)”, cenidet. [7] J. Carlos Olivares, et al, “Control de desconexiones en la visualización de páginas Web en dispositivos móviles Windows CE”, for appear in XVI CIECE’06, april 5,6 and 7 2006, Cd. Obregón, Sonora, México. [8] Gabriel González, Azucena Montes, J. Carlos Olivares, “Comparativa y evaluación de las herramientas de programación para desarrollar aplicaciones en plataforma Pocket PC”. VI CICC’05, Colima, Colima, México, september 2005.
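The GDL's disconnection control described in this article (probe the network for about three seconds and declare the client connected when at least 30% of the probe connections succeed) can be sketched as follows. This is an illustrative Python version, not the original C#/.NET Compact Framework code, and the probe host, port and timing details are assumptions.

```python
# Illustrative sketch of the 3-second / 30%-threshold connectivity check.
# Probe target, port and per-attempt timeout are assumptions.
import socket
import time

def is_connected(host="www.cenidet.edu.mx", port=80,
                 duration=3.0, threshold=0.30):
    attempts = successes = 0
    deadline = time.time() + duration
    while time.time() < deadline:
        attempts += 1
        try:
            with socket.create_connection((host, port), timeout=0.5):
                successes += 1
        except OSError:
            pass
    # Connected if at least `threshold` of the probe connections succeeded.
    return attempts > 0 and (successes / attempts) >= threshold

print("connected" if is_connected() else "disconnected")
```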
  • 23. 21 Evaluation of tools for business process modeling at three levels of abstraction Erika M. Nieto Ariza1, Javier Ortiz Hernández1, Guillermo Rodríguez Ortiz2 1 Centro Nacional de Investigación y Desarrollo Tecnológico, Interior internado Palmira s/n, Cuernavaca, Morelos, 62490 México {erika, ortiz}@cenidet.edu.mx, 2 Instituto de Investigaciones Eléctricas, Reforma 113, Palmira, Cuernavaca, Morelos, 62490 México gro@iie.org.mx Abstract Organizations are increasingly choosing the Web to provide services to their clients. Services are the systematization of the business processes of the organization. Due to the great number of existing modeling methods and the increasing use of the Internet, it is necessary to identify the information that each modeling method allows one to specify. In this paper, a set of concepts is proposed to evaluate modeling methods for business modeling at three levels of abstraction: organizational, integration and web. 1. Introduction Organizations must decide how technology systems support the business and how, increasingly, these information systems become an integral part of the business processes [1, 2]. Models are commonly used to flexibly represent complex systems and to observe the behavior of a business process when a technology system is integrated [3, 4, 5]. A business model is an abstraction of how a business performs; it provides a simplified view of the business structure which acts as the basis for communication, improvement or innovation, and it defines the information system requirements that are necessary to support the business. A model has to capture the domain without reference to a particular system implementation or technology. One of the problems with modeling the early representations of business processes, conceptual views of information systems and Web interactions is the great number of techniques available to build and specify these models; additionally, since each one has its own elements, it is complex and laborious to compare and select the appropriate technique to model a system at a specific level of representation. Three modeling levels of abstraction are proposed, which integrate a set of concepts to build early web application models: a) Organizational, which describes how the organization works and the business process that is going to be systematized with a web information system; b) Integration, which describes the role of the software system and its integration into a particular organizational environment; c) Web, which describes the semantics of a web application [5, 6]. The basis of our contribution is the identification and classification of a set of concepts which are used to know what to model at each level of abstraction and to provide a modeling method evaluation framework that distinguishes the capabilities of each method to model at the three levels of abstraction. There are some methods and methodologies to evaluate business process modeling; however, they do not evaluate modeling capabilities but rather the functionality of the application or of the modeling methods. Rosemann proposes an ontology to evaluate organizational modeling grammars, identifying their strengths and weaknesses [7]. Luis Olsina [8] and Devanshu Dhyani [9] propose methodologies to evaluate the characteristics of a web application in its operational phases.
The structure of this paper is as follows: in section 2 the modeling concepts that comprise our approach are briefly presented; in section 3 the modeling concepts are enhanced with a set of aspects found to be useful in building models, and a method evaluation methodology is presented; in section 4 the results of the evaluation are shown; in section 5 the conclusions about the benefits of the methodology are discussed; finally, the references are presented. 2. Modeling concepts A business process model can be viewed at many levels of abstraction, and complementary model views can be combined to give a more intelligible, accurate
  • 24. 22 view of a system to be developed than a single model alone [3]. This approach establishes three levels of abstraction, and each one includes certain modeling concepts, as shown in Table 1. At each of these levels, concepts are properties or characteristics that structurally describe types of requirements at a specific level of abstraction; they define the key elements in a business process. The concepts at each level of abstraction were selected based on the analysis of several techniques and methods for business process modeling at the three levels.
Table 1: Modeling concepts at each level of abstraction
Organizational level | Integration level | Web level (business process) | Web level (pure navigation)
--- | --- | --- | Navigation page - Relationship
--- | --- | User profile (Rol) | User profile (Rol)
Actor | Actor | Class (object) | ---
Resource | Artifact | Artifact | Artifact
Goal | Goal | --- | Goal
Task | Function | Service | Service
Activity | Event | Event | ---
Business rule | Constraint | Precondition and postcondition | ---
Quality | No functional requirement | No functional requirement | ---
The organizational modeling concepts are as follows.
- Actor. It describes an entity that has a specific goal in the business process.
- Resource. It describes an informational or physical entity that is transferred between actors.
- Goal. It describes a desired state of the business process that an organization imposes on itself.
- Task. It describes a series of activities oriented to reaching a goal.
- Activity. It describes a set of actions to carry out one task.
- Quality. It describes the desired characteristics of the business process.
- Business rule. It describes the actions and criteria that govern the execution of the business process.
The integration modeling concepts are as follows.
- Actor. It describes an entity that interacts with the information system and that might play different roles.
- Artifact. It describes an entity that is transferred between an actor and the information system.
- Goal. It describes the information system purpose, limitations and responsibilities.
- Function. It describes a service that must be provided by the information system.
- Event. It describes a change in the business process at one specific moment in time.
- Constraint. It describes a condition for the execution of a service supplied by the information system.
- Non functional requirement. It describes the desired quality features or constraints for the information system.
The Web modeling concepts are as follows.
- Navigation relationship. It describes a global vision of the Web application according to a user profile.
- User profile. It describes the user's particular use of the Web application.
- Class. It describes an object type to model the entities that integrate the application.
- Artifact. It describes an abstract object to be transferred between the Web application and a user.
- Goal. It describes the purpose of the Web application.
- Service. It describes an activity or an action that the web application offers.
- Event. It describes the trigger of an activity or action that might be carried out to obtain a result or artifact.
- Pre and post condition. It describes the conditions that hold before and after the execution of an event.
- Non functional requirement. It describes the desired quality features or constraints for the Web application.
The concepts used for business process modeling are related to each other. 3. The concepts and the evaluation of methods approach The last section introduced a set of modeling concepts used to model business processes and systems at different levels of abstraction.
Here the concepts are enhanced with aspects that make them more powerful for modeling a particular view. These aspects are also used as scales to evaluate modeling methods. The aspects are capabilities sorted by the concepts presented before, and a scale is defined for each concept using the capabilities related to that concept. Also, a desired capability mentioned in the literature may be used in the definition of a scale. Following a well-known approach from the economics and management disciplines, each aspect is assigned a scale between 0 and 5, which is used to evaluate one of the modeling capabilities. As in statistical methods, the concepts in this paper are qualitative variables with a nominal scale [10]. The evaluation scale is obtained by first taking a list of the capabilities of one method, then a list of capabilities from a second method, from a third, and so on until all selected methods are analyzed. The concept evaluation scales facilitate the comparison of the capabilities of different modeling methods (see Tables 2, 3 and 4). The order assigned to the scales is intuitive and relatively arbitrary; however, it can be changed easily.
  • 25. 23 Then each information method is evaluated for all the aspects at each level of abstraction.
Table 2: Aspects and evaluation scales for the organizational level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Actor | Actor | --- | Role | Type | Responsibility
Resource | Resource | Type | Actor using it | --- | Actor supplying it
Goal | Goal | Priority | Problem | Opportunity | Verification
Task | Task | Who requests | Who executes | Hierarchy | Associated goal
Activity | Activity | Tasks supported | Hierarchy | How it is activated | When it is concluded
Business rule | Business rule | Associated concept | Origin | Type | Hierarchy
Quality | Quality | Associated concept | --- | Origin | Measure
Table 3: Aspects and evaluation scales for the integration level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Actor | Actor | --- | Role | Type | Responsibility
Artifact | Artifact | Actor or function supplying | --- | Actor or function requiring | Artifact state
Goal | Goal | Who establishes it, Associated to a function | Assigned priority | Measure, Failure cause | Opportunity to solve a problem
Function | Function | Who starts it | Who uses it | Hierarchy | The product
Event | Event | Who fires it, What is the start state | What is produced, Hierarchy | Who receives the product, Owner function | Final state
Constraint | Constraint | Type | Who defines it | To whom or what it applies | Who or what enforces it
Non functional requirement | Constraint | Who proposes it, To what it is applied | Type of requirement | Measure to verify compliance | What happens if it is not fulfilled
Table 4: Aspects and evaluation scales for the Web level of abstraction
Concept | 1 | 2 | 3 | 4 | 5
Navigation page - Relationship | Navigation page | Nav. page - Relationship | User profile | Navigation help | Access constraints
User profile (Role) | User profile | Role | Role changes allowed | Services per user | Business process state
Class (object) | Class (object) | Attributes | Relationships | Methods | Type of relationships
Artifact | Artifact | --- | Type | Supplier | User
Goal | Goal | Who defines it | Associated service, Priority | Measure | Failure cause, Opportunity to solve it
Service | Service | Related events | Hierarchy, Requesting user | Executing agent, Result | Result final user, Owner page
Event | Event | Service owner | Hierarchy, Implementing class | Who requests | Shared or not
Pre and post condition | Post condition | Pre condition | --- | --- | Associated event
Non functional requirement | Non functional requirement | Who proposes it, To what it is applied | Type of requirement | Measure to verify compliance | What happens if it is not fulfilled
The evaluation consists in assigning a value to each concept of the method. Take, for example, the non functional requirement concept at the web level: if the method has the non functional requirement concept, it gets 1 point. If, in addition, it indicates who proposes the requirement and to what it is applied, it gets 2 points. If it also covers the kind of requirement, it gets 3 points. If it further covers the measure to verify compliance, it gets 4 points. Finally, the method gets 5 points if it covers the concept, who proposes it, to what it is applied, the type of the requirement, the measure to verify compliance and what happens if the requirement is not fulfilled. 3.1. Evaluation methods The evaluators have to evaluate the three levels of abstraction for all concepts. For each modeling method and for each aspect ai, a corresponding evaluation ei is obtained. The results are displayed in a table for easy comparison, and a total score is obtained for each method and for each level of abstraction as Σei. A method that scores higher than another possibly has more capabilities to model requirements at the corresponding level of abstraction. 4. Results of the methods evaluations As an exercise, the methods i*, Tropos, EKD, BPM-UML, OO-Method/OOWS and OOWS [5, 7, 4, 8, 9, 11, 12] were evaluated using the scales presented (Tables 5, 6 and 7). The methods evaluated at each level are not the same, since some methods do not offer the modeling concepts for the levels where they are not shown.
Table 5: Organizational level evaluation of the methods
Organizational level | Max. value | i* | Tropos | EKD | BPM-UML
Actor | 5 | 5 | 5 | 5 | 5
Resource | 5 | 5 | 5 | 2 | 5
Goal | 5 | 1 | 3 | 4 | 3
Task | 5 | 2 | 4 | 3 | 2
Activity | 5 | 0 | 2 | 0 | 4
Business rule | 5 | 2 | 0 | 5 | 4
Quality | 5 | 3 | 4 | 4 | 4
Total | 35 | 18 | 23 | 23 | 27
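To make the scoring arithmetic concrete, the following minimal C++ sketch totals the organizational-level evaluations of Table 5 as Σei per method. It only reproduces the published numbers; the vector order of concepts is Actor, Resource, Goal, Task, Activity, Business rule, Quality, and nothing else is assumed about the evaluation procedure.
#include <iostream>
#include <map>
#include <numeric>
#include <string>
#include <vector>

int main() {
    // Evaluations e_i per organizational-level concept, copied from Table 5.
    // Concept order: Actor, Resource, Goal, Task, Activity, Business rule, Quality.
    std::map<std::string, std::vector<int>> scores = {
        {"i*",      {5, 5, 1, 2, 0, 2, 3}},
        {"Tropos",  {5, 5, 3, 4, 2, 0, 4}},
        {"EKD",     {5, 2, 4, 3, 0, 5, 4}},
        {"BPM-UML", {5, 5, 3, 2, 4, 4, 4}},
    };
    // Total score per method and level is the sum of its evaluations (Σ e_i).
    for (const auto& [method, e] : scores) {
        int total = std::accumulate(e.begin(), e.end(), 0);
        std::cout << method << ": " << total << " / 35" << std::endl;
    }
}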
  • 26. 24 Table 6: Integration level evaluation of the methods
Integration level | Max. value | i* | Tropos | EKD | BPM-UML | OO-Method
Actor | 5 | 5 | 5 | 5 | 5 | 1
Artifact | 5 | 5 | 5 | 4 | 5 | 4
Goal | 5 | 1 | 3 | 4 | 3 | 1
Function | 5 | 2 | 2 | 5 | 5 | 2
Event | 5 | 0 | 1 | 0 | 4 | 3
Constraint | 5 | 2 | 0 | 5 | 4 | 5
Non functional | 5 | 3 | 4 | 4 | 4 | 0
Total | 35 | 17 | 20 | 27 | 30 | 16
Table 7 (a): Web level evaluation of the methods (business process)
Web level | Max. value | Tropos | OO-Method/OOWS | OOWS
User profile | 5 | 3 | 4 | 4
Class | 5 | 0 | 5 | 5
Artifact | 5 | 4 | 4 | 4
Service | 5 | 3 | 3 | 3
Event | 5 | 1 | 3 | 2
Precondition and post condition | 5 | 2 | 5 | 3
Non functional | 5 | 3 | 0 | 0
Total | 35 | 16 | 24 | 21
Table 7 (b): Web level evaluation of the methods (pure navigation)
Web level | Max. value | Tropos | OO-Method/OOWS | OOWS
Navigational page - relationship | 5 | 1 | 5 | 5
User profile | 5 | 3 | 4 | 4
Goal | 5 | 3 | 0 | 0
Artifact | 5 | 4 | 4 | 4
Service | 5 | 3 | 3 | 3
Total | 25 | 14 | 16 | 16
At the organizational level, BPM-UML obtains good scores, while i* has the lowest score. The methods were evaluated with respect to the parameters defined for the approach presented here. During the evaluation of the methods, their own characteristics show up; for example, the quality aspects of a business process are modeled as qualitative goals in BPM-UML. At the integration level, the results show the capabilities of each method; for example, BPM-UML obtains good scores for this level, but OO-Method has the lowest score. 5. Conclusions There are many proposals to model organizational, integration and web requirements, and each one has its own elements. Some use the same concepts under different names, which makes it complex and laborious to compare the methods. The approach presented here unifies the various terminologies, increases the knowledge about modeling concepts, and proposes an approach to evaluate the modeling capabilities of methods and techniques. This helps to select the method that is most appropriate for the needs of a problem domain. The approach has been used to evaluate e-learning systems [13]. Additionally, it has been applied in the development of various case studies to evaluate virtual reality methods and to clearly appreciate the concepts that the methods allow one to model. 6. References [1] James Pasley, “How BPEL and SOA are changing web services development”, IEEE Internet Computing, May-June 2005. [2] Peter F. Green, Michael Rosemann and Marta Indulska, “Ontological Evaluation of Enterprise Systems Interoperability Using ebXML”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005. [3] Meservy T. and Fenstermacher K., “Transforming software development: an MDA road map”, IEEE Computer Society, September 2005. [4] H. E. Eriksson and M. Penker, Business Process Modeling with UML, Chichester, UK, Wiley, 2000. [5] E. Yu, Modelling Strategic Relationships for Process Reengineering, University of Toronto, Canada, 1995. Thesis submitted for the degree of Doctor of Philosophy. [6] A. Ginige and S. Murugesan, “Web Engineering: An Introduction”, IEEE Multimedia, pp. 1-5, Jan-Mar 2001. [7] Peter F. Green, Michael Rosemann and Marta Indulska, “Ontological Evaluation of Enterprise Systems Interoperability Using ebXML”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, IEEE Computer Society, May 2005. [8] Olsina, Luis A., Metodología cuantitativa para la evaluación y comparación de la calidad de sitios web, tesis doctoral, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, noviembre de 1999. [9] Devanshu Dhyani, Wee Keong Ng and Sourav S.
Bhowmick, “A survey of web metrics”, ACM Computing Surveys, Vol. 34, No. 4, December 2002, pp. 469-503. [10] William L. Carlson and Betty Thorne, Applied Statistical Methods for Business, Economics, and the Social Sciences, Prentice Hall, 1997. [11] Bubenko J., Brash D. and Stirna J., EKD User Guide, Royal Institute of Technology (KTH) and Stockholm University, Stockholm, Sweden, Dept. of Computer and Systems Sciences, 1998. [12] E. Insfrán, O. Pastor and R. Wieringa, “Requirements Engineering-Based Conceptual Modelling”, Requirements Engineering, Springer-Verlag, vol. 7, no. 2, pp. 61-72, 2002. [13] Eduardo Islas P., Eric Zabre B. and Miguel Pérez R., “Evaluación de herramientas de software y hardware para el desarrollo de aplicaciones de realidad virtual”, http://www.iie.org.mx/boletin022004/tenden2.pdf (2005).
  • 29. 27 Image Segmentation Script Language Francisco Cervantes Álvarez, Raúl Pinto Elías Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet) Interior Internado Palmira s/n, Cuernavaca, Morelos, México. {cervantes04c, rpinto}@cenidet.edu.mx Abstract In this article we propose the use of a script language for the image segmentation stage of artificial vision. The proposed language, the system architecture used to interpret scripts and the general structure of the programs that make up the operator library are described. Finally, some tests and results of the use of the proposed script language are shown. 1. Introduction In this paper we propose the use of a script language for image segmentation. Nowadays, the use of script languages in graphics programming is increasing, because they allow ideas to be tested in an easy way [1]. Script languages also facilitate code reuse [2]. However, in the artificial vision area few works focus on digital image processing through script languages; one example is shown in [3], where the user builds a script with graphic objects and then executes it to process a given image. An example of a commercial script language is MATLAB [4]. The proposed script language allows image segmentation ideas to be tried out in an easy way, and the user does not need to know how the segmentation algorithms carry out the process. The language also allows code reuse through the operator library (the operators are independent of each other) and the implementation of a script interpreter. This library allows the language to grow in a modular way, without the need to modify the existing code. This paper is structured in the following way. In the second section the basic elements of the proposed script language are described. The third section shows the basic structure that the library operators should have. In the fourth section the general scheme of the script language interpreter is described. In the fifth section some tests and results are shown. Finally, the conclusions are presented and some future work is discussed. 2. Basic elements of the script language The language is composed of the following data types: Entero, Real, Cadena, ImagenGris, Mascara and Contorno. The basic arithmetic operations are also defined (addition, subtraction, multiplication and division). The language has the following basic structures: • Declaration. • Assignment. • Operator call. The corresponding syntax for these structures is the following. Declaration: data_type variable_name (, variable_name)* Assignment: variable_name = variable_nameX variable_name = arithmetic_expression variable_name = operator_name (arguments) Operator call: operator_name (arguments) The syntax mentioned above provides a general structure; everything else depends on the operators registered in the operator library. This structure is named the language core. The script language interpreter is very important because it can recognize new operators. This way of defining the language, where only the structures are established and the language elements are not fixed, gives the advantage of adding new elements or commands dynamically, without modifying the core code. Each operator is independent of the others. The single restriction to add an operator to the library is to follow a basic structure specification. This restriction must be followed in order to let the core and the operators interact.
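As an illustration of how the three structures above might be recognized, the following self-contained C++ sketch classifies script lines as declarations, assignments or operator calls. It is not the article's interpreter, only an assumed example: the operator names BinarizarImagen and VerImagenGris come from the tests reported later, while the threshold value 128 and the regular expressions are hypothetical.
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Classification of one script line into the three structures the language
// core defines: declaration, assignment and operator call. The data type
// names match the ones listed in the article.
enum class Kind { Declaration, Assignment, OperatorCall, Unknown };

Kind classify(const std::string& line) {
    static const std::regex decl(
        R"(^\s*(Entero|Real|Cadena|ImagenGris|Mascara|Contorno)\s+\w+(\s*,\s*\w+)*\s*$)");
    static const std::regex assign(R"(^\s*\w+\s*=\s*.+$)");
    static const std::regex call(R"(^\s*\w+\s*\(.*\)\s*$)");
    if (std::regex_match(line, decl))   return Kind::Declaration;
    if (std::regex_match(line, assign)) return Kind::Assignment;
    if (std::regex_match(line, call))   return Kind::OperatorCall;
    return Kind::Unknown;
}

int main() {
    std::vector<std::string> script = {
        "ImagenGris img, salida",                 // declaration
        "salida = BinarizarImagen(img, 128)",     // assignment from an operator call
        "VerImagenGris(salida)"                   // operator call
    };
    for (const auto& l : script)
        std::cout << l << " -> " << static_cast<int>(classify(l)) << "\n";
}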
  • 30. 28 3. Operator basic structure An operator can be used as part of the library if it has the general structure shown in Figure 1. Figure 1. Operator general structure: core libraries for the data type managers (numbers, strings, images and templates); request of the arguments (file paths containing the content of the parameters); operator body (the digital image processing algorithm); return of the result (the result is saved through a file given as the last argument). Figure 1 shows the structure that the operators should have. Some structure elements can be omitted; for example, in the header only the required core libraries must be included. An operator can return a value, but this is not absolutely necessary; for example, the operator that shows an image only displays it on the screen and does not need to return anything to the core. By default it is necessary that the operators receive at least one argument. In order for the core and the operators to interact, the data types defined in the core must be used. Also, the parameters that an operator needs for its execution have to be received as file path specifications; these files hold the content of the operator parameters. Finally, it is also necessary to save the output data in a file, which is specified by the last parameter received by an operator. Below, an example of an operator to extract the negative of an image is shown.
#include "CImagGris.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[]) {
    // The operator expects at least two file paths: input image and output image.
    if (argc < 3) exit(ERR_NUM_PARAM);
    char *a_entrada = argv[1];
    char *a_salida  = argv[2];
    AnsiString entrada, salida;
    CImagGris Imagen;
    entrada = CharToAnsiString(a_entrada);
    salida  = CharToAnsiString(a_salida);
    // Load the grayscale image from the input file.
    if (Imagen.leerArchivo(entrada) == false) exit(ERR_IMAG_EN);
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();
    w = Imagen.Ancho();
    // Invert every pixel to obtain the negative image.
    for (x = 0; x < w; x++) {
        for (y = 0; y < h; y++) {
            pixel = Imagen.getPixel(x, y);
            Imagen.setPixel(x, y, 255 - pixel);
        }
    }
    // Write the result to the file indicated by the last argument.
    if (Imagen.escribirArchivo(salida) == false) exit(ERR_NO_MEMO);
    exit(BIEN);
}
At the moment the interpreter core only supports 24-bit BMP images. 4. General scheme of the script interpreter The general structure of the operators has been shown; now the general scheme of the script interpreter is presented. Figure 2 shows each component of the interpreter and the relationship with the operators. Figure 2. Script interpreter general scheme (components: operator library, interpreter core with process module, memory module and data types, input script and result). With this structure, the system first explores the operator library to generate the structure of each operator (syntax and semantics), then it analyzes the input script and executes the operators. 5. Tests and results Several tests have been done to show the advantages provided by the use of the script language for image segmentation. For example, the user can use the language without interacting directly with the algorithms. The first test consists of showing how to make a new operator that converts an image to a binary image.
  • 31. 29 To create an operator, it is necessary to have a C++ Builder compiler. Figure 3 shows the operator code.
Figure 3. Operator to convert an image to a binary image:
#include "CBMP24.h"
#include "CNumero.h"
AnsiString CharToAnsiString(char *arreglo);
int main(int argc, char* argv[]) {
    // The operator expects three file paths: input image, threshold value and output image.
    if (argc < 4) exit(ERR_NUM_PARAM);
    char *a_imag   = argv[1];
    char *a_umbral = argv[2];
    char *a_salida = argv[3];
    int valorUmbral;
    AnsiString imag, umbral, salida;
    CIMAGEN_BMP24 Imagen;
    CNumeroMemoria UmbralBinario;
    imag   = CharToAnsiString(a_imag);
    umbral = CharToAnsiString(a_umbral);
    salida = CharToAnsiString(a_salida);
    // Load the image in grayscale mode and the binarization threshold.
    if (Imagen.leerArchivo(imag, GRIS) == false) exit(ERR_IMG_EN);
    if (UmbralBinario.leerArchivo(umbral) == false) exit(ERR_UMB_EN);
    valorUmbral = (int)UmbralBinario.Valor();
    int x, y, h, w;
    Byte pixel;
    h = Imagen.Alto();
    w = Imagen.Ancho();
    // Set each pixel to white or black according to the threshold.
    for (x = 0; x < w; x++) {
        for (y = 0; y < h; y++) {
            pixel = Imagen.getPixelGrey(x, y);
            if (pixel >= (Byte)valorUmbral)
                Imagen.setPixelGrey(x, y, 255);
            else
                Imagen.setPixelGrey(x, y, 0);
        }
    }
    // Write the binary image to the file indicated by the last argument.
    if (Imagen.escribirArchivo(salida) == false) exit(ERR_NO_MEMO);
    exit(BIEN);
}
As can be seen in the figure above, making a new operator is very easy; it is only necessary to follow the general structure that has been specified. Now the user only has to generate the executable file by compiling the source code. At this moment the new operator has been created and must be registered in the library. The second test consists of registering the new operator. For this, the Métodos option of the Herramientas menu is used; Figure 4 shows this menu. Figure 4. Script interpreter interface. When the user clicks on this menu, the registration screen is displayed, where the new operator has to be registered; this is shown in Figure 5. The user has to indicate the executable file path and assign an alias. The alias is used by the interpreter language; then the input and output parameters of the operator are specified. Figure 5. Screen to register an operator. Once the operator is registered, it can be used as part of the language; Figure 6 shows this process. Figure 6. Script to convert an image to a binary image. Here, a script to convert images to binary images is built, and the initial and final images are also shown. With these tests the advantages of the script language are shown. The capability of registering new operators provides an open language that grows based on the needs of the user. Nowadays the language has 25 segmentation operators and 5 image description operators. The third test shows how the operators BinarizarImagen and VerImagenGris interact with each other. These operators can be saved to be reused later. Finally, the management of the library is simple, because the interface provides screens to modify, remove and add operators to the library. 6. Conclusions With this paper we can conclude that the use of a script language for image segmentation is practical. In this work we saw that the segmentation algorithm functionality can be abstracted away by using scripts, so the user does not need to know the algorithms. Finally, we can say that script languages might be used in other stages of artificial vision. The operator library used by the interpreter is built to grow in a modular way; the library grows with each operator that is registered.
  • 32. 30 7. References [1] M. Villar, “Guía de lenguajes de script para prototipado rápido”, http://www.codepixel.com/tutoriales/prototipado/, 2006. [2] K. Muehler, “Adaptive script based animations for medical education and intervention planning”, Department of Simulation and Graphics, University of Magdeburg, Germany. [3] “Sistema interactivo para la enseñanza de la visión artificial”, Depto. de Sistemas Inteligentes Aplicados, Escuela Universitaria de Informática, Universidad Politécnica de Madrid, 2006. [4] “MATLAB”, http://www.mathworks.com/, 2006.
  • 33. 31 Web Page Retrieval Using an Ontology that is Populated by Automatic Text Classification Ismael R. Ponce M., José A. Zárate M., Juan C. Olivares R. Centro Nacional de Investigación y Desarrollo Tecnológico {rafaxzero04c, jazarate, jcolivares04c}@cenidet.edu.mx Abstract This article describes a proposal to help users in the arduous task of recovering information from the Web, especially when queries are about a specific subject or approach. For this, we suggest the use of an ontology whose instances are links to Web pages about the domain on which the ontology was constructed, taking advantage of the order and categorization that the ontology offers to guide the user through the concepts that integrate it and to find information related to them. The creation of an ontology about a particular domain and the activities needed to obtain an automatic classification of Web pages as instances in the ontology are described. Keywords: Ontology, automatic classification methods, vector space model. 1. Introduction Nowadays, in line with technological evolution, the amount of information generated every second is immense; moreover, having that information has become so important that we live in an era where information governs the world and its decisions. The Internet has become a great source of information, but the larger it grows, the more difficult it is to find the desired content. Diverse ways of recovering information have been tried, for example the Web search engines, which use different techniques (searchers like Google, Yahoo, Ask, Vivisimo and many others), some considering the popularity of the pages, the use of clustering, etc.; nevertheless, although they somehow help at the time of making queries, users still face results that are not at all what they wished for. Therefore, diverse ways of helping to search the Internet are being explored, ranging from the matching of words to techniques based on the popularity of the sites; unfortunately for many users, this type of result is not enough, so they require more specific solutions. The alternative proposed for this problem is to use the ontology paradigm for Web page search on a particular subject. When working on a concrete domain, a specialized search is expected; in addition, thanks to the use of ontologies and the order they provide over the concepts that conform them, they are a great help for users to find the information they wish. In this document we focus on the way to populate an ontology with Web page links, using automatic classification techniques. In our experiment we obtained 86% of correctly classified elements. The article briefly describes the steps followed to put the proposed idea into practice. First, a brief panorama of the way some search engines recover information is given; next, the development of an ontology compatible with the intended use is presented, concluding with the steps for the use of an automatic classification method that will be used to populate the ontology, considering the Naive Bayes, k nearest neighbors and support vector machines methods. 2. Search Machines In a traditional search machine the queries are generally made from key words, obtaining as a result a list of Web links related to the queried words. Some of the best-known search machines are Google, Yahoo and MSN Search, among others.
The case of Google stands out because of the use of its PageRank technology [1], in which a formula calculates the weight of each Web page stored in its database, considering the number of links from other pages that reference it. The greater the number of links to a page, the greater its score, thus becoming a popularity contest [2]. Unfortunately, it has been demonstrated that the results
  • 34. 32 can be manipulated by the well-known Google bombing [3]. Another way to recover information is through Web directories, which consist of a manual organization and classification of Web pages by subjects or categories. One of the most representative directories is the Open Directory Project [4], in which a set of volunteer editors is in charge of listing the Web links inside an ontology, where the links are grouped into categories by similar subjects. The disadvantage that can be appreciated is that it requires too much human intervention to register the links of the pages. A special type of search machine is the one that incorporates clustering; clustering consists of partitioning a set of similar objects into subgroups, where the elements that conform each subgroup share common characteristics. This type of search machine returns the results found for a query arranged in groups; examples of this are Clusty.com and Vivisimo.com. Finally, we find specialized search machines, which are centered on recovering links to technical and scientific documents. An example is Citeseer, a document search machine focused on computing that uses the bibliographical references to estimate the importance of the documents that are queried. Although these and other techniques have been developed to recover information from the Web, this area still has much to offer, which is why new alternatives to help the user continue to be looked for, as is our case. 3. Phase of Ontology Development Gruber [5] defines an ontology as the explicit specification of a conceptualization, which means identifying the concepts that integrate a domain and the interrelations that exist among them, in a formal representation, in a way that makes it possible to share and reuse it. The standard language established by the W3C to make this type of formalization is OWL (Web Ontology Language). The advantage of using this standard, in addition to the reusability it allows, is that many tools related to ontology design and use are being developed to support it, such as editors, reasoners, etc. We developed an ontology considering these points, following the methodology proposed by Uschold and King [6]. The domain on which the ontology was developed was natural language processing (NLP). In order to develop it, we used the ontology editor Protégé 3.1.1, the Protégé-OWL 2.1 plug-in along with the OWL Wizards plug-in, in addition to the FaCT++ 0.99.6 and RacerPro 1.9.0 reasoners, used to verify the ontology consistency. The classes are made up of concepts related to the NLP area, mainly including researchers, schools, tools and application areas. 4. Supervised Learning for Automatic Text Classification Once the ontology is developed, the phase of populating it with instances follows. Given the enormous number of pages that exist on the Web, a manual classification of them into the classes established in the ontology would be an expensive task, and this is also already done to a great extent in the Web directories. Therefore, a way to automate this process was looked for, resorting to supervised learning, in which, by means of statistical and mathematical techniques, an automatic text classification can be done. This approach is centered on having a previously classified training set of documents that will be used to learn to classify new documents.
For this, it is necessary to transform the initial state of the documents into a representation that can be used by a learning algorithm for the classification. For test purposes we only worked with HTML pages. The necessary steps for this process are mentioned next. The training collection used consisted of 1624 documents, previously classified into 26 classes taken from the developed ontology, in addition to a class denominated null, in which the documents not wanted for the ontology domain are classified. The 26 considered classes are only a representative sample of the existing classes in the ontology and were taken only for test purposes. 4.1. Document Preprocessing Not all the elements (more precisely, the words) that appear in documents are useful for their classification; that is, there are words that by themselves say nothing about the content of the document in which they appear and therefore can be eliminated; among these elements are the punctuation marks and the HTML tags. There are also words of very frequent use, words that appear in a great number of documents, which causes their discriminatory power to be very low; this type of word is known as a stopword, and examples of them are the articles, pronouns, prepositions and conjunctions, among others.
  • 35. 33 In order to define the stopwords to eliminate, we resorted to the lists available in DOCUM [7], in SMART [8] and in the Institut interfacultaire d'informatique of the University of Neuchâtel [9], as well as other words identified during the testing process. Because we are working on a specific domain, a certain control exists over the terms that belong to it, which is why a matching between the different terms that refer to the same concept was proposed, turning them into a single representation; in other words, if a concept can be named in different ways, we unify those forms and consider them under a unique form inside the classification process. For this, we took the concepts that integrate the ontology as a starting point. Finally, many words share the same lexical root; a basic stemming process based on Porter's algorithm [10] was followed, with which we sought to reduce words to their stem. All the steps mentioned above have the aim of diminishing the size of the training document collection to make it more manageable, eliminating the irrelevant parts before continuing the automatic classification process. In our exercise, on average we managed to reduce the original size of the document collection by up to 70%. 4.2. Vector Space Model The vector space model (VSM) was proposed by Salton in 1975 [11]. The basic idea behind this model is to build a matrix that represents the documents and the words contained in them, assigning a weight to each word. Each vector of the matrix represents a document and the distribution of the words that appear in it. It is an m x n matrix, where m represents documents and n represents the registered words. Different types of weighting for words in the VSM exist; in our tests we considered boolean weighting (the weight of a word is 0 if it does not appear in the document and 1 if it appears), weighting by frequency of appearance (the weight of the term depends on the number of occurrences of the word in the document) and, finally, tf-idf weighting (calculated from the term frequency combined with its inverse document frequency [11]). 4.3. Dimensionality Reduction in the VSM Not all the words that integrate the training collection can be considered in the VSM, since the dimension it would have would be enormous. Different techniques to reduce the dimensionality exist, like the document frequency, which considers a minimum number of appearances that each word must have within the total set of documents, in order to discard those words whose appearance is very small and to keep those that present a greater document frequency. Another technique that was considered was the information gain (IG), which calculates the difference between the entropy of the system and the entropy given each word. This difference, measured in bits, indicates how relevant a word is and how much information it contributes to the whole collection, as a determining factor to carry out the classification. The total number of words in the already processed training collection is 2552196, with only 125891 different words among them.
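To make the weightings of section 4.2 concrete, here is a minimal, self-contained C++ sketch that computes the boolean, term-frequency and tf-idf weights of one term over a toy collection. The three tiny documents and the chosen term are invented for illustration; the sketch uses the common tf x log(N/df) form of tf-idf and is not the code used in the reported experiments.
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Toy collection: three tiny "documents" already reduced to word lists.
    std::vector<std::vector<std::string>> docs = {
        {"knowledge", "nlp", "ontology"},
        {"nlp", "nlp", "classification"},
        {"image", "segmentation"}
    };
    const std::string term = "nlp";
    const double N = static_cast<double>(docs.size());

    // Document frequency: number of documents containing the term at least once.
    int df = 0;
    for (const auto& d : docs)
        for (const auto& w : d)
            if (w == term) { ++df; break; }

    const double idf = df > 0 ? std::log(N / df) : 0.0;

    for (std::size_t i = 0; i < docs.size(); ++i) {
        int tf = 0;                                   // term frequency in document i
        for (const auto& w : docs[i])
            if (w == term) ++tf;
        double boolean = tf > 0 ? 1.0 : 0.0;          // boolean weighting
        double tfidf = tf * idf;                      // tf-idf weighting (tf * log(N/df))
        std::cout << "doc " << i << ": boolean=" << boolean
                  << " tf=" << tf << " tf-idf=" << tfidf << "\n";
    }
}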
As can be appreciated from these figures, the number of different words is too large to be handled in the VSM, so only those words with a document frequency greater than or equal to 15 were considered; 8966 words passed this filter, which represents 7.12% of the original total of words. Nevertheless, this is still a very large amount, so the IG was applied to these words. The calculated entropy of the whole collection was 3.97; the words considered were those with an IG greater than or equal to 0.1, resulting in 527 different words, 0.42% of the original size. The words (already stemmed and standardized) that presented the greatest IG were: nlp (0.552), natural_language (0.479), knowledge (0.424) and data_min (0.335). 4.4. Automatic Classification Algorithms Once the VSM representation of the training document collection is obtained, a method for automatic classification can be applied to classify new elements. The automatic classification methods we considered were Naive Bayes, k-nearest neighbors (kNN) and support vector machines (SVM), in order to carry out a series of tests to find the method that gives the best results, considering in addition the weightings mentioned in section 4.2. WEKA was used to carry out the tests; the results shown in Table 1 correspond to the use of 10-fold cross validation, showing the percentage of correctly classified elements.
Table 1. Percentage of correctly classified elements.
 | boolean | tf | tf-idf
Naive Bayes | 62.7463 | 55.8498 | 81.2192
kNN | 84.5443 | 85.0369 | 84.4212
SVM | 86.2685 | 66.7488 | 82.0813
  • 36. 34 Figure 1. Classification test results (percentage of correctly classified elements per classification algorithm and weighting). The best result was obtained with the boolean weighting using the support vector machine algorithm; the details of this classification are in Table 2. By using the training that offered the best result, it is possible to classify new documents obtained from the Web, following the steps for the preprocessing and the VSM representation; once the class to which a document belongs is obtained, the link of the page can be saved as an instance of the ontology.
Table 2. VSM results with boolean weighting
Correctly classified elements | 1401 | 86.27%
Incorrectly classified elements | 223 | 13.73%
Root mean squared error | 0.1829
5. Conclusions From the obtained results, it was decided to use the boolean weighting along with the support vector machine method; the use of the boolean weighting is justified because the mere appearance of a word in a document can be considered a good indicator of its discrimination value, especially when considering words that are not so common in ordinary speech but are relevant for the domain being worked on, as demonstrated by the fact that a great part of the words with the greatest information gain belonged to the NLP domain, on which this work was done. The words obtained with the greatest IG and the accuracy of 86% obtained in the best case by the selected classification method are a good incentive to consider the use of automatic classification to populate the ontology. With respect to future work, once the ontology is populated, the next step is to guide users through the ontology so that they can find links related to the queries made on the NLP domain. At the conclusion of this work, among the desired benefits are guiding the user through the ontology classification and its different relations, making use of the bounded natural language technique, to verify the hypothesis that if the elements are ordered, it is simpler to find the desired ones and to save work for users by showing them only documents thematically related to the concepts in their queries. 6. References [1] S. Brin, L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Computer Science Department, Stanford University, Stanford, available online: http://www.db.stanford.edu/~backrub/google.html, visited on December 2005. [2] M. Miller, 501 Web Site Secrets: Unleash the Power of Google, Amazon, eBay and More, Wiley Publishing, Inc., USA, 2004. [3] Google bomb, Wikipedia, the free encyclopedia, available online: http://en.wikipedia.org/wiki/Google_bomb, visited on March 2006. [4] ODP - Open Directory Project, http://dmoz.org/, visited on April 2006. [5] T. Gruber, A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 1993. [6] M. Uschold and M. King, Towards a Methodology for Building Ontologies, Workshop on Basic Ontological Issues in Knowledge Sharing, 1995. [7] DOCUM, a multilingual stopword file for CDS-ISIS, http://library.wur.nl/isis/docum.html, visited on November 2005. [8] ftp://ftp.cs.cornell.edu/pub/smart/, visited on November 2005. [9] J. Savoy, CLEF and multilingual information retrieval, Institut interfacultaire d'informatique, University of Neuchâtel, Switzerland, 2005, http://www.unine.ch/info/clef/, visited on November 2005. [10] The English (Porter2) stemming algorithm, http://snowball.tartarus.org/algorithms/english/stemmer.html, visited on December 2005. [11] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval,
McGraw-Hill, 1983, USA.
  • 37. 35 Neuro-Symbolic Hybrid Systems Vianey Guadalupe Cruz Sánchez, Gerardo Reyes Salgado, Osslan Osiris Vergara Villegas Centro Nacional de Investigación y Desarrollo Tecnológico (cenidet) Interior Internado Palmira S/N, Col. Palmira. C.P. 62490. Cuernavaca, Morelos, México. {vianey,osslan}@cenidet.edu.mx Abstract Currently, the Hybrid Systems (HS) approach is widely used to solve problems where different knowledge representations are involved in one system. This integration has the advantage of compensating the weaknesses of one or another system while complementing their strengths. The Neuro-Symbolic Hybrid Systems (NSHS) arise from the HS as an approach that offers the possibility of implementing robust systems where both the connectionist and the symbolic nature are present in the environment. The reason for studying the NSHS is to apply them in the artificial vision system process, so that we can propose different solution strategies among the different representation types involved in this process; for that, the Hybrid Systems development cycle and the NSHS classification criteria play a very important role in the definition of this strategy. 1. Introduction In the last decade, it was very common to work with only one type of knowledge representation. There was even competition to demonstrate that the representation used was better than another that solved the same problem. However, as time passed, researchers observed the weaknesses of each representation and the complementary properties that existed among them, and the scientific community decided to test their capacities by integrating them into one (hybrid) system; with this integration, better results were obtained than those obtained in an individual way. The hybrid approach is inspired by a natural mechanism: according to [1], we are hybrid information processing machines; our actions are governed by the combination of genetic information and information acquired through learning. Thanks to this combination, we are able to successfully use different processing methods in complex and changing environments. Under this natural scheme, hybrid systems have arisen as a new way to solve complex problems where several knowledge representations are necessary to use the information coming from the environment; this environment determines the strategies that should be used to increase the knowledge and to develop more robust systems [2]. However, the design and development of these systems is difficult due to the large number of pieces or components that are involved and the different interactions among them [3]. The tendency is the study and construction of hybrid systems whose solution strategy is the best one to solve the problem. In this paper we present a study of the process involved in the design and development of a hybrid system, as well as the criteria for the classification of one particular type of hybrid system (HS), the Neuro-Symbolic Hybrid System (NSHS), which has been used in applications such as object recognition. 2. Design and development of a hybrid system In [1] a Hybrid Systems development cycle is proposed, which presents the process for the construction of these systems. The importance of this model lies in the bases on which the design of an NSHS may be supported. 2.1 Hybrid System development cycle A structured approach like the one in [1] can reduce the development time and cost of an HS.
The stages for the construction of intelligent hybrid systems are: problem analysis, matching, selection of the hybrid category,
  • 38. 36 implementation, validation and maintenance (see Figure 1). Figure 1. Hybrid system development cycle. a) Problem analysis. This stage involves the following steps: 1. Identify any sub-task existing in the problem. 2. Identify the properties of the problem. If the problem has sub-tasks, this involves identifying their properties as well. b) Property matching. It involves matching the properties of the available techniques with the requirements of the identified task. c) Hybrid category selection. In this phase the type of hybrid system required to solve the problem is selected. This phase uses the results of the previous stages of problem analysis and property matching. d) Implementation. In this stage the developer is in a position to select the programming tool and the environment necessary to implement the hybrid system. e) Validation. This phase is used to test and verify the performance of the individual components of the application and of the whole hybrid system. f) Maintenance. The performance of the hybrid system should be periodically assessed and refined as necessary. Maintenance is very important for adaptive systems (e.g., neural networks). 3. Neuro-Symbolic Hybrid Systems The NSHS are systems formed by two or more types of knowledge representations, one in a connectionist form and another in a symbolic form. Each representation has its own group of qualities, and integrating them in one system is extremely powerful for solving complex problems. Artificial neural networks are a type of connectionist knowledge representation inspired by the functionality of the biological neuron. This representation type has been used for its learning capability and generalization of knowledge, being a very powerful tool to solve complex pattern recognition problems. On the other hand, a symbolic representation in the form of logical rules is based on the capability humans have to express knowledge in a natural way. The latter is very powerful for inserting the knowledge of a human expert into a system, as well as for explaining the problem. This representation type has been widely used in areas such as pattern recognition, natural language processing, robotics, computer vision and expert systems. Both types of knowledge representation are combined in one system to suppress the disadvantages of one or the other representation and to take advantage of their integration. For the integration of the NSHS and its future application in the design of an artificial vision process, it is very important to know the criteria used for the NSHS classification, since from this classification a wide view of the different behaviors that an NSHS may have is obtained.
  • 39. 37 4. NSHS Classification Many criteria are considered in order to classify the NSHS [4]. Next, we briefly explain each one. Table 1. Criteria to classify NSHS. a) Integration type Neuro-symbolic integration can be classified into mainly three groups, according to the “hybridization type” of the approach used. • Unified approach. It attempts to integrate the properties of symbolic systems into connectionist systems and vice versa. • Semi-hybrid approach. This approach is used to achieve translations, for example, the compilation of a rule base into a network (knowledge insertion) and the explicitation of rules starting from a network (knowledge extraction). • Hybrid approach. In this type of approach many symbolic and connectionist modules may exist, integrated with each other. b) Coupling degree It defines the strength of the interaction between two modules. The classification of the different degrees is carried out through a progressive level that goes from one extreme to the other. This classification consists of three levels: • Weak coupling. In this type of architecture, the different modules are connected by a simple input/output relation, and the communications are unidirectional. • Medium coupling. In this category, the interactions among modules are more flexible, since they are bidirectional; it is not simply an input/output relationship, but rather each module can influence the operation of the other. • Strong coupling. In these systems, knowledge and data are not only transferred but also shared among modules through common internal structures. c) Integration mode It represents the way in which the neural module and the symbolic module are configured in relation to each other and to the full system. • Chain. Two modules operate in sequence. One is the main processor and it is assisted by the other module, acting as a pre- or post-processor. The relationship between the modules is input/output. • Sub-treatment. In this integration mode, one module is subordinate to the other to achieve some function. The main module decides at what moment to call it and how to use its output. • Meta-treatment. One module solves the problem and the other plays a meta-level role, such as carrying out the control or improving the results. • Co-treatment. Both modules are equal participants in the problem-solving process. For example, one module solves one specific part of the problem and the other module solves the rest of the problem. d) Knowledge transfer The knowledge transfer may be classified according to the direction of the exchange. • From symbolic to connectionist. The symbolic knowledge is transferred from a symbolic module and is integrated into a connectionist module (S→C). • From connectionist to symbolic. The knowledge acquired by learning in a connectionist network may be made explicit in the form of symbolic rules (C→S). • Bilateral transfer. The knowledge can be transferred in both directions, symbolic and connectionist (S↔C). This usually includes compilation mechanisms and rule extraction starting from the networks.