Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps
Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
NLP & IR Group, UNED
December 12, 2008
Table of Contents
1 Objectives
2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
3 Experiment Description
4 Results
5 Conclusion
Objectives
Group HTML documents by content similarity.
Use Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection.
Define a term weighting function that takes advantage of HTML tags: it uses fuzzy logic to combine heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document.
Hypothesis
An improvement in document representation will lead to an increase in map quality.
Fuzzy logic
Capturing human expert knowledge.
Close to natural language.
Knowledge base: defined by a set of IF-THEN rules.
Linguistic variables
Defined using natural language words and fuzzy sets.
These sets describe the degree of membership of an object in a particular class.
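As a minimal sketch of these ideas (not the EFCC rule base itself, which is not spelled out here), a linguistic variable can be modelled as a dictionary of named fuzzy sets, each given by a membership function. The variable name `FREQUENCY`, the labels `low`/`medium`/`high`, the breakpoints, the second input value `0.9`, and the single rule are all hypothetical; AND is taken as the minimum, which is one common choice.

```python
def trapezoid(a, b, c, d):
    """Membership function: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Hypothetical linguistic variable over a normalized input in [0, 1].
FREQUENCY = {
    "low": trapezoid(0.0, 0.0, 0.2, 0.5),
    "medium": trapezoid(0.2, 0.5, 0.5, 0.8),
    "high": trapezoid(0.5, 0.8, 1.0, 1.0),
}

def fuzzify(variable, x):
    """Return the membership degree of x in every fuzzy set of the variable."""
    return {label: mu(x) for label, mu in variable.items()}

def rule_and(*degrees):
    """Firing strength of an IF-THEN rule whose antecedents are joined by AND (min)."""
    return min(degrees)

degrees = fuzzify(FREQUENCY, 0.35)
# Hypothetical rule: IF frequency IS medium AND title IS high THEN importance IS high.
# 0.9 stands in for the (assumed) membership degree of "title IS high".
strength = rule_and(degrees["medium"], 0.9)
```

An input of 0.35 belongs partly to `low` and partly to `medium`, which is the graded membership that a crisp threshold cannot express.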
Extended Fuzzy Combination of Criteria
Linguistic Variables
Knowledge Base
Dimensionality Reduction
Input vector dimensions ranged from 100 to 5000.
Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed.
Two well-known methods:
Document frequency reduction.
Random projection method.
Three proposed rank-based methods:
Most Valued Terms.
Fixed reduction method.
More Frequent Terms until n level.
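The two well-known methods can be sketched as follows. This is an illustration under stated assumptions, not the setup used in the experiments: document frequency reduction is shown in one common variant (keep the `k` terms that appear in the most documents), random projection uses a Gaussian random matrix, and the toy `docs`, the function names and the parameters are hypothetical. The three rank-based methods are not sketched because their definitions are not given here.

```python
import random
from collections import Counter

def document_frequency_reduction(docs, k):
    """Keep the k terms that occur in the most documents (ties broken alphabetically)."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]

def random_projection(vectors, target_dim, seed=0):
    """Project each vector onto target_dim random Gaussian directions."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    scale = 1.0 / target_dim ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, vec)) for row in matrix]
            for vec in vectors]

# Toy corpus: each document is a list of already preprocessed terms.
docs = [["bank", "loan", "rate"], ["bank", "rate"], ["soccer", "goal"], ["bank", "goal"]]

vocab = document_frequency_reduction(docs, 3)

# Term-count vectors over the full vocabulary, then reduced to 2 dimensions.
all_terms = sorted({t for d in docs for t in d})
vectors = [[d.count(t) for t in all_terms] for d in docs]
projected = random_projection(vectors, 2)
```

Document frequency reduction keeps interpretable term dimensions, whereas random projection mixes all terms into each output dimension.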
Document Map Construction
Benchmark dataset for clustering: Banksearch [1]
10000 documents
10 classes
SOM size was set equal to the number of classes of input documents, i.e. 5x2, in order to compare clustering results.
[1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
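To make the 5x2 map concrete, here is a minimal SOM training sketch in plain Python. It is not the implementation used in the experiments: the learning-rate and neighborhood schedules, the Gaussian neighborhood, the Euclidean distance, the parameter defaults and the toy two-dimensional data are all assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))

def train_som(data, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM and return one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighborhood radius
            x = data[i]
            br, bc = grid[best_matching_unit(weights, x)]
            for u, (r, c) in enumerate(grid):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights

# Toy data in [0, 1]^2 standing in for reduced document vectors.
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.85], [0.95, 0.9], [0.5, 0.5]]
weights = train_som(data)
units = [best_matching_unit(weights, x) for x in data]
```

With ten units and ten classes, each unit can be read as one cluster, which is what allows the map to be compared with a conventional clustering result.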
Evaluation Methods
Weighted average of the F-measure for each class.
After mapping the collection onto the trained map, each unit is labeled with the class that has the greatest number of documents mapped onto that neuron.
Every document vector in a neuron whose class differs from the neuron label is counted as an error.
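The labeling and scoring procedure can be sketched as below. The exact F-measure variant is an assumption: this sketch uses the balanced F1 per class and weights each class by its share of the documents. The function names and the toy `assignments` list of `(unit, true_class)` pairs are hypothetical.

```python
from collections import Counter, defaultdict

def label_units(assignments):
    """Label each unit with the majority true class of the documents mapped to it."""
    per_unit = defaultdict(Counter)
    for unit, true_class in assignments:
        per_unit[unit][true_class] += 1
    # sorted() makes tie-breaking deterministic (alphabetically first class wins).
    return {unit: max(sorted(counts), key=counts.get)
            for unit, counts in per_unit.items()}

def weighted_f_measure(assignments):
    """Class-size-weighted average of per-class F1, using unit labels as predictions."""
    labels = label_units(assignments)
    true_counts = Counter(c for _, c in assignments)
    pred_counts = Counter(labels[u] for u, _ in assignments)
    correct = Counter(c for u, c in assignments if labels[u] == c)
    total = len(assignments)
    score = 0.0
    for cls, n_true in true_counts.items():
        tp = correct[cls]
        precision = tp / pred_counts[cls] if pred_counts[cls] else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += (n_true / total) * f1
    return score

# Toy mapping: (unit index, true class) for six documents on two units.
assignments = [(0, "a"), (0, "a"), (0, "b"), (1, "b"), (1, "b"), (1, "a")]
labels = label_units(assignments)
errors = sum(1 for u, c in assignments if labels[u] != c)
score = weighted_f_measure(assignments)
```

In this toy case each unit holds one misplaced document, so two of the six documents count as errors.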
Best reduction for each term weighting function
MFTn reduction provides stability
EFCC+MFTn obtains its best results with the smallest number of features
Conclusion
An unsupervised document representation method, based on fuzzy logic, aimed at clustering HTML documents by means of self-organizing maps.
MFTn is the most stable reduction in all cases.
The EFCC representation obtains better results using a smaller vocabulary.
Fewer features are needed to represent the input documents and SOM unit vectors, which implies a lower computational cost.
Thank You!
Related Work
Work                                                    | VSM | Topic Information | Document Type | Weighting Function | Modifies SOM
--------------------------------------------------------+-----+-------------------+---------------+--------------------+-------------
Self organization of a Massive Document Collection [2]  | Yes | Yes               | Text          | Shannon's Entropy  | No
Document Clustering using Phrases [3]                   | Yes | No                | Text          | Binary, TF, TF-IDF | No
Document Clustering using WordNet [4]                   | Yes | Yes               | Text          | ESVM, HSVM, HyM    | No
Conceptional SOM [5]                                    | Yes | No                | Text          | TF                 | Yes

[2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
[3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
[4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
[5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing, 2008.