Web Page Clustering Using a Fuzzy Logic Based
   Representation and Self-organizing Maps

    Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
                     NLP & IR Group, UNED

                       December 12, 2008
                                          Table of Contents



             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
             4   Results
             5   Conclusion




                                                  Objectives


              Group HTML documents by content similarity.
              Self-Organizing Maps (SOM) to organize, visualize and
              navigate through the collection.
              Term weighting function taking advantage of HTML tags
                      Combining, by means of fuzzy logic, heuristic criteria based on
                      the inherent semantics of some HTML tags and word positions
                      in the document.
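
A minimal sketch of how per-term evidence could be gathered from HTML tags and word positions before any fuzzy combination takes place. The slides do not list the tags or criteria used, so the tag set `EMPHASIS_TAGS`, the class `CriteriaExtractor`, and the helper `extract_criteria` are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: which tags count as "emphasis" is not stated in the slides.
EMPHASIS_TAGS = {"b", "strong", "em", "i", "h1", "h2", "h3"}


class CriteriaExtractor(HTMLParser):
    """Collects, per word: total count, count in <title>, count inside
    emphasis tags, and the position of its first occurrence."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.total = Counter()
        self.in_title = Counter()
        self.emphasized = Counter()
        self.first_position = {}
        self.n_words = 0

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag and anything left open inside it.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        for word in re.findall(r"[a-z]+", data.lower()):
            self.total[word] += 1
            if "title" in self.open_tags:
                self.in_title[word] += 1
            if EMPHASIS_TAGS.intersection(self.open_tags):
                self.emphasized[word] += 1
            self.first_position.setdefault(word, self.n_words)
            self.n_words += 1


def extract_criteria(html):
    """Parse an HTML string and return the filled extractor."""
    extractor = CriteriaExtractor()
    extractor.feed(html)
    return extractor
```

The four counters are the raw inputs a weighting function could then combine; how they are normalized and combined is left open here.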

       Hypothesis
       An improvement in document representation will lead to an
       increase in map quality.



                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
                   1   Fuzzy Logic
                   2   EFCC
                   3   Linguistic Variables
                   4   Knowledge Base
             3   Experiment Description
             4   Results
             5   Conclusion




                                                 Fuzzy logic



              Capturing human expert knowledge.
              Close to natural language.
              Knowledge base: defined by a set of IF-THEN rules.
              Linguistic variables
                      Defined using natural language words and fuzzy sets.
                      These sets allow the description of the membership degree of
                      an object to a particular class.
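
The ideas above can be sketched in a few lines: a linguistic variable is a set of named fuzzy sets, each giving a membership degree in [0, 1], and an IF-THEN rule fires with a strength derived from those degrees. The set shapes, the breakpoints, the variable names, and the use of `min` for AND (the classical choice) are assumptions for illustration; the slides do not specify them.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ramp_down(x, lo, hi):
    """Membership falling linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def triangle(x, lo, peak, hi):
    """Triangular membership with maximum 1 at peak."""
    return max(0.0, min((x - lo) / (peak - lo), (hi - x) / (hi - peak)))


def fuzzify(x):
    """Hypothetical linguistic variable over [0, 1] with three fuzzy sets."""
    return {
        "low": ramp_down(x, 0.0, 0.5),
        "medium": triangle(x, 0.0, 0.5, 1.0),
        "high": ramp_up(x, 0.5, 1.0),
    }


def rule_strength(frequency, title):
    """IF frequency IS high AND title IS high THEN relevance IS high.
    AND is taken as min, an assumption not confirmed by the slides."""
    return min(fuzzify(frequency)["high"], fuzzify(title)["high"])
```

For example, an input of 0.75 belongs to "medium" and to "high" with degree 0.5 each, which is the kind of partial membership that a crisp threshold cannot express.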




                   Extended Fuzzy Combination of Criteria

              [Diagram slides; the figures were not preserved in the text.]

                                        Linguistic Variables

              [Diagram slides; the figures were not preserved in the text.]

                                           Knowledge Base

              [Diagram slides; the figures were not preserved in the text.]

                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
                   1   Dimensionality Reduction
                   2   Document Map
                   3   Evaluation Methods
             4   Results
             5   Conclusion




                                  Dimensionality Reduction


              Input vector dimensions ranging from 100 to 5000.
              Stopwords, punctuation marks, suffixes, and words occurring
              fewer than 50 times in the whole corpus were removed.
              Two well known methods:
                      Document frequency reduction.
                      Random projection method.
              Three proposed rank-based methods:
                      Most Valued Terms.
                      Fixed reduction method.
                      More Frequent Terms until n level.
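
The two well-known methods can be sketched as follows. This is one common reading of each, written with the standard library only: document frequency reduction keeps the `k` terms that appear in the most documents, and random projection multiplies each document vector by a random Gaussian matrix. The function names, the tie-breaking rule, and the scaling factor are assumptions; the three rank-based methods are not defined in the slides and are not attempted here.

```python
import math
import random


def document_frequency_reduction(docs, k):
    """docs: list of equal-length term-weight vectors.
    Keep the k terms with nonzero weight in the most documents
    (ties broken by term index). Returns reduced docs and kept indices."""
    n_terms = len(docs[0])
    df = [sum(1 for d in docs if d[j] != 0) for j in range(n_terms)]
    ranked = sorted(range(n_terms), key=lambda j: -df[j])  # stable sort
    keep = sorted(ranked[:k])
    return [[d[j] for j in keep] for d in docs], keep


def random_projection(docs, k, seed=0):
    """Project each vector onto k random Gaussian directions.
    The 1/sqrt(k) scaling is a conventional choice, assumed here."""
    rng = random.Random(seed)
    n_terms = len(docs[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)]
              for _ in range(n_terms)]
    return [[sum(d[j] * matrix[j][c] for j in range(n_terms))
             for c in range(k)] for d in docs]
```

The first method keeps the reduced dimensions interpretable as terms, while the second mixes all terms into each output dimension.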




                              Document Map Construction



              Benchmark dataset for clustering: Banksearch [1]
                      10000 documents
                      10 classes
              SOM size was set equal to the number of classes of input
              documents, i.e. 5x2, in order to compare clustering results.
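
A minimal sketch of training a 5x2 map, so that each of the 10 units can stand for one class. Only the grid size comes from the slides; the learning-rate and neighborhood schedules, the Gaussian neighborhood, the random initialization, and the names `train_som` and `best_matching_unit` are assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2
                                 for w, xi in zip(weights[i], x)))


def train_som(data, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training on a rows x cols rectangular grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in units]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.01    # shrinking neighborhood
            br, bc = units[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(units):
                d2 = (r - br) ** 2 + (c - bc) ** 2  # squared grid distance
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                weights[i] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[i], x)]
            step += 1
    return weights
```

After training, each document is assigned to its best matching unit, which gives the unit-to-document mapping that a cluster evaluation needs.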




            [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing
       Systems: Design, Management, and Applications, 2002.
                                        Evaluation Methods



              Weighted average of the F-measure for each class.
              After mapping the collection onto the trained map, each unit is
              labeled with the class that has the most documents mapped to it.
              Every document vector in a neuron whose class differs from the
              neuron label is counted as an error.
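
The evaluation procedure can be sketched as below. Each unit takes the majority class of its documents as its label, mismatching documents count as errors, and the per-class F-measures are averaged with weights proportional to class size. The exact precision and recall definitions are not given in the slides, so the ones used here (computed from the unit labels as predictions) are an assumption, as are the function names.

```python
from collections import Counter, defaultdict


def label_units(assignments, classes):
    """Label each unit with the majority class of the documents mapped to it."""
    per_unit = defaultdict(Counter)
    for unit, cls in zip(assignments, classes):
        per_unit[unit][cls] += 1
    return {u: counts.most_common(1)[0][0] for u, counts in per_unit.items()}


def count_errors(assignments, classes):
    """Documents whose class differs from the label of their unit."""
    labels = label_units(assignments, classes)
    return sum(1 for u, c in zip(assignments, classes) if labels[u] != c)


def weighted_f_measure(assignments, classes):
    """Average of per-class F-measures, weighted by class size."""
    labels = label_units(assignments, classes)
    predicted = [labels[u] for u in assignments]
    n = len(classes)
    score = 0.0
    for cls, size in Counter(classes).items():
        tp = sum(1 for p, t in zip(predicted, classes) if p == cls and t == cls)
        n_pred = sum(1 for p in predicted if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / size
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        score += (size / n) * f
    return score
```

Here `assignments[i]` is the unit index of document `i` and `classes[i]` its true class; a class that never wins a unit gets an F-measure of 0 under this reading.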




                                                   Results

              Best reduction for each term weighting function.
              MFTn reduction provides stability.
              EFCC+MFTn obtains its best results with the smallest number of
              features.

              [Result plots were not preserved in the text.]

                                                 Conclusion


              Unsupervised document representation method, based on
              fuzzy logic, focused on clustering HTML documents by means
              of self-organizing maps.
              MFTn reduction is the most stable reduction in all cases.
              EFCC representation obtains better results using a
              smaller vocabulary.
              Fewer features are needed to represent the input
              documents and SOM unit vectors, which implies a lower
              computational cost.




                                            Thank You!




                                                 Related Work

         Work                              VSM   Topic         Document   Weighting            Modifies
                                                 Information   Type       Function             SOM
         Self organization of a Massive    Yes   Yes           Text       Shannon's Entropy    No
         Document Collection [2]
         Document Clustering using         Yes   No            Text       Binary, TF, TF-IDF   No
         Phrases [3]
         Document Clustering using         Yes   Yes           Text       ESVM, HSVM, HyM      No
         WordNet [4]
         Conceptional SOM [5]              Yes   No            Text       TF                   Yes




            [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a
       massive document collection. IEEE Trans. on Neural Networks, 2000.
            [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
            [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J.
       Hybrid Intell. Syst., 2004.
            [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing,
       2008.
Alberto P. Garc´
               ıa-Plaza, V´
                          ıctor Fresno, Raquel Mart´
                                                   ınez, NLP & IR Group, UNED                                                slide 44

Mais conteúdo relacionado

Destaque

Fuzzy logic
Fuzzy logicFuzzy logic
Fuzzy logicvini89
 
Developing Efficient Web-based GIS Applications
Developing Efficient Web-based GIS ApplicationsDeveloping Efficient Web-based GIS Applications
Developing Efficient Web-based GIS ApplicationsSwetha A
 
Introduction to sar-marjolaine_rouault
Introduction to sar-marjolaine_rouaultIntroduction to sar-marjolaine_rouault
Introduction to sar-marjolaine_rouaultNaivedya Mishra
 
Synthetic aperture radar
Synthetic aperture radarSynthetic aperture radar
Synthetic aperture radarMahesh pawar
 
MISSION TO PLANETS (CHANDRAYAAN,MAVEN,CURIOSITY,MANGALYAAN,CASSINI SOLSTICE M...
MISSION TO PLANETS (CHANDRAYAAN,MAVEN,CURIOSITY,MANGALYAAN,CASSINI SOLSTICE M...MISSION TO PLANETS (CHANDRAYAAN,MAVEN,CURIOSITY,MANGALYAAN,CASSINI SOLSTICE M...
MISSION TO PLANETS (CHANDRAYAAN,MAVEN,CURIOSITY,MANGALYAAN,CASSINI SOLSTICE M...Swetha A
 
Synthetic aperture radar (sar) 20150930
Synthetic aperture radar (sar) 20150930Synthetic aperture radar (sar) 20150930
Synthetic aperture radar (sar) 20150930JiyaE
 
Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

  • 1. Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008
  • 2-3. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC); 3 Experiment Description; 4 Results; 5 Conclusion
  • 4. Objectives: Group HTML documents by content similarity. Use Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection. Define a term weighting function that takes advantage of HTML tags by combining, by means of fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document. Hypothesis: an improvement in document representation will involve an increase in map quality.
  • 5. Table of Contents, section 2 (Our Approach: Extended Fuzzy Combination of Criteria, EFCC): 1 Fuzzy Logic; 2 EFCC; 3 Linguistic Variables; 4 Knowledge Base
  • 6. Fuzzy logic: Captures human expert knowledge. Close to natural language. Knowledge base: defined by a set of IF-THEN rules. Linguistic variables: defined using natural language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
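The ideas on this slide (fuzzy sets, linguistic variables, IF-THEN rules) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the variable names `title_freq` and `emphasis_freq`, the breakpoints 0.2 and 0.8, the two rules, and the weighted-average defuzzification are hypothetical and are not taken from the EFCC knowledge base, which the transcript does not reproduce.

```python
def falling(x, lo, hi):
    """Membership degree that is 1 below lo, 0 above hi, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Complement of `falling`: 0 below lo, 1 above hi."""
    return 1.0 - falling(x, lo, hi)


def frequency_sets(x):
    """Hypothetical linguistic variable with two fuzzy sets, 'low' and 'high', on [0, 1]."""
    return {"low": falling(x, 0.2, 0.8), "high": rising(x, 0.2, 0.8)}


def relevance(title_freq, emphasis_freq):
    """Combine two criteria with two illustrative IF-THEN rules."""
    title = frequency_sets(title_freq)
    emphasis = frequency_sets(emphasis_freq)
    # Rule 1: IF title is high OR emphasis is high THEN relevance is high (max as fuzzy OR).
    high = max(title["high"], emphasis["high"])
    # Rule 2: IF title is low AND emphasis is low THEN relevance is low (min as fuzzy AND).
    low = min(title["low"], emphasis["low"])
    # Weighted-average defuzzification with singleton outputs low = 0 and high = 1.
    total = high + low
    return high / total if total else 0.0


if __name__ == "__main__":
    print(relevance(0.5, 0.5))
    print(relevance(1.0, 0.0))
```

A term that is frequent in either criterion ends up with a high relevance score, while a term that is rare in both ends up near zero; a real knowledge base would use more variables and more rules.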
  • 8-18. Extended Fuzzy Combination of Criteria (slide content not captured in the transcript; only the title survives)
  • 20-25. Linguistic Variables (slide content not captured in the transcript; only the title survives)
  • 27-30. Knowledge Base (slide content not captured in the transcript; only the title survives)
  • 31. Table of Contents, section 3 (Experiment Description): 1 Dimensionality Reduction; 2 Document Map; 3 Evaluation Methods
  • 32. Dimensionality Reduction: Input vector dimension ranging from 100 to 5000. Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed. Two well-known methods: document frequency reduction; random projection method. Three proposed rank-based methods: Most Valued Terms; fixed reduction method; More Frequent Terms until n level.
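The two well-known reductions named on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: document frequency reduction is read here as "keep the n terms that occur in the most documents", random projection uses a Gaussian matrix, and the function names and toy matrix are hypothetical. The three rank-based methods are not sketched because the transcript does not define them.

```python
import random


def document_frequency_reduction(docs, n_terms):
    """Keep the n_terms columns (terms) that are non-zero in the most documents."""
    n_cols = len(docs[0])
    doc_freq = [sum(1 for d in docs if d[j] != 0) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: -doc_freq[j])
    keep = sorted(ranked[:n_terms])  # preserve the original column order
    return [[d[j] for j in keep] for d in docs], keep


def random_projection(docs, n_dims, seed=0):
    """Project each document vector onto n_dims random Gaussian directions."""
    rng = random.Random(seed)
    n_cols = len(docs[0])
    scale = n_dims ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) / scale for _ in range(n_dims)]
              for _ in range(n_cols)]
    return [[sum(d[j] * matrix[j][k] for j in range(n_cols))
             for k in range(n_dims)] for d in docs]


if __name__ == "__main__":
    # Toy term-weight matrix: 3 documents, 4 terms.
    toy = [[1, 0, 2, 0], [0, 0, 1, 3], [4, 0, 1, 0]]
    reduced, kept = document_frequency_reduction(toy, 2)
    print(kept, reduced)
    print(random_projection(toy, 2))
```

Document frequency reduction keeps interpretable term columns, whereas random projection mixes all terms into each output dimension.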
  • 34. Document Map Construction: Benchmark dataset for clustering: Banksearch (1), with 10000 documents and 10 classes. SOM size was set equal to the number of classes of the input documents, i.e. 5x2, in order to compare clustering results. (1) M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
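To make the 5x2 map concrete, the following sketch trains a small self-organizing map with one unit per class. It is a generic online SOM with a Gaussian neighbourhood; the learning rate, neighbourhood width, number of epochs, and linear decay schedule are assumptions chosen for illustration, since the transcript does not give the training parameters that were used.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(vectors, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM; returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - progress) + 0.01    # neighbourhood shrinks over time
            br, bc = grid[best_matching_unit(weights, x)]
            for u, (r, c) in enumerate(grid):
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * sigma * sigma))
                w = weights[u]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])      # pull the unit towards x
            step += 1
    return weights


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.05, 0.02], [1.0, 1.0], [0.95, 0.98]]
    som = train_som(data)
    print([best_matching_unit(som, x) for x in data])
```

After training, each document is mapped to its best matching unit, so the ten units play the role of ten clusters.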
  • 36. Evaluation Methods: Weighted average of the F-measure for each class. After mapping the collection onto the trained map, the class with the greatest number of documents mapped to a neuron is selected to label that unit. All document vectors in a neuron whose class differs from the neuron label are counted as errors.
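The evaluation procedure described on this slide can be sketched as follows. This is one plausible reading, not a confirmed reproduction of the authors' scoring: each neuron takes the label of its majority class, every document is treated as predicted to belong to its neuron's label, and the per-class F-measure is weighted by class size. The weighting by class size and the function name are assumptions.

```python
from collections import Counter, defaultdict


def weighted_f_measure(assignments, classes):
    """assignments[i] is the neuron of document i; classes[i] is its true class."""
    # Count the classes of the documents mapped to each neuron.
    by_neuron = defaultdict(Counter)
    for neuron, cls in zip(assignments, classes):
        by_neuron[neuron][cls] += 1
    # Label each neuron with its majority class.
    label = {n: counts.most_common(1)[0][0] for n, counts in by_neuron.items()}
    predicted = [label[n] for n in assignments]

    class_sizes = Counter(classes)
    predicted_sizes = Counter(predicted)
    correct = Counter(c for c, p in zip(classes, predicted) if c == p)

    total = len(classes)
    score = 0.0
    for cls, size in class_sizes.items():
        tp = correct[cls]
        precision = tp / predicted_sizes[cls] if predicted_sizes[cls] else 0.0
        recall = tp / size
        denom = precision + recall
        f_measure = 2.0 * precision * recall / denom if denom else 0.0
        score += (size / total) * f_measure  # weight by class size (assumption)
    return score


if __name__ == "__main__":
    neurons = [0, 0, 0, 1, 1, 1]
    truth = ["a", "a", "b", "b", "b", "b"]
    print(weighted_f_measure(neurons, truth))
```

In the toy example, the single "b" document mapped to neuron 0 counts as an error because that neuron is labelled "a".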
  • 38. Best reduction for each term weighting function
  • 39. MFTn reduction provides stability
  • 40. EFCC+MFTn obtains its best results with the smallest number of features
  • 42. Conclusion: An unsupervised document representation method, based on fuzzy logic, focused on clustering HTML documents by means of self-organizing maps. MFTn reduction is the most stable reduction in all cases. The EFCC representation obtains better results using a smaller vocabulary. A smaller number of features is needed to represent the input documents and SOM unit vectors, which implies an improvement in computational cost.
  • 43. Thank You!
  • 44. Related Work (for each work: VSM; topic information; document type; weighting function; modifies SOM)
      Self organization of a massive document collection (2): Yes; Yes; Text; Shannon's entropy; No
      Document clustering using phrases (3): Yes; No; Text; Binary, TF, TF-IDF; No
      Document clustering using WordNet (4): Yes; Yes; Text; ESVM, HSVM, HyM; No
      Conceptional SOM (5): Yes; No; Text; TF; Yes
      (2) T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
      (3) J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
      (4) C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
      (5) Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. In Neurocomputing, 2008.