Introduction to Machine Learning
Lecture 18
Clustering

Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 17
        Clustering




        Hierarchical clustering




                                                     Slide 2
Artificial Intelligence           Machine Learning
Today’s Agenda


        Partitional clustering: K-means
        Applications of clustering
        Using Weka




Partitional Clustering
        Aim
                Assign a set of objects to K clusters with no hierarchical
                structure
        How?
                First approach: enumerate all partitions and pick the one that
                minimizes a measure of quality
                However
                          Too expensive when the number of elements increases
                          E.g.: organizing 30 objects into 3 groups allows on the
                          order of 10^13 distinct partitions (3^30 ≈ 2·10^14
                          labeled assignments)
                Hence, we need heuristic methods
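
To make the cost of exhaustive enumeration concrete, the number of ways to split N labeled objects into K non-empty groups (a Stirling number of the second kind) can be computed with a short recurrence. This is only an illustrative sketch; the function name `count_partitions` is an assumption and does not come from the lecture.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_partitions(n: int, k: int) -> int:
    """Ways to partition n labeled objects into k non-empty, unlabeled groups."""
    if n == k:
        return 1  # also covers the empty case n == k == 0
    if k == 0 or k > n:
        return 0
    # Object n either forms a new group by itself,
    # or joins one of the k groups built from the other n - 1 objects.
    return count_partitions(n - 1, k - 1) + k * count_partitions(n - 1, k)


if __name__ == "__main__":
    # The slide's example: 30 objects into 3 groups.
    print(count_partitions(30, 3))
    # Labeled assignments of 30 objects to 3 named groups, for comparison.
    print(3 ** 30)
```

For 30 objects and 3 groups the count is already about 3.4·10^13, and it grows roughly like K^N / K!, which is why enumeration is abandoned in favor of heuristics.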




Defining the Problem
        The problem is
                Map N objects into K clusters
                Each object belongs to exactly one cluster
        Key factors
                Criterion function
                Algorithm process

        We’ll see
                Squared error algorithms




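Squared Error Algorithms
        Definition of squared error
                Assume a collection of objects x_1, x_2, …, x_N
                We want to organize them in K clusters c_1, c_2, …, c_K
                The squared error criterion is defined as

                \[ J(\Gamma, M) = \sum_{i=1}^{K} \sum_{j=1}^{N} \gamma_{ij} \, \lVert x_j - m_i \rVert^2 \]

                where
                          \gamma_{ij} = 1 if x_j belongs to cluster c_i and 0 otherwise,
                          with \sum_{i=1}^{K} \gamma_{ij} = 1 for every j
                          M = [m_1, …, m_K] is the cluster prototype matrix
                          m_i = \frac{1}{N_i} \sum_{j=1}^{N} \gamma_{ij} x_j is the mean of cluster c_i,
                          and N_i is the number of objects in c_i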
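
As a small numeric illustration of the squared error criterion, the sketch below sums, for each cluster, the squared Euclidean distances of its members to the cluster mean. It assumes the standard form of the criterion with the mean as prototype; the helper names `centroid` and `squared_error` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def squared_error(clusters: List[List[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        m = centroid(members)
        total += sum(sum((x - c) ** 2 for x, c in zip(p, m)) for p in members)
    return total


if __name__ == "__main__":
    good = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
    bad = [[(0.0, 0.0)], [(2.0, 0.0), (10.0, 10.0)]]
    print(squared_error(good), squared_error(bad))
```

Grouping the two nearby points together gives a much smaller error than pairing one of them with the distant point, which is the behaviour the criterion is meant to reward.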




Formulation of the Problem
        Goal
                Find the clustering that minimizes the squared error over all
                possible clusterings

        Characteristics of k-means
                It was discovered by several researchers across different
                disciplines
                Requires the user to specify the number of clusters, k
                In this way, we avoid the problem of determining the number of
                clusters
                Uses a heuristic procedure to search for the best prototypes



K-means
        The procedure
        1.       Initialize a k-partition randomly or based on some prior
                 knowledge. Calculate the cluster prototype matrix M
        2.       Assign each object of the data set to the nearest cluster center
                 (ci)
        3.       Recalculate the cluster prototype matrix based on the current
                 partition
        4.       Repeat steps 2 and 3 until there is no change for each cluster

        Will this lead to the best solution?
                 Not necessarily
                 At least, it will lead to a locally optimal solution
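
The four steps of the procedure can be sketched in plain Python as follows. This is a minimal illustration rather than the lecture's own implementation: it assumes Euclidean distance, and it initializes the prototypes by sampling k data points instead of building a random partition first. The names `kmeans`, `sq_dist` and `mean` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Tuple[float, ...]:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(data: List[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Return (centers, labels) once the assignment stops changing."""
    rng = random.Random(seed)
    # Step 1 (assumed variant): use k sampled data points as initial prototypes.
    centers = [tuple(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest cluster center.
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in data]
        if new_labels == labels:
            break  # Step 4: no object changed cluster, so stop
        labels = new_labels
        # Step 3: recompute each prototype as the mean of its current members.
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = mean(members)
    return centers, labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(points, k=2))
```

On well-separated data such as the six points above, the loop settles after a few iterations; on harder data the result depends on the initial prototypes, which is why only a local optimum is guaranteed.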

Example of k-means
        [Figures from slides 9 to 12]
Conservative k-means alg.
        Lloyd's algorithm is fast, but in each iteration it moves
        many data points, not necessarily causing better
        convergence.
        A more conservative method would be to move one
        point at a time, only if it improves the overall clustering
        cost
                The smaller the clustering cost of a partition of data points,
                the better that clustering is
                Different methods (e.g., the squared error distortion) can be
                used to measure this clustering cost




Greedy k-means alg.
        1. Select an arbitrary partition P into k clusters
        2. while forever
               bestChange ← 0
               for every cluster C
                    for every element i not in C
                         if moving i to cluster C reduces its clustering cost
                              if cost(P) - cost(P_{i→C}) > bestChange
                                   bestChange ← cost(P) - cost(P_{i→C})
                                   i* ← i
                                   C* ← C
               if bestChange > 0
                    Change partition P by moving i* to C*
               else
                    return P

        Here P_{i→C} denotes the partition obtained from P by moving element i to cluster C
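
A runnable version of the greedy procedure might look like the sketch below. It assumes the squared error distortion as the clustering cost and represents the partition P as a list of cluster indices, one per point; both choices, and the names `cost` and `greedy_kmeans`, are illustrative assumptions. The cost is recomputed from scratch for every candidate move, which keeps the code short but is slow on large data sets.

```python
from typing import List, Sequence

Point = Sequence[float]


def cost(data: List[Point], labels: List[int], k: int) -> float:
    """Squared error distortion: squared distances of points to their cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # empty clusters add no cost
        m = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - y) ** 2 for x, y in zip(p, m)) for p in members)
    return total


def greedy_kmeans(data: List[Point], labels: List[int], k: int) -> List[int]:
    """Move one point at a time, always taking the single best improving move."""
    labels = list(labels)  # work on a copy of the initial partition P
    while True:
        current = cost(data, labels, k)
        best_change, best_i, best_c = 0.0, None, None
        for c in range(k):
            for i, lab in enumerate(labels):
                if lab == c:
                    continue  # only consider elements not already in C
                trial = labels[:i] + [c] + labels[i + 1:]
                change = current - cost(data, trial, k)
                if change > best_change:
                    best_change, best_i, best_c = change, i, c
        if best_i is None:
            return labels  # no single move lowers the cost
        labels[best_i] = best_c  # apply the best move and search again


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(greedy_kmeans(points, [0, 1, 0, 1], k=2))
```

Because every accepted move strictly lowers the cost and there are finitely many partitions, the loop always terminates, although the final partition may still be only locally optimal.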




Some Remarks
        Further comments about k-means
                There is no efficient and universal method for identifying the initial
                partitions
                          Run the algorithm many times with random initial partitions
                The iterative approach cannot guarantee convergence to the global
                optimum
                          Incorporate techniques such as genetic algorithms (GAs) or
                          simulated annealing (SA) to steer the search toward the
                          global optimum
                It is sensitive to outliers and noise
                          Some approaches, such as ISODATA and PAM, account for the
                          effect of outliers
                The definition of “means” restricts the application to continuous
                variables
                          New dissimilarity measures are needed to deal with
                          categorical variables
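
The sensitivity to outliers can be seen on a single cluster. The sketch below compares the mean, which k-means uses as the prototype, with a medoid, the kind of prototype PAM (Partitioning Around Medoids) relies on. It is not an implementation of PAM; the data and the helper names `mean` and `medoid` are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def mean(points: List[Point]) -> Tuple[float, ...]:
    """Component-wise mean, the prototype used by k-means."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def medoid(points: List[Point]) -> Point:
    """The cluster member with the smallest total distance to all members."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


if __name__ == "__main__":
    # Four nearby values plus one outlier at 100.
    cluster = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]
    print(mean(cluster))    # pulled far toward the outlier
    print(medoid(cluster))  # stays among the bulk of the points
```

A single extreme value drags the mean away from where most of the points lie, while the medoid has to be one of the actual data points and so stays inside the dense region.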

APPLICATIONS




Traveling Salesman Problem
        Up to millions of cities
        First organize the cities into clusters
        Results for
                10k cities
                100k cities
                1M cities




Bioinformatics – Gene Expression Data

        Application to
                Genome sequencing projects
                DNA microarray technologies
        DNA microarray technology
                Effective and efficient way to measure gene expression levels
                of thousands of genes simultaneously
        Investigation of the role of the genes
                Clustering: Reveal hidden structures of biological data
                Assumption: Functionally similar genes or proteins usually
                share similar patterns or primary sequence structures




Next Class



        Genetic Fuzzy Systems



