Introduction to Machine
       Learning
                  Lecture 18
                   Clustering

                 Albert Orriols i Puig
             http://www.albertorriols.net
                aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull
Recap of Lecture 17
        Clustering




        Hierarchical clustering




                                                     Slide 2
Artificial Intelligence           Machine Learning
Today’s Agenda


        Partitional clustering: K-means
        Applications of clustering
        Using Weka




Partitional Clustering
        Aim
                Assign a set of objects into K clusters with no hierarchical
                structure
        How?
                First approach: enumerate all partitions and pick the one that
                optimizes a measure of quality
                However
                          Too expensive when the number of elements increases
                          E.g.: there are about 2·10^14 ways (3^30) to assign 30 objects to 3 groups
                Hence, we need heuristic methods
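
To get a feel for why enumeration is hopeless, the following minimal sketch counts the candidates for the 30-objects, 3-groups example. It assumes two ways of counting that the slide does not spell out: labeled assignments (K^N) and partitions into K non-empty, unlabeled groups (the Stirling number of the second kind). The function name `count_partitions` is illustrative only.

```python
from math import comb, factorial

def count_partitions(n, k):
    """Ways to split n objects into k non-empty, unlabeled groups
    (Stirling number of the second kind, via inclusion-exclusion)."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)

# Labeled assignments: every object independently picks one of 3 groups.
print(3 ** 30)                   # about 2.06e14
# Unlabeled partitions into exactly 3 non-empty groups.
print(count_partitions(30, 3))   # about 3.4e13
print(count_partitions(4, 2))    # 7
```

Either way of counting grows exponentially with the number of objects, which is the reason for turning to heuristics.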




Defining the Problem
        The problem is
                Map N objects into K clusters
                Each object belongs to exactly one cluster
        Key factors
                Criterion function
                Algorithm process


        We’ll see
                Squared error algorithms




Squared Error Algorithms
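        Definition of squared error
                Assume a collection of objects x1, x2, … xN
                We want to organize them in K clusters c1, c2, … cK
                The squared error criterion is defined as

                J(\Gamma, M) = \sum_{i=1}^{K} \sum_{j=1}^{N} \gamma_{ij} \, \lVert x_j - m_i \rVert^2

                where
                          \gamma_{ij} = 1 if x_j belongs to cluster c_i, and 0 otherwise
                          m_i = \frac{1}{N_i} \sum_{j=1}^{N} \gamma_{ij} x_j is the mean (prototype) of the N_i objects in c_i
                          M = [m_1, …, m_K] is the cluster prototype matrix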
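
In words, the squared error criterion adds up, for every cluster, the squared Euclidean distance from each member to the cluster mean. The sketch below is one plain-Python way to evaluate it, assuming objects are numeric tuples and a partition is given as a list of cluster indices; the names `squared_error`, `points` and `labels` are illustrative, not from the lecture.

```python
def squared_error(points, labels, k):
    """Sum of squared Euclidean distances from each object to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = tuple(sum(coords) / len(members) for coords in zip(*members))
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total

points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
print(squared_error(points, [0, 0, 1, 1], 2))  # 4.0  (compact clusters)
print(squared_error(points, [0, 1, 0, 1], 2))  # 16.0 (stretched clusters)
```

The partition that groups nearby objects gets the lower value, which is the behavior the criterion is meant to reward.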
Formulation of the Problem
        Goal
                Find the clustering that minimizes the squared error over all
                possible clusterings


        Characteristics of k-means
                It was discovered by several researchers across different
                disciplines
                Requires the user to specify the number of clusters, which is k
                In this way, we avoid the problem of determining the number of
                clusters
                Uses an iterative heuristic procedure to search for the best prototypes



K-means
        The procedure
        1.       Initialize a k-partition randomly or based on some prior
                 knowledge. Calculate the cluster prototype matrix M
        2.       Assign each object of the data set to the nearest cluster center
                 (ci)
        3.       Recalculate the cluster prototype matrix based on the current
                 partition
        4.       Repeat steps 2 and 3 until there is no change for each cluster



         Will this lead to the best solution?
                 Not necessarily
                 At least, it will lead to a locally optimal solution

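
The four steps of the procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the lecture's implementation: objects are numeric tuples, the initial prototypes are k objects sampled at random (one possible initialization), distance is squared Euclidean, and a cluster that becomes empty keeps its previous prototype. The names `kmeans`, `max_iter` and `seed` are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k objects as the initial prototypes (assumed initialization).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest cluster center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each prototype as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an emptied cluster keeps its old prototype
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

On this small, well-separated data set the two groups are recovered; on harder data the result depends on the initialization, which is why only a local optimum is guaranteed.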
Example of k-means




Conservative k-means alg.
        Lloyd's algorithm is fast, but in each iteration it moves
        many data points, which does not necessarily lead to better
        convergence.
        A more conservative method would be to move one
        point at a time, and only if it improves the overall clustering
        cost
                The smaller the clustering cost of a partition of data points,
                the better that clustering is
                Different methods (e.g., the squared error distortion) can be
                used to measure this clustering cost




Greedy k-means alg.
        1. Select an arbitrary partition P into k clusters
        2. while forever
               bestChange ← 0
               for every cluster C
                   for every element i not in C
                       if moving i to cluster C reduces the clustering cost
                           if cost(P) - cost(P(i → C)) > bestChange
                               bestChange ← cost(P) - cost(P(i → C))
                               i* ← i
                               C* ← C
               if bestChange > 0
                   change partition P by moving i* to C*
               else
                   return P




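
The greedy procedure translates almost line by line into Python. The sketch below assumes the squared error distortion as the clustering cost (the slides allow other choices), numeric tuples as objects, and a partition stored as a list of cluster indices; `cost` and `greedy_kmeans` are illustrative names. It recomputes the full cost for every candidate move, which is simple but slow.

```python
def cost(points, labels, k):
    """Squared error distortion of a partition (assumed clustering cost)."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            continue
        mean = tuple(sum(v) / len(members) for v in zip(*members))
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total

def greedy_kmeans(points, labels, k):
    labels = list(labels)  # work on a copy of the initial partition P
    while True:
        best_change, best_move = 0.0, None
        current = cost(points, labels, k)
        for c in range(k):
            for i in range(len(points)):
                if labels[i] == c:
                    continue  # element i is already in cluster c
                old = labels[i]
                labels[i] = c  # tentatively move i to c
                change = current - cost(points, labels, k)
                labels[i] = old  # undo the tentative move
                if change > best_change:
                    best_change, best_move = change, (i, c)
        if best_move is None:
            return labels  # no single move improves the cost
        i, c = best_move
        labels[i] = c  # apply only the single best move

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
result = greedy_kmeans(points, [0, 1, 0, 1], k=2)
print(result)                   # [0, 0, 1, 1]
print(cost(points, result, 2))  # 1.0
```

Because each accepted move strictly lowers the cost and there are finitely many partitions, the loop always terminates, though possibly at a local optimum.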
Some Remarks
        Further comments about k-means
                No efficient and universal method for identifying the initial
                partitions
                          Run the algorithm many times with random initial partitions
                The iterative approach cannot guarantee convergence to the global
                optimum
                          Incorporate techniques such as genetic algorithms (GAs) or
                          simulated annealing (SA) to steer the
                          search toward the global optimum
                It is sensitive to outliers and noise
                          Some approaches such as ISODATA and PAM consider the
                          effect of outliers
                The definition of “means” restricts the application to continuous
                variables
                          New dissimilarity measures to deal with categorical variables

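
The first remark, running the algorithm many times with random initial partitions, can be sketched as a simple restart loop that keeps the run with the lowest squared error. The code assumes a compact k-means with randomly sampled initial prototypes and squared Euclidean distance; `kmeans_once`, `kmeans_restarts` and the `restarts` parameter are hypothetical names chosen for this illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (squared error, labels)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    error = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    return error, labels

def kmeans_restarts(points, k, restarts=20, seed=0):
    """Keep the best of several independent random starts."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])

points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,),
          (20.0,), (21.0,), (22.0,)]
error, labels = kmeans_restarts(points, k=3)
print(error, labels)
```

Restarts reduce the risk of ending in a poor local optimum but still give no guarantee of reaching the global one.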
APPLICATIONS




Traveling Salesman Problem
        Up to millions of cities
        First organize cities in clusters
        Results of
                10k cities
                100k cities
                1M cities




Bioinformatics – Gene Expression Data

        Application to
                Genome sequencing projects
                DNA microarray technologies
        DNA microarray technology
                Effective and efficient way to measure gene expression levels
                of thousands of genes simultaneously
        Investigation of the role of the genes
                Clustering: Reveal hidden structures of biological data
                Assumption: Functionally similar genes or proteins usually
                share similar patterns or primary sequence structures




Next Class



        Genetic Fuzzy Systems




                                               Slide 21
Artificial Intelligence     Machine Learning
Introduction to Machine
       Learning
                  Lecture 18
                   Clustering

                 Albert Orriols i Puig
             http://www.albertorriols.net
             htt //       lb t i l      t
                aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
                        g                      g
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull

Mais conteúdo relacionado

Mais procurados

Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIIAlbert Orriols-Puig
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...Albert Orriols-Puig
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...Albert Orriols-Puig
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...Albert Orriols-Puig
 
CCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsCCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsAlbert Orriols-Puig
 
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...Albert Orriols-Puig
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...Albert Orriols-Puig
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebraAle Cignetti
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningbutest
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...Anderson Pinho
 

Mais procurados (20)

Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Lecture19
Lecture19Lecture19
Lecture19
 
Lecture22
Lecture22Lecture22
Lecture22
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART III
 
Lecture24
Lecture24Lecture24
Lecture24
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
 
CCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsCCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data sets
 
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebra
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
 

Semelhante a Lecture18

[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare EventsTaegyun Jeon
 
Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)seungwoo kim
 
20110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-0220110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-02Computer Science Club
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...butest
 
Getting Started with Machine Learning
Getting Started with Machine LearningGetting Started with Machine Learning
Getting Started with Machine LearningHumberto Marchezi
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningKrzysztof Kowalczyk
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine LearningSARCCOM
 
Auto-encoding variational bayes
Auto-encoding variational bayesAuto-encoding variational bayes
Auto-encoding variational bayesKyuri Kim
 
PyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningRebecca Bilbro
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkitde:code 2017
 
Lattice Cryptography
Lattice CryptographyLattice Cryptography
Lattice CryptographyPriyanka Aash
 

Semelhante a Lecture18 (20)

Clustering
ClusteringClustering
Clustering
 
Svm V SVC
Svm V SVCSvm V SVC
Svm V SVC
 
Review_Cibe Sridharan
Review_Cibe SridharanReview_Cibe Sridharan
Review_Cibe Sridharan
 
[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events
 
Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
20110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-0220110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-02
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 
Getting Started with Machine Learning
Getting Started with Machine LearningGetting Started with Machine Learning
Getting Started with Machine Learning
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine Learning
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine Learning
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Auto-encoding variational bayes
Auto-encoding variational bayesAuto-encoding variational bayes
Auto-encoding variational bayes
 
A detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning AlgorithmsA detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning Algorithms
 
PyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine Learning
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Lattice Cryptography
Lattice CryptographyLattice Cryptography
Lattice Cryptography
 

Mais de Albert Orriols-Puig

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceAlbert Orriols-Puig
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsAlbert Orriols-Puig
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesAlbert Orriols-Puig
 
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...Albert Orriols-Puig
 

Mais de Albert Orriols-Puig (6)

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
 

Último

ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 

Último (20)

ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System

Lecture18

  • 1. Introduction to Machine Learning. Lecture 18: Clustering. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lecture 17. Clustering; hierarchical clustering.
  • 3. Today’s Agenda. Partitional clustering: K-means. Applications of clustering. Using Weka.
  • 4. Partitional Clustering. Aim: assign a set of objects to K clusters with no hierarchical structure. How? First approach: enumerate all partitions and pick the one that minimizes a measure of quality. However, this becomes too expensive as the number of elements increases. E.g., organizing 30 objects into 3 groups gives roughly 2·10^14 candidate assignments (3^30). Hence, we need heuristic methods.
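To get a feel for how quickly exhaustive enumeration blows up, the sketch below counts the candidate solutions for the 30-objects, 3-groups example. It assumes two ways of counting that the slide does not spell out: Stirling numbers of the second kind (groups unlabeled and non-empty) and the looser bound K^N (groups labeled, possibly empty). The function name `stirling2` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to split n labeled objects into k non-empty, unlabeled groups."""
    if n == k:
        return 1  # also covers S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Object n either starts a new group or joins one of the k existing groups.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


n_objects, n_groups = 30, 3
partitions = stirling2(n_objects, n_groups)
labeled_assignments = n_groups ** n_objects  # groups labeled, empty groups allowed

print(f"{partitions:.2e} partitions, {labeled_assignments:.2e} labeled assignments")
```

Either count lands in the range of 10^13 to 10^14, which is why evaluating a quality measure on every candidate is not practical even for this small problem.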
  • 5. Defining the Problem. The problem is to map N objects into K clusters, where each object belongs to exactly one cluster. Key factors: the criterion function and the algorithm process. We’ll see squared error algorithms.
  • 6. Squared Error Algorithms. Definition of squared error: assume a collection of objects x1, x2, …, xN. We want to organize them in K clusters c1, c2, …, cK. The squared error criterion is defined as [equation not captured in the transcript].
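The formula on this slide did not survive extraction. As a reference point, a standard form of the squared error criterion is written out below, using the slide's objects x_j and clusters c_i together with a cluster prototype matrix M = [m_1, …, m_K]. The membership indicators γ_ij and the cluster sizes N_i are notation assumed here and may differ from the original slide.

```latex
J(\Gamma, M) \;=\; \sum_{i=1}^{K} \sum_{j=1}^{N} \gamma_{ij}\, \lVert x_j - m_i \rVert^{2},
\qquad
\gamma_{ij} =
\begin{cases}
1 & \text{if } x_j \in c_i \\
0 & \text{otherwise}
\end{cases},
\qquad
\sum_{i=1}^{K} \gamma_{ij} = 1 \;\; \forall j,

m_i \;=\; \frac{1}{N_i} \sum_{j=1}^{N} \gamma_{ij}\, x_j,
\qquad
N_i \;=\; \sum_{j=1}^{N} \gamma_{ij}.
```

Here m_i is the sample mean of the objects assigned to cluster c_i, so J is the total squared distance from each object to the prototype of its own cluster.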
  • 7. Formulation of the Problem. Goal: find the clustering that minimizes the squared error over all possible clusterings. Characteristics of k-means: it was discovered independently by several researchers across different disciplines. It requires the user to specify the number of clusters, k; in this way, we avoid the problem of determining the number of clusters. It uses a heuristic procedure to search for the best prototypes.
  • 8. K-means. The procedure:
        1. Initialize a k-partition randomly or based on some prior knowledge. Calculate the cluster prototype matrix M.
        2. Assign each object of the data set to the nearest cluster center (ci).
        3. Recalculate the cluster prototype matrix based on the current partition.
        4. Repeat steps 2 and 3 until there is no change for each cluster.
    Will this lead to the best solution? Not necessarily. At least, it will lead to a locally optimal solution.
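The four steps can be sketched in plain Python as follows. This is a minimal illustration rather than the lecture's implementation. It assumes Euclidean distance and, as a simplification of step 1, draws the initial prototypes as k random data points instead of deriving them from a random partition. The names `kmeans`, `squared_distance`, `mean` and the toy `points` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1 (simplified): use k distinct random objects as initial prototypes.
    prototypes = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest prototype.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, prototypes[i]))
            for p in points
        ]
        # Step 4: stop as soon as no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each prototype as the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[i] = mean(members)
    return labels, prototypes


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, prototypes = kmeans(points, k=2)
print(labels, prototypes)
```

On this toy data the two groups are well separated, so the procedure recovers them regardless of the seed. On harder data, different seeds can end in different local optima.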
  • 9. Example of k-means
  • 10. Example of k-means
  • 11. Example of k-means
  • 12. Example of k-means
  • 13. Conservative k-means alg. The Lloyd algorithm is fast, but in each iteration it moves many data points, not necessarily causing better convergence. A more conservative method would be to move one point at a time, and only if it improves the overall clustering cost. The smaller the clustering cost of a partition of data points is, the better that clustering is. Different methods (e.g., the squared error distortion) can be used to measure this clustering cost.
  • 14. Greedy k-means alg.
        Select an arbitrary partition P into k clusters
        while forever
            bestChange ← 0
            for every cluster C
                for every element i not in C
                    if moving i to cluster C reduces its clustering cost
                        if cost(P) - cost(P[i → C]) > bestChange
                            bestChange ← cost(P) - cost(P[i → C])
                            i* ← i
                            C* ← C
            if bestChange > 0
                change partition P by moving i* to C*
            else
                return P
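A runnable sketch of the greedy procedure, under stated assumptions: the clustering cost is taken to be the squared error distortion (one of the options named on the previous slide), and the partition P is represented as a list of cluster labels. The names `cost` and `greedy_kmeans` and the toy data are illustrative, not from the lecture.

```python
def cost(points, labels, k):
    """Squared error distortion: total squared distance to each cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        center = tuple(sum(x) / len(members) for x in zip(*members))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in members)
    return total


def greedy_kmeans(points, labels, k):
    """Repeatedly apply the single move that lowers the cost the most."""
    labels = list(labels)
    while True:
        current = cost(points, labels, k)
        best_change, best_move = 0.0, None
        for c in range(k):
            for i, lab in enumerate(labels):
                if lab == c:
                    continue  # element i is already in cluster c
                trial = labels[:i] + [c] + labels[i + 1:]
                change = current - cost(points, trial, k)
                if change > best_change:
                    best_change, best_move = change, (i, c)
        if best_move is None:
            return labels  # no single move improves the partition
        i, c = best_move
        labels[i] = c


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
start = [0, 1, 0, 1, 0, 1]  # an arbitrary initial partition
final = greedy_kmeans(points, start, k=2)
print(final, cost(points, final, 2))
```

Recomputing the full cost for every trial move keeps the sketch short but is wasteful. A practical version would update the cost incrementally from the two affected clusters.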
  • 15. Some Remarks. Further comments about k-means: There is no efficient and universal method for identifying the initial partitions; a common remedy is to run the algorithm many times with random initial partitions. The iterative approach cannot guarantee convergence to the global optimum; techniques such as genetic algorithms (GAs) or simulated annealing (SA) can be incorporated to steer the search toward the global optimum. It is sensitive to outliers and noise; some approaches, such as ISODATA and PAM, consider the effect of outliers. The definition of “means” restricts the application to continuous variables; new dissimilarity measures are needed to deal with categorical variables.
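The sensitivity to outliers can be seen on a one-dimensional toy cluster. The sketch compares the mean, which k-means uses as prototype, with a medoid, the kind of prototype PAM works with: an actual data point that minimizes the total distance to the others. The data and helper names are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def medoid(values):
    """The data point with the smallest total absolute distance to all points."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


cluster = [1.0, 2.0, 3.0, 4.0]
with_outlier = cluster + [100.0]

print(mean(cluster), medoid(cluster))            # 2.5 2.0
print(mean(with_outlier), medoid(with_outlier))  # 22.0 3.0
```

A single outlier drags the mean far away from every genuine member of the cluster, while the medoid barely moves.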
  • 16. APPLICATIONS
  • 17. Traveling Salesman Problem. Up to millions of cities. First organize cities in clusters. Results for 10k cities, 100k cities, and 1M cities [figures not captured in the transcript].
  • 18. Bioinformatics – Gene Expression Data. Application to genome sequencing projects and DNA microarray technologies. DNA microarray technology: an effective and efficient way to measure the expression levels of thousands of genes simultaneously, enabling investigation of the role of the genes. Clustering: reveals hidden structures of biological data. Assumption: functionally similar genes or proteins usually share similar patterns or primary sequence structures.
  • 19. Bioinformatics – Gene Expression Data
  • 20. Bioinformatics – Gene Expression Data
  • 21. Next Class: Genetic Fuzzy Systems
  • 22. Introduction to Machine Learning. Lecture 18: Clustering. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull