Machine Learning Basics
Classification and Clustering
Humberto Marchezi
hcmarchezi@gmail.com
November 2015
Definitions
Combines pattern recognition, artificial intelligence and a bit of data mining
Solves a given task without being explicitly programmed to do so; instead, it makes predictions from provided data
Machine learning algorithms can be divided into 3 categories:
Supervised learning
Unsupervised learning
Reinforcement learning
Problem types
Classification
Regression
Clustering
etc.
Algorithms
Supervised Learning
Naive Bayesian Classifier
Linear/Polynomial/Logistic/Multinomial Regression
Neural Networks
etc.
Unsupervised Learning
K-means / K-medoids
Principal Component Analysis
Gaussian Distribution (Anomaly Detection)
etc.
Naive Bayes Classifier
Classifies information based on a probabilistic model score
Score for a category C_k with features f_1, f_2, f_3, ..., f_n:
\[ p(C_k \mid f_1, f_2, \ldots, f_n) = \frac{P(C_k)\, p(f_1 \mid C_k)\, p(f_2 \mid C_k) \cdots p(f_n \mid C_k)}{p(f_1)\, p(f_2) \cdots p(f_n)} \]
For a text classifier, the features above are the words in the sentence (bag-of-words model)
Also known as the multinomial naive Bayes classifier
Naive Bayes Classifier
Concrete Example
Ingredients
2 tbsp salt
lemon
Instructions
Cut lemon
Pour salt
Naive Bayes Classifier
Concrete Example
Ingredients
word      occurrences
2         1
tbsp      1
salt      1
lemon     1
total     4
examples  2
Instructions
word      occurrences
cut       1
lemon     1
pour      1
salt      1
total     4
examples  2
Global
word      occurrences
2         1
tbsp      1
salt      2
lemon     2
cut       1
pour      1
total     8
examples  4
Naive Bayes Classifier
Concrete Example
Ingredients: P(Ingredients) = 1/2
word      probability
2         1/4
tbsp      1/4
salt      1/4
lemon     1/4
Instructions: P(Instructions) = 1/2
word      probability
cut       1/4
lemon     1/4
pour      1/4
salt      1/4
Global
word      probability
2         1/8
tbsp      1/8
salt      2/8
lemon     2/8
cut       1/8
pour      1/8
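The counts and probabilities in these tables can be derived mechanically from the four training examples. The following is a minimal sketch, assuming lowercase whitespace tokenization; the names `examples`, `word_counts`, `priors`, `likelihoods` and `evidence` are illustrative and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction

# Training examples from the slides, grouped by category.
examples = {
    "Ingredients": ["2 tbsp salt", "lemon"],
    "Instructions": ["Cut lemon", "Pour salt"],
}

def word_counts(sentences):
    """Count lowercase word occurrences across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

def probabilities(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {word: Fraction(n, total) for word, n in counts.items()}

category_counts = {cat: word_counts(s) for cat, s in examples.items()}
global_counts = sum(category_counts.values(), Counter())
total_examples = sum(len(s) for s in examples.values())

# Prior P(C): fraction of training examples that belong to each category.
priors = {cat: Fraction(len(s), total_examples) for cat, s in examples.items()}
# p(word | C): word frequency within a category.
likelihoods = {cat: probabilities(c) for cat, c in category_counts.items()}
# p(word): word frequency over all examples.
evidence = probabilities(global_counts)

for cat in examples:
    print(cat, priors[cat], dict(likelihoods[cat]))
```

Exact fractions are used here only so the values can be compared directly with the tables; floating-point numbers would work equally well.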
Naive Bayes Classifier
Concrete Example
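Query: '1 tbsp salt'
Ingredients (I)
\[ p(I \mid 1, \text{tbsp}, \text{salt}) = \frac{P(I)\, p(1 \mid I)\, p(\text{tbsp} \mid I)\, p(\text{salt} \mid I)}{p(1)\, p(\text{tbsp})\, p(\text{salt})} = \frac{0.5 \times 0.0001 \times 0.25 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 1 \]
Instructions (D)
\[ p(D \mid 1, \text{tbsp}, \text{salt}) = \frac{P(D)\, p(1 \mid D)\, p(\text{tbsp} \mid D)\, p(\text{salt} \mid D)}{p(1)\, p(\text{tbsp})\, p(\text{salt})} = \frac{0.5 \times 0.0001 \times 0.0001 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 0.0004 \]
Result: Ingredients (since it has the highest probability)
Note: 0.0001 is the probability assigned to an unknown element, such as the word '1' here or 'tbsp' under Instructions (it cannot be zero!)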
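The calculation above can be reproduced in a few lines. This is a sketch under the slide's own assumptions: the probabilities are copied from the tables, and 0.0001 stands in for any word that was not seen. The names `UNKNOWN`, `score` and `classify` are illustrative.

```python
# Probability assigned to a word that was never seen (value taken from the slides).
UNKNOWN = 0.0001

priors = {"Ingredients": 0.5, "Instructions": 0.5}
likelihoods = {
    "Ingredients": {"2": 0.25, "tbsp": 0.25, "salt": 0.25, "lemon": 0.25},
    "Instructions": {"cut": 0.25, "lemon": 0.25, "pour": 0.25, "salt": 0.25},
}
evidence = {"2": 0.125, "tbsp": 0.125, "salt": 0.25, "lemon": 0.25,
            "cut": 0.125, "pour": 0.125}

def score(category, query):
    """Naive Bayes score: P(C) * prod p(w|C) / prod p(w) over the query words."""
    numerator = priors[category]
    denominator = 1.0
    for word in query.lower().split():
        numerator *= likelihoods[category].get(word, UNKNOWN)
        denominator *= evidence.get(word, UNKNOWN)
    return numerator / denominator

def classify(query):
    """Return the category with the highest score."""
    return max(priors, key=lambda category: score(category, query))

print(classify("1 tbsp salt"))
```

Because the denominator is the same for every category, it does not affect which category wins; it is kept here only to match the numbers on the slide.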
Naive Bayes Classifier
Examples
Classify email as spam or not spam
Document type classification
Document sections classification
Image Classification
K-Means
Unsupervised learning algorithm to identify clusters
Find clusters for unlabeled data
Algorithm
k-means
  Choose K examples as initial centroids
  While centroids move
    1) Choose closest centroid Ki for each xi and store distance ci
    2) Calculate new centroid Ki in each cluster
  end
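The loop above can be sketched in plain Python as follows. Two details are assumptions not stated on the slide: distance is Euclidean, and a cluster that ends up empty keeps its previous centroid. The function names are illustrative.

```python
import math
import random

def closest(point, centroids):
    """Return (index, distance) of the centroid nearest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, rng=random):
    """Cluster a list of points; rng is the random module or a random.Random."""
    # Choose K examples as initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    while True:
        # Step 1: assign each example to its closest centroid.
        assignments = [closest(p, centroids)[0] for p in points]
        # Step 2: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[i])
        if new_centroids == centroids:  # centroids stopped moving
            return centroids, assignments
        centroids = new_centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, 2)
print(centroids, assignments)
```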
K-Means
Example of the steps k-means takes to converge to a final solution
Figure: Taken from https://en.wikipedia.org/wiki/File:K_Means_Example_Step_2.svg
K-Means
How to avoid sub-optimal results?
Figure: Generated from http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
K-Means
How to avoid sub-optimal results ?
k-means
  Repeat N times do
    Randomly choose K examples as initial centroids
    While centroids move
      1) Choose closest centroid Ki for each xi and store distance ci
      2) Calculate new centroid Ki in each cluster
    end
    Calculate result cost (average distance of examples to their centroids)
    If result cost is lower than the best so far, keep this result
  end (repeat)
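A minimal sketch of the restart strategy, assuming Euclidean distance and a fixed random seed for reproducibility. The names `run_once` and `best_of` and the default of 10 runs are illustrative choices, not values from the slides.

```python
import math
import random

def run_once(points, k, rng):
    """One k-means run from random initial centroids; returns (centroids, cost)."""
    centroids = [tuple(p) for p in rng.sample(points, k)]
    while True:
        # Assign every example to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid (assumption).
        updated = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Cost: average distance of each example to its closest centroid.
    cost = sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)
    return centroids, cost

def best_of(points, k, n_runs=10, seed=0):
    """Repeat k-means n_runs times and keep the lowest-cost result."""
    rng = random.Random(seed)
    best_centroids, best_cost = None, math.inf
    for _ in range(n_runs):
        centroids, cost = run_once(points, k, rng)
        if cost < best_cost:
            best_centroids, best_cost = centroids, cost
    return best_centroids, best_cost

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(best_of(points, 2))
```

More restarts raise the chance of escaping a poor initialization, at a cost that grows linearly with the number of runs.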
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
Figure: Solution for k=1
Figure: Solution for k=2
Figure: Solution for k=3
Figure: Solution for k=4
Figure: Solution for k=5
Figure: Cluster costs
K-Means
Elbow Method - How to identify the number of clusters?
Elbow method
  Repeat for clusters K = 1, 2, 3, ..., n
    Run K-Means
    Compute average cost for K clusters: \( \frac{\sum_{i=1}^{n} c_i}{n} \) (simplifying: \( \sum_{i=1}^{n} c_i \))
  end (repeat)
Plot the cost for each K and choose the one located at the "elbow"
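The elbow procedure can be sketched as a loop over K that records the cost of each solution. This is an illustrative sketch: it assumes Euclidean distance, takes the best of several restarts per K, and prints the costs instead of plotting them. The names `k_means_cost` and `elbow_costs` are hypothetical.

```python
import math
import random

def k_means_cost(points, k, rng):
    """Run k-means once; return the average distance to the closest centroid."""
    centroids = [tuple(p) for p in rng.sample(points, k)]
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid (assumption).
        updated = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            total = sum(min(math.dist(p, c) for c in centroids) for p in points)
            return total / len(points)
        centroids = updated

def elbow_costs(points, max_k, n_runs=10, seed=0):
    """Return {K: lowest average cost over n_runs restarts} for K = 1..max_k."""
    rng = random.Random(seed)
    return {
        k: min(k_means_cost(points, k, rng) for _ in range(n_runs))
        for k in range(1, max_k + 1)
    }

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k, cost in elbow_costs(points, max_k=4).items():
    print(k, round(cost, 3))
```

For this toy data the cost drops sharply from K=1 to K=2 and only slightly afterwards, which is the shape the elbow plot is meant to reveal.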
K-Means
Elbow Method - How to identify the number of clusters ?
Figure : K-means elbow method
It is not always possible to find an elbow (e.g. when the examples are evenly distributed)
Best practice: associate the number of clusters with a business meaning
K-Means
Examples
Figure : Customer segmentation with k-means
K-Means
Examples
Figure : Identify related news and articles
K-Means
Examples
Figure: Image color reduction - http://opencv-python-tutroals.readthedocs.org/en/latest/_images/oc_color_quantization.jpg
References and Resources
1 Coursera Machine Learning
https://www.coursera.org/learn/machine-learning
2 Naive Bayes Classifier - Wikipedia
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
3 K-Means Clustering - Wikipedia
https://en.wikipedia.org/wiki/K-means_clustering
4 Visualizing K-Means Clustering
http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
5 Naive Bayes for Image Processing
http://www.cs.ubc.ca/~lowe/papers/12mccannCVPR.pdf
6 Document Clustering with K-Means
http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
