K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm | Edureka

This Edureka k-means clustering algorithm tutorial takes you through an introduction to machine learning, cluster analysis, types of clustering algorithms, k-means clustering, and how it works, along with an example/demo in R. This Data Science with R tutorial is ideal for beginners learning how k-means clustering works. You can also read the blog here: https://goo.gl/3aseSs

  1. k-means clustering (Edureka’s Data Science Certification Training, www.edureka.co/data-science)
  2. What will you learn today? (1) Introduction to machine learning; (2) Cluster analysis; (3) Types of clustering; (4) Introduction to k-means clustering; (5) How k-means clustering works; (6) Demo in R: Netflix use case.
  3. What is machine learning? Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Diagram labels: Training Data, Learn, Algorithm, Build Model, Perform, Feedback.
  4. ML use case: Google self-driving car. The Google self-driving car is a smart, driverless car. It collects data from its environment through sensors. It makes decisions such as when to speed up, when to slow down, when to overtake, and when to turn.
  5. Types of machine learning: supervised learning. Feed the classifier a training data set with predefined labels, and it learns to categorize data under a specific label. Example: "When and where should I buy a house?" House features: area crime rate, bedrooms, distance to HQ, area (in sq. ft), locality.
  6. Types of machine learning: unsupervised learning. An image of fruits is first fed into the system. The system identifies the different fruits using features such as color and size, and categorizes them. When a new fruit is shown, the system analyzes its features and puts it into the category of items with similar features.
  7. Cluster analysis (unsupervised learning)
  8. What is clustering? Clustering means grouping objects based on the information found in the data describing the objects or their relationships. The goal is that objects in one group should be similar to each other but different from objects in another group. It deals with finding structure in a collection of unlabeled data. Some examples of clustering methods are: k-means clustering, fuzzy c-means clustering, and hierarchical clustering.
  9. Clustering use cases. Marketing: discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls. Insurance: identifying groups of crop insurance policy holders with a high average claim rate (farmers crash crops when it is “profitable”). Land use: identifying areas of similar land use in a GIS database. Seismic studies: identifying probable areas for oil/gas exploration based on seismic data.
  10. Types of clustering
  11. Types of clustering. Exclusive clustering: an item belongs exclusively to one cluster, not several; k-means does this sort of exclusive clustering. Overlapping clustering: an item can belong to multiple clusters, and its degree of association with each cluster is known; fuzzy c-means does this sort of overlapping clustering. Hierarchical clustering: when two clusters have a parent-child relationship or a tree-like structure, it is hierarchical clustering.
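
The difference between exclusive and overlapping clustering can be illustrated with a minimal Python sketch. It is not part of the slides, whose demo is in R. The function names, the sample point, and the centroids are hypothetical, and the soft memberships use the standard fuzzy c-means membership formula with an assumed fuzziness parameter of 2.

```python
import math


def hard_assignment(point, centroids):
    """Exclusive clustering: return the index of the single nearest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def soft_memberships(point, centroids, fuzziness=2.0):
    """Overlapping clustering: return one degree of membership per centroid.

    Uses the standard fuzzy c-means membership formula, so the degrees
    sum to 1 for a given point.
    """
    distances = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (fuzziness - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Hypothetical data: two centroids and one point lying closer to the first.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

print(hard_assignment(point, centroids))   # index of one cluster only
print(soft_memberships(point, centroids))  # a degree for every cluster
```

The hard assignment commits the point to a single cluster, while the soft memberships give the same point a degree of association with both.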
  12. K-means clustering
  13. K-means clustering. k-means clustering is one of the simplest algorithms that uses an unsupervised learning method to solve known clustering problems. It divides the entire dataset into k clusters. k-means clustering requires the following two inputs: (1) K = number of clusters; (2) training set (m) = {x1, x2, x3, ..., xm}. Figure: total population divided into Group 1, Group 2, Group 3, and Group 4.
  14. Example: Google News. Various news URLs related to Trump and Modi are grouped under one section. K-means clustering automatically clusters new stories about the same topic into pre-defined clusters.
  15. Example. "I need to find specific locations to build schools in this area so that the students don’t have to travel much." The slide shows a plot of the students in the area.
  16. Example: solution. "This looks good."
  17. "But how did he do that?" "I’ll show you how."
  18. How does k-means work?
  19. How does k-means work? The steps are: choose the number of clusters, initialization, cluster assignment, move centroid, optimization, convergence. Choose the number of clusters: the within-cluster sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid. Mathematically, $\mathrm{WSS} = \sum_{i=1}^{m} \lVert p^{(i)} - q^{(i)} \rVert^2$, where $p^{(i)}$ is a data point and $q^{(i)}$ is the closest centroid to that data point. The idea of the elbow method is to choose the k after which the decrease in WSS levels off and becomes almost constant.
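
The WSS definition can be sketched in a few lines of Python. This is not from the slides, whose demo is in R. The sample points and the hand-picked centroids are hypothetical; in practice the centroids would come from running k-means for each candidate k.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wss(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(min(squared_distance(p, q) for q in centroids) for p in points)


# Hypothetical data: two tight groups of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

# Hand-picked centroids for k = 1, 2 and 4 (assumed, not fitted).
candidates = {
    1: [(5.0, 2.0)],
    2: [(1.0, 2.0), (9.0, 2.0)],
    4: list(points),
}

for k, centroids in candidates.items():
    print(k, wss(points, centroids))
```

On this toy data the WSS drops sharply from k = 1 to k = 2 and only slightly after that, which is the "elbow" the method looks for.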
  20. How does k-means work? Initialization: randomly initialize k points, called the cluster centroids. Here, k = 2. The value of k (the number of clusters) can be determined from the elbow curve. Figure: data points and cluster centroids plotted on X and Y axes.
  21. How does k-means work? Cluster assignment: compute the distance between each data point and the initialized cluster centroids. Each data point goes to the centroid at minimum distance, which divides the data points into two groups.
  22. How does k-means work? Move centroid: compute the mean of the blue dots and reposition the blue cluster centroid to this mean. Compute the mean of the orange dots and reposition the orange cluster centroid to this mean.
  23. How does k-means work? Optimization: repeat the previous two steps (cluster assignment and move centroid) iteratively until the cluster centroids stop changing their positions.
  24. How does k-means work? Convergence: finally, the k-means clustering algorithm converges and divides the data points into two clusters, clearly visible in orange and blue.
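
The steps from initialization to convergence can be put together in a minimal Python sketch. It is not the R script from the demo, which the transcript does not include. Two assumptions are made here: the starting centroids are drawn from the data points themselves, which is one common way to initialize randomly, and an empty cluster keeps its previous centroid. The sample points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Cluster assignment: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move centroid: reposition each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Convergence: stop once no centroid changes position.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Because the result depends on the starting centroids, the seed is fixed so that repeated runs give the same clusters.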
  25. Problem statement. Challenge: Netflix wanted to increase its business by showing the most popular movies on its website. Solution: Netflix decided to group the movies based on budget, gross, and Facebook likes. Approach: Netflix took an IMDb dataset of 5000 values and applied k-means clustering to group it. "But how would I know which movie set to show and which not to?"
  26. Demo
  27. Solution: R script
  28. Output. We got three clusters based on budget and gross. Let’s see how good these clusters are. Running the command cl gives the following output: Within cluster sum of squares by cluster: (between_SS / total_SS = 72.4 %). The higher this percentage, the better the model.
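
To make the between_SS / total_SS figure concrete, here is a small Python sketch of how such a ratio can be computed from points and their cluster labels. It assumes the usual decomposition, in which total_SS is the sum of squared distances to the overall mean and between_SS is total_SS minus the within-cluster sum of squares. The function names and toy data are hypothetical and unrelated to the IMDb dataset.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def between_over_total(points, labels):
    """Share of the total sum of squares explained by the clustering."""
    overall_mean = mean_point(points)
    total_ss = sum(squared_distance(p, overall_mean) for p in points)
    within_ss = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = mean_point(members)
        within_ss += sum(squared_distance(p, centroid) for p in members)
    between_ss = total_ss - within_ss
    return between_ss / total_ss


# Hypothetical data: two tight groups of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

good_labels = [0, 0, 1, 1]  # matches the two groups
poor_labels = [0, 1, 0, 1]  # mixes the two groups

print(f"{100 * between_over_total(points, good_labels):.1f} %")
print(f"{100 * between_over_total(points, poor_labels):.1f} %")
```

A labeling that matches the natural groups leaves little variation inside each cluster, so its percentage is much higher than that of a labeling that mixes the groups.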
  29. Output. Further, let’s relate cluster assignment to individual characteristics like director Facebook likes (column 5) and movie Facebook likes (column 28). Cluster 2 has the maximum movie likes as well as director likes.
  30. Try this out. "I want to know the profit values of the movies."
  31. "Hmm… I will go with cluster 2. It is making maximum profit and has maximum Facebook likes."
  32. Course details. Go to www.edureka.co/data-science. Get Edureka Certified in Data Science Today! What our learners have to say about us! Shravan Reddy says- “I would like to recommend any one who wants to be a Data Scientist just one place: Edureka. Explanations are clean, clear, easy to understand. Their support team works very well.. I took the Data Science course and I'm going to take Machine Learning with Mahout and then Big Data and Hadoop”. Gnana Sekhar says - “Edureka Data science course provided me a very good mixture of theoretical and practical training. LMS pre recorded sessions and assignments were very good as there is a lot of information in them that will help me in my job. Edureka is my teaching GURU now...Thanks EDUREKA.” Balu Samaga says - “It was a great experience to undergo and get certified in the Data Science course from Edureka. Quality of the training materials, assignments, project, support and other infrastructures are a top notch.”