# Parallel-kmeans

Parallel k-means

### Parallel-kmeans

1. Parallel K-Means. Tien-Yang Wu, NCKU
2. Outline: Introduction; K-Means Algorithm; Parallel K-Means Based on MapReduce; Experimental Results; K-Means on Spark
3. Introduction: Existing parallel clustering algorithms assume that all objects can reside in main memory at the same time, and their parallel systems provide restricted programming models.
4. Introduction: Existing parallel clustering algorithms assume that all objects can reside in main memory at the same time, and their parallel systems provide restricted programming models. Dataset-oriented parallel clustering algorithms should therefore be developed.
5. K-Means Algorithm
6. K-Means Algorithm: First, k objects are selected at random from the full set of objects to serve as the initial cluster centers.
7. K-Means Algorithm: Each remaining object is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster center.
8. K-Means Algorithm: The new mean of each cluster is then calculated. This process iterates until the criterion function converges.
9. Parallel K-Means Based on MapReduce: The most intensive calculation is the computation of distances. Each iteration requires nk distance computations, where n is the number of objects and k is the number of clusters.
10. Parallel K-Means Based on MapReduce: The distance computations between one object and the centers are independent of the distance computations between any other object and the centers, so the distance computations for different objects can be executed in parallel.
11. Parallel K-Means Based on MapReduce: Data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). Target: class 1 = {(1,1), (2,2), (3,3)}, class 2 = {(11,11), (12,12), (13,13)}.
12. Parallel K-Means Based on MapReduce: Data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). Choose two centroids at random: c1 = (1,1), c2 = (11,11).
13. Parallel K-Means Based on MapReduce: Data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). Store the data on two nodes, with centroids c1 = (1,1) and c2 = (11,11).
14. Parallel K-Means Based on MapReduce: node1 holds (1,1), (12,12), (3,3); node2 holds (11,11), (2,2), (13,13). Centroids: c1 = (1,1), c2 = (11,11).
15. Parallel K-Means Based on MapReduce: node1 holds (1,1), (12,12), (3,3); node2 holds (11,11), (2,2), (13,13). Centroids: c1 = (1,1), c2 = (11,11). Each node runs map and then combine; the combine outputs feed a single reduce.
16. Parallel K-Means Based on MapReduce: Map on node1, which holds (1,1), (12,12), (3,3), with centroids c1 = (1,1) and c2 = (11,11). Object (3,3) is assigned to c1 = (1,1). Output <key, value>: key (1,1), value {(3,3), (3,3)}.
17. Parallel K-Means Based on MapReduce: In the output <key, value> pair with key (1,1) and value {(3,3), (3,3)}, the key is the centroid, and the value holds a temporary used to calculate the new centroid, followed by the object.
18. Parallel K-Means Based on MapReduce: Map output on node1, which holds (1,1), (12,12), (3,3), with centroids c1 = (1,1) and c2 = (11,11), as key, value pairs: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}.
19. Parallel K-Means Based on MapReduce: Map output on both nodes, with centroids c1 = (1,1) and c2 = (11,11). node1, which holds (1,1), (12,12), (3,3): (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}. node2, which holds (11,11), (2,2), (13,13): (11,11), {(11,11), (11,11)}; (1,1), {(2,2), (2,2)}; (11,11), {(13,13), (13,13)}.
20. Parallel K-Means Based on MapReduce: Combine on node1. Input is the map output: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}.
21. Parallel K-Means Based on MapReduce: Combine merges values that share the same key. Input: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}. Output: (1,1), {(4,4), {(1,1), (3,3)}, 2}; (11,11), {(12,12), (12,12), 1}.
22. Parallel K-Means Based on MapReduce: In the combine output <key, value> pair with key (1,1) and value {(4,4), {(1,1), (3,3)}, 2}, the key is the centroid, and the value holds a temporary used to calculate the new centroid, the objects, and the number of objects.
23. Parallel K-Means Based on MapReduce: Combine on both nodes. node1 input: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}. node1 output: (1,1), {(4,4), {(1,1), (3,3)}, 2}; (11,11), {(12,12), (12,12), 1}. node2 input: (11,11), {(11,11), (11,11)}; (1,1), {(2,2), (2,2)}; (11,11), {(13,13), (13,13)}. node2 output: (1,1), {(2,2), (2,2), 1}; (11,11), {(24,24), {(11,11), (13,13)}, 2}.
24. Parallel K-Means Based on MapReduce: Reduce merges values that share the same key. Input from node1: (1,1), {(4,4), {(1,1), (3,3)}, 2}; (11,11), {(12,12), (12,12), 1}. Input from node2: (1,1), {(2,2), (2,2), 1}; (11,11), {(24,24), {(11,11), (13,13)}, 2}.
25. Parallel K-Means Based on MapReduce: Reduce for key (1,1). Input: (1,1), {(4,4), {(1,1), (3,3)}, 2} and (1,1), {(2,2), (2,2), 1}. Output: (1,1), {(2,2), {(1,1), (2,2), (3,3)}}.
26. Parallel K-Means Based on MapReduce: Reduce for key (1,1). Input: (1,1), {(4,4), {(1,1), (3,3)}, 2} and (1,1), {(2,2), (2,2), 1}. New centroid: ((4+2)/(2+1), (4+2)/(2+1)) = (2,2), so the centroid of (1,1), (2,2), (3,3) is (2,2). Output: (1,1), {(2,2), {(1,1), (2,2), (3,3)}}.
27. Parallel K-Means Based on MapReduce: In the reduce output (1,1), {(2,2), {(1,1), (2,2), (3,3)}}, the key is the centroid, and the value holds the new centroid and the objects of the new cluster.
28. Parallel K-Means Based on MapReduce: Reduce output: (1,1), {(2,2), {(1,1), (2,2), (3,3)}}; (11,11), {(12,12), {(11,11), (12,12), (13,13)}}. Update the centroids and run the next iteration, until convergence or until the maximum number of iterations is reached.
29. Experimental Results: Hardware: two 2.8 GHz cores and 4 GB of memory.
30. Experimental Results: Speedup is non-linear because of communication cost.
31. Experimental Results: Scale-up performance on datasets of 1 GB, 2 GB, 3 GB, and 4 GB.
32. K-Means on Spark
33. Reference: Weizhong Zhao, Huifang Ma, and Qing He, "Parallel K-Means Clustering Based on MapReduce," 2009; K-means algorithm.
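
The map, combine, and reduce steps walked through in the slides can be sketched in plain Python. This is a minimal single-process illustration, not the presenter's code and not Hadoop or Spark code: the function names (`map_step`, `combine_step`, `reduce_step`, `kmeans_iteration`, `run`) are hypothetical, squared Euclidean distance is assumed as the distance measure, and each node is simulated as a list of points. The combine value follows the slides' layout of (sum of points, objects, number of objects).

```python
def nearest(point, centroids):
    """Return the centroid closest to point (squared Euclidean distance, an assumption)."""
    return min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, c)))

def map_step(points, centroids):
    """Map: emit one (centroid, point) pair per object."""
    return [(nearest(p, centroids), p) for p in points]

def combine_step(pairs):
    """Combine (per node): for each centroid key, keep (sum of points, objects, count)."""
    acc = {}
    for key, point in pairs:
        total, members, count = acc.get(key, ((0,) * len(point), [], 0))
        total = tuple(t + x for t, x in zip(total, point))
        acc[key] = (total, members + [point], count + 1)
    return acc

def reduce_step(partials):
    """Reduce: merge the per-node partial results, then divide sums by counts."""
    merged = {}
    for partial in partials:
        for key, (total, members, count) in partial.items():
            if key in merged:
                t0, m0, c0 = merged[key]
                total = tuple(a + b for a, b in zip(t0, total))
                members, count = m0 + members, c0 + count
            merged[key] = (total, list(members), count)
    return {key: (tuple(t / count for t in total), sorted(members))
            for key, (total, members, count) in merged.items()}

def kmeans_iteration(nodes, centroids):
    """One iteration: map and combine on every node, then a single reduce."""
    partials = [combine_step(map_step(points, centroids)) for points in nodes]
    return reduce_step(partials)

def run(nodes, centroids, max_iter=10):
    """Iterate until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        result = kmeans_iteration(nodes, centroids)
        # A centroid that received no objects is kept unchanged.
        updated = [result[c][0] if c in result else c for c in centroids]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# The data layout from the slides: two nodes, two initial centroids.
nodes = [[(1, 1), (12, 12), (3, 3)], [(11, 11), (2, 2), (13, 13)]]
centroids = [(1, 1), (11, 11)]
result = kmeans_iteration(nodes, centroids)
new_centroids = [result[c][0] for c in centroids]
print(new_centroids)  # [(2.0, 2.0), (12.0, 12.0)]
```

With the slide data, the combine step on node1 yields the sum (4,4) over two objects for centroid (1,1), and the reduce step divides (4+2, 4+2) by (2+1) to obtain the new centroid (2,2), matching the worked example. Carrying the partial sum and count through combine is what keeps the data sent to reduce small, since only one record per centroid per node crosses the network.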