SlideShare uma empresa Scribd logo
1 de 33
Baixar para ler offline
Parallel K-Means
ncku Tien-Yang Wu
outline
Introduction
K-Means Algorithm
Parallel K-Means Based on MapReduce
Experimental Results
K-Means on spark
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
dataset oriented parallel clustering algorithms should be
developed.
K-Means Algorithm
K-Means Algorithm
Firstly, it randomly selects k objects from the whole objects
which represent initial cluster centers.
K-Means Algorithm
Each remaining object is assigned to the cluster to which it
is the most similar, based on the distance between the
object and the cluster center.
K-Means Algorithm
The new mean for each cluster is then calculated. This
process iterates until the criterion function converges.
Parallel K-Means Based
on MapReduce
most intensive calculation to occur is the calculation of
distances.
each iteration require nk distance
Parallel K-Means Based
on MapReduce
the distance computations between one object with the
centers is irrelevant to the distance computations
between other objects with the corresponding centers.
distance computations between different objects with
centers can be parallel executed.
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
data target
1,1
2,2
3,3
11,11
12,12
13,13
1 class
2 class
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
random two centroid
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
store two nodes
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
combine
combine
reduce
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
3,3
c1:(1,1)
c2:(11,11)
assign to c1(1,1)
(1,1) , {(3,3),(3,3)}
key value
output<key,value>
Parallel K-Means Based
on MapReduce
(1,1) , {(3,3),(3,3)}
key value
centroid
temporary to calculate new centroid, the object
(1,1)
{(3,3),(3,3)}
output<key,value>
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
c1:(1,1)
c2:(11,11)
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1 c1:(1,1)
c2:(11,11)
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
Parallel K-Means Based
on MapReduce
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
(1,1) , {(4,4),{(1,1),(3,3),2}
(11,11) , {(12,12),(12,12),1}
key value
same key combine
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
output<key,value>
centroid
temporary to calculate new centroid, the objects
,number of objects
(1,1)
{(4,4),{(1,1),(3,3)},2}
Parallel K-Means Based
on MapReduce
combine
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
combine
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
same key reduce
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
reduce
same key reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(4+2)/(2+1) ,(4+2)/(2+1) = 2,2
2,2 = new centroid
1,1
2,2
3,3
centroid is 2,2
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
centroid
new centroid, the objects
,new cluster
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(11,11) , {(12,12),{(11,11),(12,12),(13,13)}
update new centroid and next iteration
until converge or arrive to iteration number
Experimental Results
two 2.8 GHz cores and 4GB of memory
Experimental Results
Speedup
non linear,communication cost
Experimental Results
Scale up
performance
datasets 1GB 2GB 3GB 4GB
K-Means on spark
Reference
Parallel K-Means Clustering Based on MapReduce
Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1
2009
K means algorithm

Mais conteúdo relacionado

Mais procurados

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersFunctional Imperative
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitEric Wendelin
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real Worldahmad bassiouny
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
strassen matrix multiplication algorithm
strassen matrix multiplication algorithmstrassen matrix multiplication algorithm
strassen matrix multiplication algorithmevil eye
 
Stressen's matrix multiplication
Stressen's matrix multiplicationStressen's matrix multiplication
Stressen's matrix multiplicationKumar
 
K means clustering
K means clusteringK means clustering
K means clusteringKuppusamy P
 
Instance based learning
Instance based learningInstance based learning
Instance based learningSlideshare
 
15 puzzle problem using branch and bound
15 puzzle problem using branch and bound15 puzzle problem using branch and bound
15 puzzle problem using branch and boundAbhishek Singh
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithmVinit Dantkale
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Artificial Intelligence Game Search by Examples
Artificial Intelligence Game Search by ExamplesArtificial Intelligence Game Search by Examples
Artificial Intelligence Game Search by ExamplesAhmed Gad
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning Melaku Eneayehu
 
K means clustering
K means clusteringK means clustering
K means clusteringAhmedasbasb
 

Mais procurados (20)

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning Classifiers
 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Clustering
ClusteringClustering
Clustering
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
strassen matrix multiplication algorithm
strassen matrix multiplication algorithmstrassen matrix multiplication algorithm
strassen matrix multiplication algorithm
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Stressen's matrix multiplication
Stressen's matrix multiplicationStressen's matrix multiplication
Stressen's matrix multiplication
 
Data models in NoSQL
Data models in NoSQLData models in NoSQL
Data models in NoSQL
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
15 puzzle problem using branch and bound
15 puzzle problem using branch and bound15 puzzle problem using branch and bound
15 puzzle problem using branch and bound
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithm
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Artificial Intelligence Game Search by Examples
Artificial Intelligence Game Search by ExamplesArtificial Intelligence Game Search by Examples
Artificial Intelligence Game Search by Examples
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
K means clustering
K means clusteringK means clustering
K means clustering
 

Destaque

Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Titus Damaiyanti
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoopTianwei Liu
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringGeorge Ang
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clusteringmobius.cn
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesFarzad Nozarian
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreducemakoto onizuka
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringIJRES Journal
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopTien-Yang (Aiden) Wu
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticALTIC Altic
 
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2Tianwei Liu
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 

Destaque (20)

Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoop
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means Clustering
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
MachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_SparkMachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_Spark
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on Hadoop
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, altic
 
K means
K meansK means
K means
 
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2
 
Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 

Semelhante a Parallel-kmeans

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetAlaaZ
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Florent Renucci
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Graph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationGraph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationSangmin Woo
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduceThibault Debatty
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsJonny Daenen
 
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemTheories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemSeongcheol Baek
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4Andrea Antonello
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingzukun
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...IJECEIAES
 
Visualization of general defined space data
Visualization of general defined space dataVisualization of general defined space data
Visualization of general defined space dataijcga
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceKyong-Ha Lee
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...IJECEIAES
 
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Ismael Torres-Pizarro, PhD, PE, Esq.
 

Semelhante a Parallel-kmeans (20)

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Graph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationGraph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph Generation
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduce
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-Joins
 
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemTheories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Knn solution
Knn solutionKnn solution
Knn solution
 
M0174491101
M0174491101M0174491101
M0174491101
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...
 
Visualization of general defined space data
Visualization of general defined space dataVisualization of general defined space data
Visualization of general defined space data
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
 

Mais de Tien-Yang (Aiden) Wu

Mais de Tien-Yang (Aiden) Wu (11)

Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn
 
Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
RDD
RDDRDD
RDD
 
Semantic ui教學
Semantic ui教學Semantic ui教學
Semantic ui教學
 
響應式網頁教學
響應式網頁教學響應式網頁教學
響應式網頁教學
 
NoSQL & JSON
NoSQL & JSONNoSQL & JSON
NoSQL & JSON
 
Weebly上手教學
Weebly上手教學Weebly上手教學
Weebly上手教學
 
簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler
 
Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設
 

Último

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 

Último (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

Parallel-kmeans

  • 2. outline Introduction K-Means Algorithm Parallel K-Means Based on MapReduce Experimental Results K-Means on spark
  • 3. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models.
  • 4. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models. dataset oriented parallel clustering algorithms should be developed.
  • 6. K-Means Algorithm Firstly, it randomly selects k objects from the whole objects which represent initial cluster centers.
  • 7. K-Means Algorithm Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
  • 8. K-Means Algorithm The new mean for each cluster is then calculated. This process iterates until the criterion function converges.
  • 9. Parallel K-Means Based on MapReduce most intensive calculation to occur is the calculation of distances. each iteration require nk distance
  • 10. Parallel K-Means Based on MapReduce the distance computations between one object with the centers is irrelevant to the distance computations between other objects with the corresponding centers. distance computations between different objects with centers can be parallel executed.
  • 11. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 data target 1,1 2,2 3,3 11,11 12,12 13,13 1 class 2 class
  • 12. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 random two centroid c1:(1,1) c2:(11,11)
  • 13. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 store two nodes c1:(1,1) c2:(11,11)
  • 14. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11)
  • 15. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map combine combine reduce
  • 16. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map 3,3 c1:(1,1) c2:(11,11) assign to c1(1,1) (1,1) , {(3,3),(3,3)} key value output<key,value>
  • 17. Parallel K-Means Based on MapReduce (1,1) , {(3,3),(3,3)} key value centroid temporary to calculate new centroid, the object (1,1) {(3,3),(3,3)} output<key,value>
  • 18. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map c1:(1,1) c2:(11,11) (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value
  • 19. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)}
  • 20. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 c1:(1,1) c2:(11,11) map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine
  • 21. Parallel K-Means Based on MapReduce (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine (1,1) , {(4,4),{(1,1),(3,3),2} (11,11) , {(12,12),(12,12),1} key value same key combine
  • 22. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value output<key,value> centroid temporary to calculate new centroid, the objects ,number of objects (1,1) {(4,4),{(1,1),(3,3)},2}
  • 23. Parallel K-Means Based on MapReduce combine (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)} combine (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value
  • 24. Parallel K-Means Based on MapReduce reduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value same key reduce
  • 25. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} reduce same key reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)}
  • 26. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (4+2)/(2+1) ,(4+2)/(2+1) = 2,2 2,2 = new centroid 1,1 2,2 3,3 centroid is 2,2
  • 27. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} centroid new centroid, the objects ,new cluster
  • 28. Parallel K-Means Based on MapReduce reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (11,11) , {(12,12),{(11,11),(12,12),(13,13)} update new centroid and next iteration until converge or arrive to iteration number
  • 29. Experimental Results two 2.8 GHz cores and 4GB of memory
  • 33. Reference Parallel K-Means Clustering Based on MapReduce Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1 2009 K means algorithm