SlideShare uma empresa Scribd logo
1 de 13
Baixar para ler offline
K‑Means Clustering
K‑Means Clustering
클러스터내부에속한데이터들의거리를비교
가장가까운내부거리를갖는클러스터를찾기
Input: K(클러스터수), D(데이터집합)
Output: K개의클러스터
데이터집합에서K개의데이터를임의로추출(Random Initialize) 하여
각 클러스터의중심(Centroid)으로설정
각 데이터들에대해K개의중심점과의거리를구하고 가장가까운중심점
에클러스터할당
할당된클러스터를기준으로중심점을다시계산(평균 값 계산)
앞서진행한과정을반복
데이터의클러스터가 바뀌지않거나지정한반복수를채우면학습종료
K‑Means Clustering
거리(유사도) 측정방법?
초기 클러스터의중심점을설정하는방식?
K의개수를몇으로설정해야하는가?
클러스터링이얼마나잘되었는지평가를어떻게?
Distance Metrics
Euclidean Distance (L2) / Manhattan Distance (L1)
Distance Metrics
일반적으로Euclidean 공간에있다고 가정, L2 Distance 사용
Robustness: L1 > L2
K‑Means가 outlier에취약하기 때문에median을찾는방법
(Robustness 개선)
K‑Median 알고리즘은L1 Distance를사용
Initial Centroid
초기 값 위치에따라원하는결과가 나오지않을수있음
설정하는방식이중요
 Random 방식은local optima에빠질위험이있음(여러번돌리자)
대부분구현체는K‑Means++ Algorithm을기본으로적용하고 있음
k‑means++: The Advantages of Careful Seeding
Random initial centroid at first
Calculate distance, D(x)
Choose next centroid from D(x)^2
centroid가 밀집되지않도록outlier는피하면서k개의초기 값을결정
Choosing the value of K
Choosing the value of K
Evaluation
Internal Evaluation
결과를보고 클러스터간 유사도를기준으로평가
서로멀리떨어져있고, 내부에서는가까이있는것이높은점수
Davies‑Bouldin Index, Dunn Index
External Evaluation
클러스터링에사용되지않은데이터로평가
TRUE에해당하는정답셋을평가 기준으로삼아정확도를평가
Rand Measure, F‑Measure, Jaccard Index
Spark 2.3 ClusteringEvaluator
https://issues.apache.org/jira/browse/SPARK‑14516
Spark 2.3 버전에새롭게 추가된Evaluator
현재는 Silhouette Metric 만지원

Mais conteúdo relacionado

Mais procurados

From maching learning to deep learning episode2
From maching learning to deep learning episode2 From maching learning to deep learning episode2
From maching learning to deep learning episode2 Yongdae Kim
 
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Mad Scientists
 
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01Kwang Woo NAM
 
집단지성 프로그래밍 06-의사결정트리-01
집단지성 프로그래밍 06-의사결정트리-01집단지성 프로그래밍 06-의사결정트리-01
집단지성 프로그래밍 06-의사결정트리-01Kwang Woo NAM
 
Dsh data sensitive hashing for high dimensional k-nn search
Dsh  data sensitive hashing for high dimensional k-nn searchDsh  data sensitive hashing for high dimensional k-nn search
Dsh data sensitive hashing for high dimensional k-nn searchWooSung Choi
 
[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropout[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropoutWuhyun Rico Shin
 
Transliteration English to Korean
Transliteration English to KoreanTransliteration English to Korean
Transliteration English to KoreanHyunwoo Kim
 
알고리즘 Ppt 04
알고리즘 Ppt 04알고리즘 Ppt 04
알고리즘 Ppt 04YERINJJO
 
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개Terry Cho
 
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석Kyunghoon Kim
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Jinwon Lee
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명홍배 김
 
[신경망기초] 심층신경망개요
[신경망기초] 심층신경망개요[신경망기초] 심층신경망개요
[신경망기초] 심층신경망개요jaypi Ko
 
Reinforcement learning v0.5
Reinforcement learning v0.5Reinforcement learning v0.5
Reinforcement learning v0.5SANG WON PARK
 
Lecture 2: Supervised Learning
Lecture 2: Supervised LearningLecture 2: Supervised Learning
Lecture 2: Supervised LearningSang Jun Lee
 
머피의 머신러닝 : Gaussian Processes
머피의 머신러닝 : Gaussian Processes머피의 머신러닝 : Gaussian Processes
머피의 머신러닝 : Gaussian ProcessesJungkyu Lee
 
Q Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object LocalizationQ Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object Localization홍배 김
 

Mais procurados (18)

From maching learning to deep learning episode2
From maching learning to deep learning episode2 From maching learning to deep learning episode2
From maching learning to deep learning episode2
 
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
 
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01
집단지성 프로그래밍 07-고급 분류 기법-커널 기법과 svm-01
 
집단지성 프로그래밍 06-의사결정트리-01
집단지성 프로그래밍 06-의사결정트리-01집단지성 프로그래밍 06-의사결정트리-01
집단지성 프로그래밍 06-의사결정트리-01
 
Dsh data sensitive hashing for high dimensional k-nn search
Dsh  data sensitive hashing for high dimensional k-nn searchDsh  data sensitive hashing for high dimensional k-nn search
Dsh data sensitive hashing for high dimensional k-nn search
 
01. introduction
01. introduction01. introduction
01. introduction
 
[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropout[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropout
 
Transliteration English to Korean
Transliteration English to KoreanTransliteration English to Korean
Transliteration English to Korean
 
알고리즘 Ppt 04
알고리즘 Ppt 04알고리즘 Ppt 04
알고리즘 Ppt 04
 
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
 
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석
[20140830, Pycon2014] NetworkX를 이용한 네트워크 분석
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명
 
[신경망기초] 심층신경망개요
[신경망기초] 심층신경망개요[신경망기초] 심층신경망개요
[신경망기초] 심층신경망개요
 
Reinforcement learning v0.5
Reinforcement learning v0.5Reinforcement learning v0.5
Reinforcement learning v0.5
 
Lecture 2: Supervised Learning
Lecture 2: Supervised LearningLecture 2: Supervised Learning
Lecture 2: Supervised Learning
 
머피의 머신러닝 : Gaussian Processes
머피의 머신러닝 : Gaussian Processes머피의 머신러닝 : Gaussian Processes
머피의 머신러닝 : Gaussian Processes
 
Q Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object LocalizationQ Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object Localization
 

Mais de Junyoung Park

Quantitive Algorithm Trading
Quantitive Algorithm TradingQuantitive Algorithm Trading
Quantitive Algorithm TradingJunyoung Park
 
Common Design for Distributed Machine Learning
Common Design for Distributed Machine LearningCommon Design for Distributed Machine Learning
Common Design for Distributed Machine LearningJunyoung Park
 
Kaggle KKBox Churn Prediction
Kaggle KKBox Churn PredictionKaggle KKBox Churn Prediction
Kaggle KKBox Churn PredictionJunyoung Park
 
Cloudera & Zookeeper
Cloudera & ZookeeperCloudera & Zookeeper
Cloudera & ZookeeperJunyoung Park
 
한국어 자연어처리 101
한국어 자연어처리 101한국어 자연어처리 101
한국어 자연어처리 101Junyoung Park
 
Continuous Integration with Gitlab
Continuous Integration with GitlabContinuous Integration with Gitlab
Continuous Integration with GitlabJunyoung Park
 
Python Testing for Flask
Python Testing for FlaskPython Testing for Flask
Python Testing for FlaskJunyoung Park
 
News clustering and Recommendation system using Word Embedding
News clustering and Recommendation system using Word EmbeddingNews clustering and Recommendation system using Word Embedding
News clustering and Recommendation system using Word EmbeddingJunyoung Park
 
About Neural Network
About Neural NetworkAbout Neural Network
About Neural NetworkJunyoung Park
 

Mais de Junyoung Park (13)

Quantitive Algorithm Trading
Quantitive Algorithm TradingQuantitive Algorithm Trading
Quantitive Algorithm Trading
 
Common Design for Distributed Machine Learning
Common Design for Distributed Machine LearningCommon Design for Distributed Machine Learning
Common Design for Distributed Machine Learning
 
AWS EMR + Spark ML
AWS EMR + Spark MLAWS EMR + Spark ML
AWS EMR + Spark ML
 
Kaggle KKBox Churn Prediction
Kaggle KKBox Churn PredictionKaggle KKBox Churn Prediction
Kaggle KKBox Churn Prediction
 
Spark config
Spark configSpark config
Spark config
 
Cloudera & Zookeeper
Cloudera & ZookeeperCloudera & Zookeeper
Cloudera & Zookeeper
 
한국어 자연어처리 101
한국어 자연어처리 101한국어 자연어처리 101
한국어 자연어처리 101
 
Continuous Integration with Gitlab
Continuous Integration with GitlabContinuous Integration with Gitlab
Continuous Integration with Gitlab
 
Docker Intro
Docker IntroDocker Intro
Docker Intro
 
Python Testing for Flask
Python Testing for FlaskPython Testing for Flask
Python Testing for Flask
 
News clustering and Recommendation system using Word Embedding
News clustering and Recommendation system using Word EmbeddingNews clustering and Recommendation system using Word Embedding
News clustering and Recommendation system using Word Embedding
 
About Neural Network
About Neural NetworkAbout Neural Network
About Neural Network
 
About SVM
About SVMAbout SVM
About SVM
 

K-Means Clustering