Fast and Provably Good Seedings for k-Means
1. Fast and Provably Good Seedings for k-Means
O. Bachem, M. Lucic, S. Hassani, A. Krause
Presented by Kimikazu Kato,
Silver Egg Technology Co., Ltd.
2. Algorithm of k-Means clustering
Determine initial centroids
Update centroids and cluster memberships iteratively
Target of improvement: the choice of initial centroids
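The two stages above can be sketched in plain Python. This is a minimal illustration, not code from the talk: the function names (`assign`, `update`, `lloyd`), the `max_iter` parameter, and the toy data are all assumptions, and the initial centroids are simply drawn uniformly at random to mark the step that seeding methods replace.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if empty)."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def lloyd(points, initial_centroids, max_iter=100):
    """Alternate update and assignment steps until memberships stop changing."""
    centroids = list(initial_centroids)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
rng = random.Random(0)
# Naive seeding: pick the initial centroids uniformly at random.
centroids, labels = lloyd(points, rng.sample(points, 2))
```

On data this small the iteration recovers the two groups from any pair of starting points. On larger data the result can depend heavily on the seeding, which is why the initialization step is the one being improved.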
Existing results:
k-means++:
sampling proportional to the squared distance from the centroids already chosen
Bachem et al. 2016:
Performance improvement using MCMC, but it relies on assumptions about the distribution of the data
Proposed:
Another MCMC-based algorithm that makes no assumptions about the distribution
Outline
3. Related research
kmeans++
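Draw the next centroid x from the input data according to
p(x) = d(x, C)^2 / \sum_{x'} d(x', C)^2
where d(x, C) is the distance from x to the nearest centroid in C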
Intuition:
Choose initial centroids from the
input data so that they scatter as
widely as possible
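A minimal sketch of this seeding rule, assuming the standard D^2-sampling form of kmeans++ (each new centroid is drawn with probability proportional to its squared distance to the nearest centroid already chosen). The function name `kmeanspp_seeding`, its parameters, and the toy data are illustrative assumptions, not taken from the slides.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seeding(points, k, rng):
    """Pick k initial centroids from `points` by D^2 sampling."""
    # The first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid: fall back to uniform.
            new = rng.choice(points)
        else:
            # Far-away points get large weights, so the centroids spread out.
            new = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(new)
        # Refreshing the weights takes a full pass over the data per centroid.
        d2 = [min(old, squared_distance(p, new)) for p, old in zip(points, d2)]
    return centroids

# Toy data (hypothetical): two distinct locations, one of them duplicated.
points = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
seeds = kmeanspp_seeding(points, 2, random.Random(0))
```

A point that coincides with an already chosen centroid has weight zero, so in this toy example the two seeds always land on the two distinct locations. The full pass over the data needed for every new centroid is the cost that the MCMC-based methods aim to avoid.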
Bachem et al. 2016
Intended to overcome the shortcoming of kmeans++: the cost of normalizing the sampling distribution, which needs a full pass over the data for every centroid
Metropolis-Hastings algorithm, which uses an accept/reject step to approximate the distribution.
But it makes some assumptions on the input data.
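The slides do not spell out the sampler, so the following is only a sketch of how a Metropolis-Hastings chain can approximate the kmeans++ distribution without normalizing it. It assumes a uniform proposal over the data points and an acceptance probability of min(1, d(y, C)^2 / d(x, C)^2). The function name `mcmc_seeding` and the `chain_length` parameter are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dist2_to_set(p, centroids):
    """Squared distance from p to its nearest centroid in the chosen set."""
    return min(squared_distance(p, c) for c in centroids)

def mcmc_seeding(points, k, chain_length, rng):
    """Approximate D^2 sampling with a short Markov chain per centroid."""
    centroids = [rng.choice(points)]
    for _ in range(k - 1):
        # Start the chain at a uniformly drawn point.
        x = rng.choice(points)
        dx = dist2_to_set(x, centroids)
        for _ in range(chain_length - 1):
            # Uniform proposal (an assumption of this sketch).
            y = rng.choice(points)
            dy = dist2_to_set(y, centroids)
            # Accept with probability min(1, dy / dx); dx == 0 always accepts.
            if dx == 0 or rng.random() < dy / dx:
                x, dx = y, dy
        # The final state of the chain becomes the next centroid.
        centroids.append(x)
    return centroids

# Toy data (hypothetical): two distinct locations.
points = [(0.0, 0.0), (5.0, 5.0)]
seeds = mcmc_seeding(points, 2, 200, random.Random(0))
```

Each centroid now costs `chain_length` distance evaluations against the chosen set instead of a pass over all points. How well a short chain with a uniform proposal approximates the target depends on the data, which is consistent with the slides' remark that this approach needs assumptions on the input.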
Notation for kmeans++: the drawn point is added as a centroid
C: the set of centroids already chosen
8. Conclusion
• Novel algorithm for initializing the centroids in kmeans
• Theoretical guarantees on convergence and on the trade-off between accuracy and speed
• Good experimental results