1. Metric Learning for Clustering
SCC5945 - Semi-Supervised and Unsupervised Analysis of Patterns in Data
(Seminar)
Sidgley Camargo de Andrade
PhD student in computer science
Institute of Mathematics and Computer Sciences
University of São Paulo
June 2016
3. Constraint-based algorithms
How can we help unsupervised algorithms find better
solutions?
Constraint-based methods – e.g. background knowledge
given as pairwise constraints Wagstaff et al. (2001)
Con= ⊆ D × D : must-link constraints
Con≠ ⊆ D × D : cannot-link constraints
Active- and self-learning
Other . . .
Are there “problems” with the algorithms above?
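For instance, a clustering can be checked against a set of pairwise constraints with a small helper function (a hypothetical illustration, not code from Wagstaff et al. (2001)):

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the pairwise constraints violated by a clustering.

    labels: cluster label of each point; must_link / cannot_link:
    lists of index pairs (i, j).
    """
    bad = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    bad += [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return bad

# points 0 and 1 must share a cluster; points 0 and 2 must not
print(violated_constraints([0, 0, 1], [(0, 1)], [(0, 2)]))  # []
print(violated_constraints([0, 1, 1], [(0, 1)], [(0, 2)]))  # [(0, 1)]
```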
5. Metrics
Metrics capture the relationships between the data (e.g.
Euclidean distance, Mahalanobis distance, etc.)
What is the right metric?
There are few systematic mechanisms for tweaking distance
metrics, and they are often tuned by hand Xing et al. (2003).
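As an illustration, a Mahalanobis-style distance ||x − y||_A reduces to the Euclidean distance when A = I, and a hand-picked A changes which directions count – exactly the kind of manual tuning noted above (the matrix A here is an assumed example):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Euclidean distance: every feature direction is weighted equally
d_euclidean = np.linalg.norm(x - y)        # sqrt(2) ≈ 1.414

# Mahalanobis-style distance ||x - y||_A with a hand-picked PSD matrix A
# that down-weights the first feature (an assumed example, tuned "by hand")
A = np.diag([0.25, 1.0])
diff = x - y
d_mahalanobis = np.sqrt(diff @ A @ diff)   # sqrt(1.25) ≈ 1.118
```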
6. Metric learning for clustering
Assumption: keeping dissimilar points far from each other and
similar points close to each other reduces the risk of errors.
Xing et al. (2003)
Suppose a user indicates that certain points in an input space (say,
ℝⁿ) are considered by them to be “similar” (or “dissimilar”). Can we
automatically learn a distance metric over ℝⁿ that respects these
relationships, i.e., one that assigns small distances to the
similar pairs and larger distances otherwise?
Learn a metric d : ℝⁿ × ℝⁿ → ℝ over the input space.
7. Problem
A simple approach is to require that similar pairs (must-link) have a
small distance between them, whereas dissimilar pairs (cannot-link)
have a larger distance between them
d(x, y) = d_A(x, y) = ||x − y||_A = sqrt((x − y)^T A (x − y))
min_A Σ_{(x_i, x_j) ∈ S} ||x_i − x_j||²_A
s.t. Σ_{(x_i, x_j) ∈ D} ||x_i − x_j||²_A ≥ c
A ⪰ 0
where A ⪰ 0 is a constraint that the symmetric matrix A must be
positive semi-definite – making d_A a “pseudo-metric” – and c is an
arbitrary positive constant (e.g. c = 1)
1. Question for class – why must the constant c be positive?
2. Question for class – how can this be transformed into a maximization problem?
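A minimal projected-gradient sketch of the optimization above can be written as follows (the function names, learning rate, and stopping rule are assumptions for illustration; this is not Xing et al. (2003)'s actual Newton-based solver):

```python
import numpy as np

def project_psd(A):
    # project a symmetric matrix onto the PSD cone: clip negative eigenvalues
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

def learn_metric(X, S, D, c=1.0, lr=0.01, steps=300):
    """Sketch: minimize the total squared A-distance over similar pairs S
    while keeping the total over dissimilar pairs D above c, with A PSD."""
    n = X.shape[1]
    A = np.eye(n)
    for _ in range(steps):
        grad = np.zeros((n, n))
        for i, j in S:                  # pull similar pairs together
            d = X[i] - X[j]
            grad += np.outer(d, d)
        total_D = sum((X[i] - X[j]) @ A @ (X[i] - X[j]) for i, j in D)
        if total_D < c:                 # constraint violated: push D apart
            for i, j in D:
                d = X[i] - X[j]
                grad -= np.outer(d, d)
        A = project_psd(A - lr * grad)  # gradient step, then PSD projection
    return A
```

With similar pairs differing along the first axis and dissimilar pairs along the second, the learned A down-weights the first feature, shrinking distances between must-link points.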
9. Metric Pairwise Constraint K-means
(MPCK-means)
Assumes a separate matrix A_h (metric) for each cluster h
Bilenko et al. (2004)
Permits the specification of an individual weight for each constraint
(f_M and f_C); the penalty for constraint violations is proportional to
the violated constraint's weight
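As an illustration, a single greedy assignment pass with per-cluster metrics and weighted constraint penalties might look like this (a simplified, hypothetical sketch; the function and parameter names are not from Bilenko et al. (2004)):

```python
import numpy as np

def assign_points(X, centers, metrics, ml, cl, labels, w_ml=1.0, w_cl=1.0):
    """One greedy assignment pass in the spirit of MPCK-means.

    Each point picks the cluster h minimizing its Mahalanobis distance
    under that cluster's metric A_h, plus weighted penalties for the
    constraints the choice would violate. ml / cl map a point index to
    its must-link / cannot-link partners.
    """
    for i, x in enumerate(X):
        costs = []
        for h, (mu, A) in enumerate(zip(centers, metrics)):
            d = x - mu
            cost = d @ A @ d
            # penalty per must-link partner currently in another cluster
            cost += w_ml * sum(labels[j] != h for j in ml.get(i, ()))
            # penalty per cannot-link partner currently in this cluster
            cost += w_cl * sum(labels[j] == h for j in cl.get(i, ()))
            costs.append(cost)
        labels[i] = int(np.argmin(costs))
    return labels
```

A strong cannot-link penalty can pull a point away from its nearest center, which plain k-means would never do.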
12. References
Basu, S., Davidson, I., and Wagstaff, K. (2008). Constrained Clustering:
Advances in Algorithms, Theory, and Applications. Chapman &
Hall/CRC, 1 edition.
Bilenko, M., Basu, S., and Mooney, R. J. (2004). Integrating constraints
and metric learning in semi-supervised clustering. In Proceedings of
the Twenty-first International Conference on Machine Learning, ICML
’04, pages 11–, New York, NY, USA. ACM.
Wagstaff, K., Cardie, C., Rogers, S., and Schrödl, S. (2001). Constrained
k-means clustering with background knowledge. In Proceedings of the
Eighteenth International Conference on Machine Learning, ICML ’01,
pages 577–584, San Francisco, CA, USA. Morgan Kaufmann
Publishers Inc.
Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. (2003). Distance
metric learning, with application to clustering with side-information. In
Advances in Neural Information Processing Systems, pages 505–512.
MIT Press.