5. The dimension of each embedding: D; the number of embeddings: L
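A minimal sketch of how these two hyperparameters fit together (the table layout and function name are assumptions for illustration, not the paper's implementation): each class label is represented by L embedding vectors, each of dimension D.

```python
import random

def build_label_embeddings(num_classes, L, D, seed=0):
    """Hypothetical label-embedding table: each of the num_classes
    labels gets L embedding vectors of dimension D (random init)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(D)]  # one D-dim vector
             for _ in range(L)]                        # L vectors per class
            for _ in range(num_classes)]

table = build_label_embeddings(num_classes=10, L=4, D=16)
print(len(table), len(table[0]), len(table[0][0]))  # 10 4 16
```

Sweeping D and L here is just a matter of changing the two arguments, which is what the ablation on this slide varies.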
Higher than our SSR with a 48-model ensemble
6. Discussion 1: Do we really need attributes to enhance feature learning?
Samples within a meta class can be viewed as sharing a latent attribute, so meta classes correspond to randomized attributes.
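A toy sketch of "meta classes as randomized attributes" (the partitioning scheme below is an assumption): each random partition of the original classes into meta classes acts like one latent attribute, shared by the classes grouped together.

```python
import random

def random_meta_classes(num_classes, num_meta, seed):
    """Randomly partition original classes into meta classes; classes
    that land in the same meta class can be viewed as sharing one
    latent, randomized attribute."""
    rng = random.Random(seed)
    perm = list(range(num_classes))
    rng.shuffle(perm)
    mapping = [0] * num_classes
    for slot, cls in enumerate(perm):
        mapping[cls] = slot % num_meta  # round-robin: no empty meta class
    return mapping

# Different seeds give different "attributes" over the same 10 classes.
print(random_meta_classes(10, 3, seed=1))
print(random_meta_classes(10, 3, seed=2))
```

Running with several seeds yields several independent groupings, which is the "randomized" part of the claim.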
7. Discussion 2:
In hidden layers, we may expect some clusters within the dataset.
A cluster may be viewed as a meta class.
Employing meta classes = enforcing diversity of clustering?
8. Discussion 3:
Encode the original one-hot label into a sequential label.
Would using an L2 loss (or a KL-divergence loss, etc.) for learning the embedding bring about a similar improvement?
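A hedged sketch of the two candidate losses in this question (the inputs below are synthetic placeholders, not results from the paper): an L2 loss regresses onto a target embedding directly, while a KL-divergence loss matches a target distribution against the model's softmax.

```python
import math

def l2_loss(pred, target):
    """Mean squared error between predicted and target embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def kldiv_loss(logits, target_probs, eps=1e-12):
    """KL(target || softmax(logits)), as in distillation-style training."""
    m = max(logits)                                   # numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_q = [x - log_z for x in logits]               # log-softmax
    return sum(p * (math.log(p + eps) - lq)
               for p, lq in zip(target_probs, log_q))

print(l2_loss([0.4, -0.1, 0.0], [0.5, -0.2, 0.1]))
print(kldiv_loss([2.0, 0.5, 0.0], [0.7, 0.2, 0.1]))
```

Both losses are minimized when the prediction matches the target exactly, so the open question is whether they shape the learned features differently, not whether either can fit the encoded labels.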