This document discusses methods for one-shot learning using siamese neural networks. It provides an overview of several key papers in this area: signature verification with a siamese time delay neural network (1993), siamese networks for one-shot image recognition (2015), and matching networks for one-shot learning (2016). Matching networks incorporate an attention mechanism into a neural network to learn rapidly from small datasets by matching training and test conditions. The document also reviews experiments demonstrating one-shot and few-shot learning on datasets such as Omniglot using these siamese and matching network approaches.
2. Contents
Introduction to methods for one-shot learning using siamese neural networks.
• Signature Verification using a "Siamese” Time Delay Neural Network (1993), NIPS
• Siamese Neural Networks for One-shot Image Recognition (2015), ICML
• Matching Networks for One Shot Learning (2016), NIPS
Proposal of my own idea for matching.
3. History of One-shot Learning
One-shot learning was first proposed by Fei-Fei et al. (2003); Fei-Fei et al. (2006), who developed a variational Bayesian framework.
Lake et al. (2013) proposed an algorithm based on a method called Hierarchical Bayesian Program Learning.
Methods based on metric learning were proposed (Koch et al. (2015); Vinyals et al. (2016)).
Methods based on neural networks with memory were proposed (Graves et al. (2014); Santoro et al. (2016)).
There are also other general formulations and domain-specific studies.
One-shot object detection was proposed in Schwartz et al. (2018).
4. Recent Methods for One-shot Learning
using Neural Networks
1. Metric learning: Koch et al. (2015)
2. Memory networks: Graves et al. (2014)
1+2. Both combined: Vinyals et al. (2016)
The siamese network is often used.
• Siamese nets were first introduced by Bromley et al. (1993) to solve signature verification as an image matching problem.
• Koch et al. (2015) proposed Deep Siamese Networks for one-shot image recognition.
• Vinyals et al. (2016) proposed Matching Nets, a model that incorporates a memory network into Deep Siamese Networks and formulates the task as a classification problem.
• Schwartz et al. (2018) applied existing methods to one-shot object detection.
5. Siamese Network
A siamese network consists of two identical sub-networks joined at their outputs.
[Figure: Image A and Image B each pass through identical layers; the network computes a metric between A and B.]
6. More Detail of Basic Structure
[Figure: Image A and Image B are processed by twin sub-networks with the same structure and weights.]
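To make the shared structure concrete, here is a minimal PyTorch sketch. The embedding layers and the 105x105 input size (Omniglot's image size) are illustrative assumptions, not taken from any of the papers; the point is that one module applied to both images gives twins that share structure and weights by construction.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Two identical sub-networks joined at their outputs."""
    def __init__(self):
        super().__init__()
        # A single embedding module: applying it to both inputs yields
        # two twins that share structure and weights by construction.
        self.embed = nn.Sequential(
            nn.Flatten(),
            nn.Linear(105 * 105, 256), nn.ReLU(),
            nn.Linear(256, 64),
        )

    def forward(self, image_a, image_b):
        h_a = self.embed(image_a)
        h_b = self.embed(image_b)
        # Metric between A and B: component-wise L1 distance, summed.
        return torch.abs(h_a - h_b).sum(dim=1)
```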
7. Signature Verification
using a “Siamese” Time Delay
Neural Network
• The aim of the project was to make a signature verification system based on the
NCR 5990 Signature Capture Device.
• A signature is 800 sets of $x$, $y$ and pen up-down points sampled over time $t$.
• Preprocess the data before training the network.
Bromley et al. (1993)
8. Performance
GA: genuine signature pairs
• Correct pairs.
FR: forgery pairs
• Written to deceive.
The network classified signatures and detected forgeries with good performance.
9. Siamese Neural Networks
for One-shot Image Recognition
• Siamese nets were first introduced by Bromley et al. (1993) to solve signature verification as an image matching problem.
• Koch et al. (2015) used a deep convolutional neural network to extract image features before computing the distance between them.
Koch et al. (2015)
10. Deep Siamese Networks
• The model is a siamese convolutional network with $L$ layers, each with $N_l$ units, where $h_{1,l}$ represents the hidden vector in layer $l$ for the first twin, and $h_{2,l}$ denotes the same for the second twin.
• ReLU units in the first $L-2$ layers and sigmoidal units in the remaining layers.
[Figure: Image A and Image B pass through the twins; a distance metric is computed between their outputs.]
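A sketch of that activation pattern, using fully connected layers with hypothetical widths purely for illustration (the actual model is convolutional):

```python
import torch.nn as nn

# Hypothetical layer widths N_l; the activation pattern follows the slide:
# ReLU units in the first L-2 layers, sigmoidal units in the remaining layers.
def build_twin(sizes=(105 * 105, 1024, 1024, 4096)):
    layers = [nn.Flatten()]
    for l in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[l], sizes[l + 1]))
        is_last = l == len(sizes) - 2
        layers.append(nn.Sigmoid() if is_last else nn.ReLU())
    return nn.Sequential(*layers)
```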
11. Learning
$M$: minibatch size.
$i$: indexes the $i$-th minibatch.
$y(x_1^{(i)}, x_2^{(i)})$: length-$M$ vector which contains the labels for the minibatch.
• $y(x_1^{(i)}, x_2^{(i)}) = 1$ whenever $x_1$ and $x_2$ are from the same class.
• $y(x_1^{(i)}, x_2^{(i)}) = 0$ otherwise.
Regularized cross-entropy objective on a binary classifier:
$\mathcal{L}(x_1^{(i)}, x_2^{(i)}) = y(x_1^{(i)}, x_2^{(i)}) \log p(x_1^{(i)}, x_2^{(i)}) + (1 - y(x_1^{(i)}, x_2^{(i)})) \log(1 - p(x_1^{(i)}, x_2^{(i)})) + \lambda^\top |w|^2$
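A minimal sketch of this objective in PyTorch, assuming p is the network's predicted same-class probability for each pair; the cross-entropy terms are negated so the value can be minimized, and a single scalar lam stands in for the per-weight $\lambda$ vector:

```python
import torch

def pair_loss(p, y, weights, lam=1e-4):
    """Regularized cross-entropy for one minibatch of image pairs.

    p: predicted probability that each pair is from the same class, shape (M,).
    y: 1.0 for same-class pairs, 0.0 otherwise, shape (M,).
    weights: iterable of weight tensors for the lambda |w|^2 penalty.
    """
    eps = 1e-7  # numerical safety for the logs
    # The slide states the log-likelihood; it is negated here so that
    # minimizing this value maximizes that objective.
    ce = -(y * torch.log(p + eps) + (1 - y) * torch.log(1 - p + eps)).mean()
    l2 = sum((w ** 2).sum() for w in weights)
    return ce + lam * l2
```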
12. Dataset
Dataset: Omniglot
1623 characters from 50 different alphabets (40 train, 10 test).
Each character was drawn by hand by 20 different people.
The number of letters in each alphabet varies considerably, from about 15 to upwards of 40 characters.
13. N-way k-shot learning
This is a problem setting often used in one-shot learning; a sampling sketch follows below.
• Pick $N$ classes.
• Use $k$ training examples per class.
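A minimal sketch of sampling one N-way k-shot task; data_by_class is a hypothetical dict mapping each class label to its list of examples:

```python
import random

def sample_task(data_by_class, n_way=5, k_shot=1):
    """Pick N classes, then k training examples from each."""
    classes = random.sample(sorted(data_by_class), n_way)
    return {c: random.sample(data_by_class[c], k_shot) for c in classes}
```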
14. Experiments
• Training: 30 of the 50 alphabets and 12 of the 20 drawings per character, with data augmentation to increase the number of samples.
• Fine-tuning: 20 of the 50 alphabets (excluding the previous 30 alphabets) and 1 of the 20 drawings per character.
15. Matching Networks for One Shot Learning
[Figure: the siamese net of "Siamese Neural Networks for One-shot Image Recognition" passes Image A and Image B through twin layers and computes a metric; Matching Networks pass a query image and several support images (A, B, C) through layers with attention and output a classification.]
Vinyals et al. (2016)
16. Concepts
Parametric models (Deep Learning):
➕ Excellent generalization.
➖ Learning is slow and based on large datasets, requiring many weight updates using SGD.
Non-parametric models:
➕ Novel examples are rapidly assimilated.
➖ Some models in this family do not require any training, but performance depends on the chosen metric.
Matching Nets incorporate the characteristics of both parametric and non-parametric models: rapid acquisition of new examples while providing excellent generalization from common examples.
1. Propose Matching Nets, a neural network which uses recent advances in attention and memory that enable rapid learning.
2. The training procedure is based on a simple machine learning principle: test and train conditions must match.
17. Model Architecture
• A neural attention mechanism is defined to access a memory matrix which stores useful information to solve the task at hand.
Given $k$ examples of image-label pairs $S = \{(x_i, y_i)\}_{i=1}^{k}$, define a classifier $c_S(\hat{x})$ which gives a probability distribution over outputs $\hat{y}$ for a test example $\hat{x}$.
Define the mapping $S \to c_S(\hat{x})$ to be $P(\hat{y} \mid \hat{x}, S)$, where $P$ is parametrized by a neural network.
18. Model Architecture
• The model computes the output $\hat{y}$ as follows:
$\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$
where $x_i, y_i$ are the samples and labels from the support set $S = \{(x_i, y_i)\}_{i=1}^{k}$, and $a$ is an attention mechanism which is discussed on the next slide.
• If the support set contains only one image per class, this is one-shot learning.
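In code, the prediction is just an attention-weighted sum of the support labels. A sketch assuming one-hot support labels and attention weights computed as on the next slide:

```python
import torch

def predict(attention_weights, support_labels):
    """y_hat = sum_i a(x_hat, x_i) * y_i over the support set S.

    attention_weights: shape (k,), the values a(x_hat, x_i).
    support_labels: shape (k, n_classes), one-hot labels y_i.
    """
    return attention_weights @ support_labels  # shape (n_classes,)
```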
19. Formulation and Learning
The Attention Kernel
The algorithm relies on choosing $a(\cdot, \cdot)$, the attention mechanism.
The simplest form is a softmax over the cosine distance $c$, i.e.,
$a(\hat{x}, x_i) = \dfrac{e^{c(f(\hat{x}), g(x_i))}}{\sum_{j=1}^{k} e^{c(f(\hat{x}), g(x_j))}}$
with embedding functions $f$ and $g$ being appropriate neural networks to embed $\hat{x}$ and $x_i$.
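A sketch of this kernel, taking already-computed embeddings $f(\hat{x})$ and $g(x_i)$ as inputs:

```python
import torch.nn.functional as F

def attention(f_xhat, g_support):
    """Softmax over cosine similarities c(f(x_hat), g(x_i)).

    f_xhat: shape (d,), embedding of the test example.
    g_support: shape (k, d), embeddings of the k support examples.
    """
    c = F.cosine_similarity(f_xhat.unsqueeze(0), g_support, dim=1)  # (k,)
    return F.softmax(c, dim=0)
```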
20. Definition
• $L$: possible label sets. For example, $L$ could be the label set $\{\text{cats}, \text{dogs}\}$.
• $T$: a distribution over possible label sets $L$. This represents the training data.
Learning Step
1. Sample $L$ from $T$.
2. Sample $S$ and $B$ from $L$.
3. Minimize the error predicting the labels in the batch $B$ conditioned on the support set $S$.
Objective Function
$\theta = \arg\max_{\theta} \mathbb{E}_{L \sim T}\left[\mathbb{E}_{S \sim L,\, B \sim L}\left[\sum_{(x, y) \in B} \log P_{\theta}(y \mid x, S)\right]\right]$
This simulates the task of one-shot learning using only the training data.
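One training episode under this objective might look like the following sketch; sample_episode and model.log_prob are hypothetical stand-ins for the episode sampler (steps 1 and 2) and for a method returning $\log P_\theta(y \mid x, S)$:

```python
def train_step(model, optimizer, sample_episode):
    """One SGD step on the episodic objective above.

    sample_episode: returns (support, batch), i.e. S ~ L and B ~ L
    for an L ~ T. model.log_prob(y, x, support) is assumed to return
    log P_theta(y | x, S).
    """
    support, batch = sample_episode()
    loss = -sum(model.log_prob(y, x, support) for x, y in batch) / len(batch)
    optimizer.zero_grad()
    loss.backward()   # step 3: minimize the prediction error on B given S
    optimizer.step()
    return loss.item()
```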
21. Experiments
N-way k-shot learning
• Pick $N$ unseen character classes, independent of alphabet, as $L$.
• Provide the model with one drawing of each of the $N$ characters as $S \sim L$ and a batch $B \sim L$.
Objective Function
$\theta = \arg\max_{\theta} \mathbb{E}_{L \sim T}\left[\mathbb{E}_{S \sim L,\, B \sim L}\left[\sum_{(x, y) \in B} \log P_{\theta}(y \mid x, S)\right]\right]$
22. Experiments
• Pixels: nearest neighbor on raw pixels.
• Baseline: nearest neighbor using features computed with a CNN.
• Convolutional siamese net: "Siamese Neural Networks for One-shot Image Recognition".
[Results table: accuracy as a function of the number of classes.]