LEARNING TO REMEMBER RARE EVENTS
Łukasz Kaiser+, Google Brain, '17
2017/3/21
Agenda
Overview
Motivation
Method
Experiments / Results
Comments
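Overview
Proposes a new method for one-shot learning
It learns well even from life-long and one-shot data
Retrieval stays fast even when the memory capacity becomes large
Proposes a method based on nearest neighbor (NN below)
Everything except the NN part is differentiable and can be trained end-to-end
The module can be attached to, and detached from, a wide range of supervised learning methods
Performance is verified on the following two tasks
image classification (Omniglot dataset)
The dataset is small, so the effect of long-term memory is hard to see
Achieves SoTA
large-scale machine translation task (English-German translation)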
Motivation
Improve predictive performance on datasets that contain life-long and one-shot data
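Method
MEMORY MODULE
Components
The module consists of the following three parts
memory keys
memory values
memory age
memory keys
The body of the memory
matrix
Vectorized activations of some layer (e.g. fc7 for VGG16)
Data that is matched against the query (e.g. the fc7 activations) computed when an image is fed in
memory values
vector
Label values
memory age
vector
How long the item has been stored so far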
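The three components above can be sketched as a small data structure. This is only an illustration, assuming plain Python lists and random unit-norm initial keys; the names `Memory`, `make_memory` and `normalize` are hypothetical and do not come from the slides.

```python
import math
import random
from dataclasses import dataclass


def normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


@dataclass
class Memory:
    keys: list    # memory-size x key-size matrix, one key vector per slot
    values: list  # memory-size vector of labels
    ages: list    # memory-size vector: how long each slot has been stored


def make_memory(memory_size, key_size, seed=0):
    # Assumption: slots start with random unit-norm keys, no label, and age 0.
    rng = random.Random(seed)
    keys = [normalize([rng.gauss(0.0, 1.0) for _ in range(key_size)])
            for _ in range(memory_size)]
    return Memory(keys=keys, values=[None] * memory_size, ages=[0] * memory_size)


memory = make_memory(memory_size=8, key_size=4)
```

Keeping the keys at unit norm means that a dot product with a unit-norm query is a cosine similarity.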
Training
Procedure
1. Compute the query
2. Compute kNN to obtain the memory indices neighboring the query
3. Compute the memory loss -> backward
4. memory update
Notation
M: memory module
K: keys matrix (memory-size × key-size)
V: values vector (memory-size)
A: age vector (memory-size)
q: query
NN: nearest neighbor
k: number of items retrieved by NN (= 256)
$NN_k$: k nearest neighbors
n: index obtained by k nearest neighbor
V[n]: memory value at index n (likewise for A)
t: inverse of softmax temperature (= 40)
d: cosine similarity
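Representation of the memory
It can be expressed as the triplet K, V, A
$M = (K_{\text{memory-size} \times \text{key-size}}, V_{\text{memory-size}}, A_{\text{memory-size}})$
Normalization of values
The query is normalized
$\|q\| = 1$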
Expressing distances
cosine similarity
$d_i = q \cdot K[n_i]$
Nearest neighbor of the query in memory (the distance is cosine similarity)
$NN(q, M) = \operatorname{argmax}_i \; q \cdot K[i]$
k nearest neighbor
$NN_k(q, M)$
Obtaining memory indices by k nearest neighbor
$(n_1, \ldots, n_k) = NN_k(q, M)$
embedded matrix (used in the translation task)
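The lookup above can be sketched with an exact (brute-force) search. This is a minimal sketch, assuming unit-norm keys and query so that the dot product equals cosine similarity; `dot` and `nearest_neighbors` are illustrative names, not from the slides.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbors(query, keys, k):
    """Return the indices (n_1, ..., n_k) of the k keys most similar to the
    query, sorted by decreasing similarity, and the similarities d_i."""
    sims = [dot(query, key) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: sims[i], reverse=True)[:k]
    return order, [sims[i] for i in order]


# Toy memory with three unit-norm keys and a unit-norm query.
keys = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.8, 0.6]
indices, sims = nearest_neighbors(query, keys, k=2)
```

Here the closest key is the third one (index 2), followed by the first (index 0). The slides use k = 256; k = 2 is only for the toy data.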
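Training with the memory
Memory Loss
From the top of the k-best list, take one positive index ($n_p$) and one negative index ($n_b$); a positive index holds the ground-truth value v, a negative index holds a different value
Take the cosine similarities at $n_b$ and $n_p$ and compute their difference
Impose a margin α = 0.1
The loss is as follows
$loss(q, v, M) = [\, q \cdot K[n_b] - q \cdot K[n_p] + \alpha \,]_+$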
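The margin loss can be sketched as follows. This is an illustration under stated assumptions: the neighbor list is already sorted by decreasing similarity, and returning 0 when no positive or no negative neighbor exists is a placeholder choice of this sketch, not something stated on the slides. `memory_loss` and `dot` are hypothetical names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_loss(query, label, keys, values, neighbor_indices, alpha=0.1):
    """Hinge loss [q.K[n_b] - q.K[n_p] + alpha]_+ over the k-best list."""
    # n_p: highest-ranked neighbor whose stored value equals the label.
    pos = next((n for n in neighbor_indices if values[n] == label), None)
    # n_b: highest-ranked neighbor whose stored value differs from the label.
    neg = next((n for n in neighbor_indices if values[n] != label), None)
    if pos is None or neg is None:
        return 0.0  # assumption of this sketch: nothing to compare against
    return max(0.0, dot(query, keys[neg]) - dot(query, keys[pos]) + alpha)


keys = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
values = ["a", "b", "a"]
query = [0.8, 0.6]
neighbors = [2, 0, 1]  # sorted by decreasing similarity to the query

loss_wrong = memory_loss(query, "b", keys, values, neighbors)
loss_right = memory_loss(query, "a", keys, values, neighbors)
```

When the label is "b", the best negative (similarity 0.96) outranks the best positive (0.6), so the loss is positive. When the label is "a", the positive wins by more than the margin and the loss is zero.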
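Memory Update
Case 1:
The memory value at the best index $n_1$ (the top-ranked neighbor) equals the ground truth (gt)
Replace $K[n_1]$ with the normalized $q + K[n_1]$
Case 2:
The memory value at the best index $n_1$ differs from the gt
Randomly pick an index among those with the largest age, and update its key, value, and age
Set the key to the query, the value to the gt, and the age to 0
In detail, the index to update is chosen as
$n' = \operatorname{argmax}_i \, (A[i] + r_i)$ where $|r_i| \ll |M|$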
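The two update cases can be sketched like this. Several details are assumptions of the sketch rather than statements from the slides: the noise r_i is drawn uniformly from [0, 1) so that it only breaks ties among the oldest slots, the written slot's age is reset to 0 in both cases, and every other slot's age is incremented by 1. `memory_update` and `normalize` are illustrative names.

```python
import math
import random


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def memory_update(query, label, keys, values, ages, top_index, rng):
    """Update the memory in place and return the index that was written."""
    if values[top_index] == label:
        # Case 1: the top neighbor already holds the right value,
        # so move its key toward the query and renormalize.
        merged = [q + k for q, k in zip(query, keys[top_index])]
        keys[top_index] = normalize(merged)
        written = top_index
    else:
        # Case 2: overwrite the oldest slot; small noise breaks ties at random.
        noisy = [age + rng.random() for age in ages]
        written = max(range(len(ages)), key=lambda i: noisy[i])
        keys[written] = list(query)
        values[written] = label
    # Assumed age bookkeeping: reset the written slot, age all others by one.
    for i in range(len(ages)):
        ages[i] = 0 if i == written else ages[i] + 1
    return written


keys = [[1.0, 0.0], [0.0, 1.0]]
values = ["a", "b"]
ages = [5, 2]
rng = random.Random(0)
first = memory_update([0.0, 1.0], "a", keys, values, ages, top_index=0, rng=rng)
second = memory_update([0.6, 0.8], "c", keys, values, ages, top_index=0, rng=rng)
```

The first call hits case 1 and averages the key at slot 0 with the query. The second call hits case 2 and overwrites slot 1, which is the oldest slot at that point.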
Efficient nearest neighbor computation
Uses an LSH-like hashing trick
This makes the NN computation fast
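The slides do not say which hashing scheme is used. As one possible illustration of an LSH-style lookup, the sketch below uses random-hyperplane signatures: keys are bucketed by the signs of their projections, and a query is compared only against keys in its own bucket. All names here (`make_hyperplanes`, `signature`, `build_index`, `approximate_nearest`) are hypothetical, and the fallback to exact search on an empty bucket is a choice of this sketch.

```python
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_hyperplanes(num_bits, key_size, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(key_size)]
            for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(dot(vec, h) >= 0.0 for h in hyperplanes)


def build_index(keys, hyperplanes):
    buckets = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[signature(key, hyperplanes)].append(i)
    return buckets


def approximate_nearest(query, keys, buckets, hyperplanes):
    candidates = buckets.get(signature(query, hyperplanes), [])
    if not candidates:
        candidates = range(len(keys))  # fall back to exact search
    return max(candidates, key=lambda i: dot(query, keys[i]))


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
hyperplanes = make_hyperplanes(num_bits=3, key_size=2)
buckets = build_index(keys, hyperplanes)
found = approximate_nearest([0.0, 1.0], keys, buckets, hyperplanes)
```

Vectors with high cosine similarity tend to share a signature, so most of the memory is skipped at query time at the cost of occasionally missing the true nearest neighbor.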
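Uses of the memory module
It can be attached to any classification network
Instead of taking the top-1 memory value of the memory module as the final output, the memory output can also be treated as an embedding and used by later layers
Seq2Seq and similar models use it this way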
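One way to read the embedding use is to turn the similarities d_i into weights with a softmax at inverse temperature t = 40 and mix the embeddings of the retrieved values. The sketch below is an assumption about how that could look, not the slides' exact formulation; `similarity_weights`, `soft_memory_embedding` and the toy `value_embeddings` table are hypothetical.

```python
import math


def similarity_weights(similarities, t=40.0):
    """Softmax over t * d_i, computed in a numerically stable way."""
    scaled = [t * d for d in similarities]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_memory_embedding(similarities, neighbor_values, value_embeddings, t=40.0):
    """Weighted sum of the embeddings of the retrieved memory values."""
    weights = similarity_weights(similarities, t)
    size = len(next(iter(value_embeddings.values())))
    mixed = [0.0] * size
    for w, value in zip(weights, neighbor_values):
        for j, x in enumerate(value_embeddings[value]):
            mixed[j] += w * x
    return mixed


# Hypothetical embedding table for two memory values.
value_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
weights = similarity_weights([0.96, 0.80])
mixed = soft_memory_embedding([0.96, 0.80], ["cat", "dog"], value_embeddings)
```

With t = 40 the softmax is sharp, so the closest neighbor dominates the mixture while the weights still carry a confidence signal for the later layers.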
Convolutional Network with Memory
In the paper the network is built as follows
The last fc layer is used as the query
conv.(ch=64, k=3) -> relu -> conv.(ch=64, k=3) -> relu
-> max-pooling
-> conv.(ch=128, k=3) -> relu
-> conv.(ch=128, k=3) -> relu
-> max-pooling
-> fc -> fc
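Sequence-to-sequence with Memory
In the paper the output of the second LSTM of the GNMT model is used as the query
The LSTM output and the output obtained through the memory embedding are concatenated, passed through an FC layer, and the softmax layer makes the prediction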
Extended Neural GPU with Memory
The Memory Module is used inside the decoder-side network (not near the output layer)
Experiments
Common experimental settings
k = 256
α = 0.1
Optimizer: Adam
Evaluation
Evaluation on Omniglot
Omniglot dataset
Consists of 1623 characters from 50 alphabets
Each character was handwritten by 20 different people
https://github.com/brendenlake/omniglot
http://www.omniglot.com/
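The number of classes is very large, yet there is little data per class (20 images per class)
An ideal dataset for evaluating one-shot learning
Evaluation results
1,200 characters are used for training
The remaining 423 characters are used for testing
augmentation: 90-degree rotations
N classes are chosen (N-way)
Evaluation is done after each class has been shown K times for learning (K-shot)
No BN is used; the simple Convolutional Network described above is used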
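An N-way K-shot evaluation episode can be sketched as follows. This is only an illustration with string stand-ins for images; `sample_episode`, the `n_query` parameter and the toy `images_by_class` dictionary are assumptions of the sketch, sized to match the 423 test characters with 20 images each.

```python
import random


def sample_episode(images_by_class, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query images each."""
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(images_by_class[label], k_shot + n_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query


# Toy stand-in for the test split: 423 classes with 20 images each.
images_by_class = {
    f"char_{c}": [f"char_{c}_img_{i}" for i in range(20)] for c in range(423)
}
support, query = sample_episode(
    images_by_class, n_way=5, k_shot=1, n_query=1, rng=random.Random(0)
)
```

In a 5-way 1-shot episode the model sees one labeled image for each of 5 unseen classes (the support set) and is then scored on other images of the same classes (the query set).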
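Synthetic task
Evaluated in order to understand the behavior of the memory module
The model used is the Extended Neural GPU (32 channels)
Evaluated with and without the memory module attached
The memory size is 500,000
Pick one number from {2, ..., 16000} and convert it to base 4
This gives a zero-padded 7-digit number made up of {0, 1, 2, 3}
Choose the position of the 7 digits at random, and fill the remaining positions with a string of A and B characters
This produces a 13-character string
40,000 strings are used for training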
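The string construction described above can be sketched as follows. The slides do not say how the A/B filler is chosen, so drawing each filler character at random is an assumption of this sketch; `to_base4` and `make_example` are illustrative names.

```python
import random


def to_base4(number, width=7):
    """Zero-padded base-4 representation, e.g. 2 -> '0000002'."""
    digits = []
    for _ in range(width):
        digits.append(str(number % 4))
        number //= 4
    return "".join(reversed(digits))


def make_example(rng, total_length=13):
    """Return (number, string): 7 base-4 digits at a random offset, padded
    with A/B characters to a 13-character string."""
    number = rng.randint(2, 16000)
    digits = to_base4(number)
    pad = total_length - len(digits)
    offset = rng.randint(0, pad)
    # Assumption: each filler character is drawn uniformly from "AB".
    filler = "".join(rng.choice("AB") for _ in range(pad))
    return number, filler[:offset] + digits + filler[offset:]


rng = random.Random(0)
number, text = make_example(rng)
```

Seven base-4 digits cover values up to 4^7 - 1 = 16383, which is why 7 digits are enough for numbers up to 16000.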
16K elements have to be learned, which is hard for a plain Extended Neural GPU to memorize
The task is highly random, so rare events also occur frequently (hard for the Extended Neural GPU to handle)
Also evaluated with Seq2Seq (a 2-layer LSTM model with 256 units in each layer)
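Translation
The GNMT model is used
Evaluated on the WMT14 English-to-German translation task
Both qualitative and quantitative evaluations were carried out
The qualitative evaluation checks whether low-frequency words such as "Dostoevsky" are translated well
The quantitative evaluation uses the WMT test set (BLEU score)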
