[DL Paper Reading] A Survey of Transformers Beyond Language (ViT, Perceiver, Frozen Pretrained Transformer, etc.)

Deep Learning JP
[DL Papers]
A Survey of Transformers Beyond Language
(ViT, Perceiver, Frozen Pretrained Transformer, etc.)
Presenter: Yusuke Iwasawa
http://deeplearning.jp/
Overview
• Self attention (the Transformer), used mainly for language, is spreading to many other domains
• Applications to images
– SASA (NeurIPS 2019), SANs (CVPR 2020), ViT (ICLR 2021), etc.
– Follow-ups to ViT such as DeiT and T2T
• There are also papers on its potential as a more general-purpose module
– Perceiver: a model usable generically across modalities
– Frozen Pretrained Transformer: transfer from language to other modalities
• This talk surveys uses of the Transformer mainly outside language
Outline
• Background: self attention
• Self attention for images
• Modality-agnostic (self) attention
– Perceiver
– Frozen Pretrained Transformer
BACKGROUND: SELF ATTENTION
Background to the background: word embeddings
The input "Attention is all you need" is split into tokens (token size L), and each token is embedded as a D-dimensional vector (embedding size D); stacking these gives the input matrix (shown in the figure as Xᵀ).
Self Attention
Attention(Q, K, V) = softmax(QKᵀ/√d)V; self attention is the special case where Q = K = V = X
(Figure: visualization of the matrix sizes involved)
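To make the sizes concrete, here is a minimal PyTorch sketch of single-head self attention in the scaled dot-product form (the learned Q/K/V projections of a full Transformer are deferred to the multi-head sketch below):

```python
import torch

def self_attention(x):
    """Minimal sketch of single-head self attention: Q = K = V = X.
    x: (L, D) token embeddings -> (L, D) output of the same size."""
    q = k = v = x                              # self attention: all three come from the input
    scores = q @ k.T / x.size(-1) ** 0.5       # (L, L) scaled dot products
    weights = torch.softmax(scores, dim=-1)    # each row sums to 1
    return weights @ v                         # weighted average of the values
```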
Multi Head Self Attention
Per-head processing (h heads)
• Embed Q, K, and V with learned weight matrices W
• Each W has shape (token embedding size, head size)
Combining the heads
• Concat + Linear
• Usually sized so the output matches the input
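A minimal PyTorch sketch of the above; the fused qkv projection and the default sizes are implementation choices for illustration, not prescribed by the slide:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal MHSA sketch: h heads, each with its own Q/K/V projections,
    concatenated and mapped back to the input size with a final linear layer."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dh = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)   # all heads' Q, K, V projections at once
        self.out = nn.Linear(dim, dim)       # "Concat + Linear"

    def forward(self, x):                    # x: (B, L, dim)
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the channel dim into (heads, head size) and move heads forward
        split = lambda t: t.view(B, L, self.heads, self.dh).transpose(1, 2)  # (B, h, L, dh)
        q, k, v = map(split, (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, L, -1)   # concat the heads
        return self.out(y)                                 # same size as the input
```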
Positional Encoding
• Self attention is permutation invariant
(it carries no positional information)
• PE embeds position explicitly
• The PE has the same dimension as the input
Figure excerpted from "Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings" (YouTube)
Positional encoding visualized (rows: position, columns: embedding dimension)
Figure excerpted from "Transformer Architecture: The Positional Encoding" (blog)
Transformer Encoder
Residual connection + Layer Norm
• Add the input back (the MHSA output has the same size as the input)
• Normalize, e.g., with LayerNorm
(Position-wise) Feed Forward
• Each token is fed through the feed-forward network independently
• FFN(x) = σ(xW₁ + b₁)W₂ + b₂
• W₁ has shape (embedding size, hidden size)
• σ is the activation function (GELU is commonly used)
• The final output has the same size as the input
Multi Head Self Attention
• As described above
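Putting the pieces together, a minimal sketch of one encoder block, using PyTorch's built-in nn.MultiheadAttention for brevity (post-norm layout as in the original paper; ViT instead moves the normalization before each sub-layer):

```python
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """Minimal sketch of one encoder block: MHSA and a position-wise FFN,
    each wrapped in a residual connection + LayerNorm."""
    def __init__(self, dim=512, heads=8, hidden=2048):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(                       # applied to each token independently
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                               # x: (B, L, dim)
        attn_out, _ = self.mhsa(x, x, x)                # Q = K = V = x
        x = self.norm1(x + attn_out)                    # residual add works: sizes match
        return self.norm2(x + self.ffn(x))              # output size equals input size
```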
Supplementary notes (asides)
• What is the feed-forward layer actually doing?
– It acts as a key-value neural memory
– "Transformer Feed-Forward Layers Are Key-Value Memories", 2020 (arXiv)
• Residual connections and feed forward are essential
– "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth", 2021 (arXiv)
• Do the fine details (e.g., the activation function) matter?
– "Do Transformer Modifications Transfer Across Implementations and Applications?", 2021 (arXiv)
PART 1: SELF ATTENTION FOR IMAGES
A taxonomy of self attention for images
Figure excerpted from "Bottleneck Transformers for Visual Recognition"
• One line of work combines convolutions with attention
• Another is essentially attention-only — the main focus of this presentation
What is the "word" (token) for an image?
• For text: each word is one unit, and words compose a sentence
• For images: what plays that role???
Main approach 1: treat each pixel as a token
• Attending over the whole image blows up the computation
(self attention over all H×W pixels costs O((HW)²) — over 2.5 billion pairs for a 224×224 image)
Local Attention
Figure excerpted from "Stand-Alone Self-Attention in Vision Models"
• Restrict K and V to a spatially local neighborhood of each query
• Cuts the computational cost
• Used by SASA, SANs, and others
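A minimal single-head sketch of the idea (simplified: the raw center pixel serves as the query, and the learned per-pixel Q/K/V projections and relative PE of SASA are omitted):

```python
import torch
import torch.nn.functional as F

def local_self_attention(x, k=7):
    """Each pixel attends only to its k x k neighborhood.
    x: (B, C, H, W) feature map -> (B, C, H, W) output."""
    B, C, H, W = x.shape
    pad = k // 2                                               # k is assumed odd
    neigh = F.unfold(x, kernel_size=k, padding=pad)            # (B, C*k*k, H*W) neighborhoods
    neigh = neigh.view(B, C, k * k, H * W)                     # (B, C, k*k, HW)
    q = x.view(B, C, 1, H * W)                                 # query = the center pixel
    scores = (q * neigh).sum(1, keepdim=True) / C ** 0.5       # (B, 1, k*k, HW) dot products
    attn = torch.softmax(scores, dim=2)                        # weights over the k*k neighbors
    out = (attn * neigh).sum(2)                                # (B, C, HW)
    return out.view(B, C, H, W)
```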
Convolution vs. local attention
Convolution: applies weights that do not depend on the observed values
Local attention: applies weights that do depend on the observed values
* For the theoretical relationship, see "On the Relationship between Self-Attention and Convolutional Layers" and others
(with relative PE, self attention can approximate any convolution)
Another way to cut the cost: Axial Attention [Wang+ 2020]
Splitting self attention into a vertical pass and a horizontal pass reduces the computation
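A minimal sketch (without the learned projections or positional terms of the actual papers): attending along each row and then each column costs O(HW·(H+W)) instead of O((HW)²):

```python
import torch

def axial_attention(x):
    """x: (B, H, W, C); apply 1-D self attention along width, then along height."""
    B, H, W, C = x.shape

    def attend(seq):                                  # seq: (N, L, C), Q = K = V = seq
        attn = torch.softmax(seq @ seq.transpose(1, 2) / C ** 0.5, dim=-1)
        return attn @ seq

    # Row-wise pass: each of the B*H rows is a length-W sequence
    x = attend(x.reshape(B * H, W, C)).reshape(B, H, W, C)
    # Column-wise pass: each of the B*W columns is a length-H sequence
    x = x.permute(0, 2, 1, 3)                         # (B, W, H, C)
    x = attend(x.reshape(B * W, H, C)).reshape(B, W, H, C)
    return x.permute(0, 2, 1, 3)                      # back to (B, H, W, C)
```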
Main approach 2: treat patches as tokens
ViT / iGPT
* Figures excerpted from the respective papers
Overall architecture of ViT
Notes on ViT
• The PE is a 1-D learnable parameter
– 2-D learnable parameters, relative coordinates, and no PE at all are also compared (see Appendix D)
• Classification uses the CLS token from the Transformer output
– The CLS token's initial value is also a learnable parameter
– iGPT instead averages the final layer over the spatial dimension
• The Transformer structure differs slightly from the original
– Specifically, the position of the normalization
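A minimal sketch of ViT-style tokenization as described above: a per-patch linear embedding (expressed as a strided convolution), a learnable CLS token, and a learnable 1-D PE. The sizes are illustrative ViT-Base-like defaults, and the zero initialization is a simplification:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into P x P patches and linearly embed each patch (ViT-style)."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=768):
        super().__init__()
        # A stride-P convolution with a P x P kernel = one linear layer per patch
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))              # learnable CLS token
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))  # learnable 1-D PE

    def forward(self, x):                                 # x: (B, 3, 224, 224)
        x = self.proj(x).flatten(2).transpose(1, 2)       # (B, 196, dim) patch tokens
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                    # prepend the CLS token
        return x + self.pos                               # add the learnable PE
```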
Self Attention vs. CNN
Figure excerpted from "A Survey on Visual Transformer"
With extremely large datasets, self attention surpasses convolutions
Summary so far
• Covered pure-attention-based work on images
• In particular, the recently proposed ViT can surpass convolutions given enough data and parameters
– The basic recipe: split the image into patches and feed them to a Transformer
• There is also work on improving data efficiency
– DeiT [Hugo+ 2021], Tokens-to-Token [Li+ 2021]
• Reimplementations of these methods exist in an individual's repo, vit-pytorch
– Whether they are validated correctly is unclear
MODALITY-AGNOSTIC (SELF) ATTENTION
Paper information
• "Perceiver: General Perception with Iterative Attention"
• Authors: Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
• Affiliation: DeepMind
• Submitted: 2021/03/04 (arXiv)
• Summary
– Proposes Perceiver, a model usable generically across modalities
– Handles high-dimensional inputs as directly as possible, without assuming a data structure
– Achieves near-SoTA performance on images, audio, video, point clouds, and more
– (The paper omits many details, so some interpretation is involved)
Motivation
• Is it right to build in structure like convolutions?
• Or should we maximize flexibility and let the data speak for itself?
(the position this paper takes)
• A concrete problem with assuming structure: a different architecture must be designed for every domain
– That takes effort; we would rather reduce everything to "just collect data"
Proposal: Perceiver (= Transformer + cross attention)
The Transformer is extremely flexible but struggles with high-dimensional (long) inputs
=> Introduce cross attention against a latent code (the rest is a plain Transformer)
(Figure legend: new component / existing / repeated)
Cross Attention
(Zoomed-in view of the previous figure)
• The latent array is a learnable parameter of a chosen size
• The byte array is the input (an image, etc.)
• N <<< M
– e.g., for a 224×224 image, M > 50,000
– N is a hyperparameter (e.g., 512 in the experiments)
• Cost: O(MN) instead of O(M²)
– Since attention is applied repeatedly, this adds up to a large saving
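A minimal sketch of this cross attention (single head, as in the paper's first layer). The linear projections to a shared dimension are my assumption about how the size mismatch is resolved, per the speculation later in this deck:

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Perceiver-style cross attention sketch: a small learnable latent array
    attends to a large byte array (the raw input)."""
    def __init__(self, latent_dim=512, byte_dim=3, n_latents=1024):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, latent_dim))  # learnable latent array
        self.to_q = nn.Linear(latent_dim, latent_dim)
        self.to_k = nn.Linear(byte_dim, latent_dim)   # project the byte array to the shared dim
        self.to_v = nn.Linear(byte_dim, latent_dim)

    def forward(self, byte_array):                    # byte_array: (B, M, byte_dim), M ~ 50k pixels
        B = byte_array.size(0)
        q = self.to_q(self.latents).expand(B, -1, -1)           # (B, N, D)
        k, v = self.to_k(byte_array), self.to_v(byte_array)     # (B, M, D)
        # (B, N, M) attention map: O(MN), instead of the O(M^2) of self attention on the input
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v                                         # (B, N, D) latent summary
```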
Overall architecture (recap)
• Alternate cross attention with a Transformer over the latent variables
• Parameters may be shared across repeats (cf. Universal Transformer)
(Figure legend: new component / existing / repeated)
Notes on Perceiver (speculation)
• ViT compresses on a fixed grid; Perceiver compresses by similarity
• The latent array is on the order of 1024 × 512
• The byte array and latent array differ in size, so their inner product cannot be taken directly
– Presumably both are projected to a common size, as in MHSA (i.e., one linear layer in between)
• The byte array's positional encoding CONCATenates sinusoids
– The standard Transformer ADDs the PE instead
– Concatenation allows any number of PE channels (no need to match dimensions)
• The latent array's PE is a 1-D learnable parameter (probably)
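A sketch of what a concatenated sinusoidal (Fourier-feature) position encoding might look like; the frequency spacing and band count here are assumptions for illustration, not the paper's values:

```python
import math
import torch

def fourier_position_encoding(pos, n_bands=6):
    """pos: (M, d) coordinates normalized to [-1, 1].
    Returns (M, d + 2*d*n_bands) features to CONCAT to the input channels."""
    freqs = 2.0 ** torch.arange(n_bands)                      # assumption: octave-spaced bands
    angles = pos[..., None] * freqs * math.pi                 # (M, d, n_bands)
    feats = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (M, d, 2*n_bands)
    return torch.cat([pos, feats.flatten(1)], dim=-1)         # raw coords + Fourier features
```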
Experiments: data covered
Verifies that Perceiver can be used generically
(near-SoTA performance with almost no per-modality changes)
Experimental setup: ImageNet
• Crop to 224×224 and augment with RandAugment
– So it is not entirely free of domain knowledge
• The PE uses coordinates of the cropped image normalized to [-1, 1]
– Using pre-crop coordinates led to overfitting
• Optimized with the LAMB optimizer + scheduling
– "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
• (Cross attention + 6 latent Transformer blocks) × 8 (44M parameters in total)
– The cross attention has a single head
– The latent array has 1024 entries (512 dimensions each)
– All weights are shared except the first cross attention + latent block
Results: ImageNet
• Blue: methods that assume (almost) no structure in the data
• The Transformer baseline uses images resized to 64×64
• On par with ResNet-50 and with ViT trained on ImageNet
(Table groups: no structural assumptions / same PE / original papers)
Results: Permuted ImageNet
• Tested on permuted versions of ImageNet
– Fixed: one permutation shared across all images
– Random: a different permutation per image
• Perceiver still works on inputs with no structure (the others fail)
AudioSet and point clouds
AudioSet
• Audio inputs are 61,440-dimensional; video inputs are 65,536-dimensional
• Perceiver achieves SoTA
• No audio-specific data augmentation is used
Point clouds
• The input is 2,000 coordinates
• Among methods with no structural assumptions, the proposal is best
Perceiver summary
• Proposes a way for the flexible Transformer to handle high-dimensional inputs
– Cross attention against latent variables instead of self attention on the input
• Near-SoTA performance across many modalities
• GANsformer takes a similar approach
• The same person behind the ViT-family implementations has released one for Perceiver
– As a further aside, this person has also released an implementation of GLOM [Hinton, 2021]
– Whether it actually works is unclear
PRETRAINED TRANSFORMERS AS
UNIVERSAL COMPUTATION ENGINES
Paper information
• "Pretrained Transformers as Universal Computation Engines"
• Authors: Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
• Affiliations: UC Berkeley, FAIR, Google Brain
• Submitted: 2021/03/09 (arXiv)
• Summary
– A Transformer trained on language can transfer to other modalities (!!?)
– An empirical paper
Frozen Pretrained Transformer (FPT)
• Train a full Transformer on language (using the GPT-2 architecture)
• Freeze the self attention and feed-forward layers
• Retrain the input embedding, positional embedding, output layer, and LayerNorms
• Transfer to other modalities and measure performance
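A minimal sketch of the FPT recipe, assuming the HuggingFace transformers library and its GPT-2 parameter naming ("ln" in LayerNorm names, "wpe" for the position embedding); the matching-by-substring trick and the last-token readout are my simplifications, not from the paper:

```python
import torch.nn as nn
from transformers import GPT2Model   # assumption: HuggingFace transformers is available

class FrozenPretrainedTransformer(nn.Module):
    """Freeze GPT-2's self attention and feed-forward blocks; retrain the
    input embedding, positional embedding, LayerNorms, and output head."""
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        for name, p in self.gpt2.named_parameters():
            # "ln" matches the LayerNorms, "wpe" the position embedding; all else frozen
            p.requires_grad = ("ln" in name) or ("wpe" in name)
        self.embed = nn.Linear(in_dim, self.gpt2.config.n_embd)    # new input embedding
        self.head = nn.Linear(self.gpt2.config.n_embd, n_classes)  # new output layer

    def forward(self, x):                       # x: (B, L, in_dim) tokens of any modality
        h = self.gpt2(inputs_embeds=self.embed(x)).last_hidden_state
        return self.head(h[:, -1])              # classify from the last token
```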
Downstream tasks evaluated
• Bit Memory
• Bit XOR
• ListOps ([MAX 4 3 2 1] -> 4)
• MNIST (image -> category)
• CIFAR-10 (image -> category)
• CIFAR-10 LRA (images converted to grayscale and flattened)
• Remote homology detection (amino-acid sequence -> fold)
Experiment 1: FPT vs. full training vs. LSTM
• FPT matches a fully trained Transformer
• (That said, the absolute accuracy does not seem especially high.)
Experiment 2: pretraining on modalities other than language
• Random: plain random initialization
• Bit: pretrained on the Bit Memory task
• ViT: uses ViT's pretrained weights
• The claim is that language pretraining is (probably) best overall (e.g., ViT does poorly on Homology)
Experiment 3: Transformer vs. LSTM
• Both randomly initialized
• The Transformer is substantially better
Experiment 8: smarter initialization
• The statistics of pretrained and random weights differ greatly
• Does initializing randomly but matching those statistics help?
• Yes (but it does not explain everything.)
Experiment 7: does performance scale with model size?
• Bigger is better
• (There are many more experiments; see the paper if interested.)
FPT summary and impressions
• What a Transformer learns is less a function than a procedure
• What procedure to run may be the same in a sufficiently abstract space
(perhaps...)
• Possibly related to the argument that generalization requires graph structure
• Only fairly small-scale data is tested, so results might differ at larger scale
• The paper discusses a connection to the Global Workspace theory, which I did not fully follow
Topics I could not cover (references at the end)
• GANsformer
– Transformers are used not only for image classification but also for image generation
– GANsformer itself is not pure attention
• DeiT, Tokens-to-Token
– Improvements over ViT
– Mainly better data efficiency
• GLOM
– ViT + top-down attention + consensus (- multi-head)
– A model that represents an image as a parse tree
Overall summary and impressions
• The Transformer is spreading beyond language
– Images: ViT and others
– General-purpose: Perceiver, Frozen Pretrained Transformer, and others
• The move toward fewer built-in assumptions will likely continue
• What is the Transformer really doing?
– Viewing it as a mere function may be misleading
• Some interpretation is mixed in (apologies for any errors) m(_ _)m
Main references: self attention and Transformers in general
• "Attention is All You Need", NeurIPS 2017
• "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth", 2021 (arXiv)
• "Do Transformer Modifications Transfer Across Implementations and Applications?", 2021 (arXiv)
• "Transformer Architecture: The Positional Encoding", blog
• "Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings", YouTube
Main references: self attention for images
Pure attention
(SASA) "Stand-Alone Self-Attention in Vision Models", NeurIPS 2019
(SANs) "Exploring Self-attention for Image Recognition", CVPR 2020
(Axial attention) "Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation", ECCV 2020
(iGPT) "Generative Pretraining From Pixels", ICML 2020
(ViT) "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR 2021
(DeiT) "Training data-efficient image transformers & distillation through attention", arXiv, 2021
(T2T) "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet", arXiv, 2021
(Survey) "A Survey on Visual Transformer", arXiv, 2021
Others
(GANsformer) "Generative Adversarial Transformers", arXiv, 2021
(Bottleneck Transformer) "Bottleneck Transformers for Visual Recognition", arXiv, 2021
(GLOM) "How to represent part-whole hierarchies in a neural network", arXiv, 2021
Reference: the person prolifically implementing ViT-family models
* Small details appear to differ from the papers? From https://github.com/lucidrains