Unsupervised Domain Adaptation for Spatio-Temporal Action Localization
Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang
BMVC2020
Kazuki Omi (Nagoya Institute of Technology)
2022/10/28
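Overview
■ Unsupervised domain adaptation for spatio-temporal action localization (STAL)
• According to the authors, this is the first work to apply unsupervised domain adaptation to STAL
■ Proposes three domain adaptation modules that are effective for STAL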
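Domain adaptation
■ What is a domain?
• The characteristic tendency of a given set of data
• In this work, the domains differ in where the data was collected
■ What is domain adaptation?
• Use information from the source domain (S) to solve the task in the target domain (T)
• Bring the features of T, which has no (or few) labels, closer to the features of the labeled S, so that suitable features are obtained for T
■ Unsupervised domain adaptation
• Source: labeled
• Target: unlabeled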
Method overview
• Domain adaptation on the key frame
• Domain adaptation at the image level
• Domain adaptation at the instance level
Base model
• Region detection
• Detected regions of arbitrary size are pooled to a fixed size
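The fixed-size pooling step can be illustrated with a small sketch. This is a generic RoI max pooling written in plain Python, not the authors' implementation; the function name `roi_max_pool`, the 2x2 output grid, and the toy feature map are assumptions made for illustration.

```python
def roi_max_pool(feature_map, box, out_h=2, out_w=2):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W numbers)
    box: (top, left, bottom, right), with bottom/right exclusive
    """
    top, left, bottom, right = box
    h, w = bottom - top, right - left
    if h < out_h or w < out_w:
        raise ValueError("region is smaller than the output grid")
    pooled = []
    for i in range(out_h):
        # Row range of bin i; integer arithmetic splits h as evenly as possible.
        r0 = top + (i * h) // out_h
        r1 = top + ((i + 1) * h) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = left + ((j + 1) * w) // out_w
            # Keep the maximum activation inside the bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map whose values are 10 * row + col (hypothetical data).
fmap = [[10 * r + c for c in range(6)] for r in range(4)]
small = roi_max_pool(fmap, (0, 0, 2, 2))  # a 2x2 region
large = roi_max_pool(fmap, (0, 0, 4, 6))  # a 4x6 region
print(small)  # [[0, 1], [10, 11]]
print(large)  # [[12, 15], [32, 35]]
```

Both regions end up as 2x2 grids regardless of their original size, which is what lets the later layers take a fixed-size input.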
• The key frame, i.e. the frame on which detection is performed, is given as input
• A clip whose middle frame is the key frame is given as input
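As a rough sketch of how a clip centered on the key frame could be selected: the helper `clip_indices`, the odd clip length, and the clamping at the video boundaries are all assumptions for illustration and are not taken from the slides.

```python
def clip_indices(key_frame, clip_len, num_frames):
    """Return clip_len frame indices whose middle element is key_frame.

    Indices that fall outside the video are clamped to the first or last
    frame (an assumed boundary policy, not one stated in the slides).
    """
    if clip_len % 2 == 0:
        raise ValueError("use an odd clip_len so the key frame sits exactly in the middle")
    half = clip_len // 2
    return [min(max(i, 0), num_frames - 1)
            for i in range(key_frame - half, key_frame + half + 1)]


clip = clip_indices(key_frame=10, clip_len=5, num_frames=100)
edge = clip_indices(key_frame=1, clip_len=5, num_frames=100)
print(clip)  # [8, 9, 10, 11, 12]
print(edge)  # [0, 0, 1, 2, 3]
```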
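Spatial Domain Classifier
• Performs domain adaptation on the key frame
• The loss is computed from the classifier output and the domain label
• The loss is weighted by classification difficulty
• During backpropagation, the sign of the gradient is reversed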
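The two ideas on this slide, weighting the domain loss by difficulty and flipping the gradient sign in the backward pass, can be sketched in plain Python. The focal-loss form with exponent `gamma`, the function names, and the scaling factor `lam` are assumptions; the slide only states that the loss is weighted by classification difficulty and that the gradient sign is reversed.

```python
import math


def focal_domain_loss(p_target, domain_label, gamma=2.0):
    """Focal-style binary loss for a domain classifier (assumed form).

    p_target: predicted probability that the sample is from the target domain
    domain_label: 0 for source, 1 for target
    """
    # Probability the classifier assigned to the correct domain.
    p_correct = p_target if domain_label == 1 else 1.0 - p_target
    # (1 - p_correct) ** gamma is small for easy samples and large for hard ones.
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)


def gradient_reversal_backward(grad_from_classifier, lam=1.0):
    """Backward pass of a gradient reversal layer.

    The forward pass is the identity; the backward pass flips the sign of the
    incoming gradient (scaled by lam), so the feature extractor is pushed to
    confuse the domain classifier.
    """
    return -lam * grad_from_classifier


easy = focal_domain_loss(0.9, domain_label=1)  # confidently correct
hard = focal_domain_loss(0.1, domain_label=1)  # confidently wrong
plain_easy = -math.log(0.9)                    # unweighted cross-entropy
reversed_grad = gradient_reversal_backward(0.5)
print(easy < plain_easy, hard > easy, reversed_grad)
```

With this weighting, samples the domain classifier already separates easily contribute little, while hard-to-classify samples dominate the loss.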
Temporal Domain Classifier (Image)
• Performs domain adaptation at the image level
Temporal Domain Classifier (Instance)
• Performs domain adaptation at the instance level
• The loss is computed from the classifier output
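The slide does not reproduce the loss formula. A minimal sketch, assuming a standard binary cross-entropy averaged over the detected instances, could look as follows; the function name `bce_domain_loss` and the example probabilities are hypothetical.

```python
import math


def bce_domain_loss(probs, domain_label):
    """Mean binary cross-entropy over instance-level domain predictions.

    probs: per-instance probability of belonging to the target domain
    domain_label: 0 if the clip comes from the source, 1 if from the target
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        total += -(domain_label * math.log(p)
                   + (1 - domain_label) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical classifier outputs for two instances per clip.
source_loss = bce_domain_loss([0.2, 0.4], domain_label=0)
target_loss = bce_domain_loss([0.5, 0.5], domain_label=1)
print(source_loss, target_loss)
```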
Experimental setup
■ Datasets
• UCF101-Sports [Rodriguez+, CVPR2008]
• JHMDB [Jhuang+, ICCV2013]
• UCF101 [Soomro+, arXiv2012]
■ Domain adaptation settings
• UCF101-Sports → UCF101 (the 4 classes they share)
• JHMDB21 → UCF101 (the 3 classes they share)
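Restricting a source/target pair to its shared classes can be sketched as a set intersection. The class names below are placeholders and not the actual dataset labels, and the `aliases` mapping is a hypothetical way to reconcile differently spelled names across datasets.

```python
def common_classes(source_classes, target_classes, aliases=None):
    """Return the sorted class names shared by the source and target label sets.

    aliases maps a source-side name to its target-side spelling (hypothetical helper).
    """
    aliases = aliases or {}
    mapped = {aliases.get(name, name) for name in source_classes}
    return sorted(mapped & set(target_classes))


# Placeholder label sets; the real dataset class names are not listed in the slides.
source = ["class_a", "class_b", "class_c_src"]
target = ["class_a", "class_c", "class_d"]
shared = common_classes(source, target, aliases={"class_c_src": "class_c"})
print(shared)  # ['class_a', 'class_c']
```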
Experimental results
■ Using all three domain adaptation modules is effective
Results for UCF101-Sports → UCF101 (figure: without vs. with domain adaptation)
Experimental results
■ Using all three domain adaptation modules is effective
Results for JHMDB → UCF101 (figure: without vs. with domain adaptation)
Summary
■ Proposes a domain adaptation method for spatio-temporal action localization
• The effectiveness of the three domain adaptation modules was demonstrated experimentally
