Which visual questions are
difficult to answer?
Analysis with entropy of answer distributions
Kento Terao, Toru Tamaki, Bisser Raytchev, Shin’ichi Satoh, Kazufumi Kaneda
Hiroshima University, NII
Visual Question Answering and Dialog Workshop
at CVPR 2020, June 14
GitHub: https://github.com/tttamaki/vqd
arXiv: https://arxiv.org/abs/2004.05595
Q: What is this item?
Q: Is the catcher wearing the safety gear?
WHICH IS DIFFICULT TO ANSWER?
Some questions are easy, some are difficult
Q: What is the player’s position behind the batter?
[Figure: answer distribution predicted by a VQA model over "yes", "no", "catcher", "1", ...]
Q: How many people?
[Figure: answer distribution predicted by a VQA model over "many", "over100", "50", "100", ...]
[Figure labels: "Difficult question", "Easy question"]
Motivation
• Finding which questions are difficult
Application
• Using the difficulty for developing new VQA models
Contribution
• Providing a practical and surprisingly simple way to assess the difficulty
• Finding question clusters that are difficult for any VQA model
Related works and our key contribution
Related works: analyzing distributions of answers by humans
• Estimating # unique answers [Gurari and Grauman, CHI’17]
• Predicting reasons why they disagree [Bhattacharya+, ICCV2019]
• Predicting entropy as annotation diversity [Yang+, HCOMP 2018]
Q: How many people can fit in the 2 buses?
Ground truth answers (crowd workers): 40, 80, 100, 100, 100, 100, 200, many, many, lot
# unique answers: 6
Entropy: 1.61
Ours: analyzing distributions of multiple VQA models
• No annotation of difficulty
• Estimating difficulty of visual questions, even in the test set
[Figure: answer distributions over "many", "over100", "50", "100", ... predicted by VQA models 1, 2, and 3]
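The two statistics quoted for the bus question can be reproduced from the ten ground-truth answers listed above. The snippet below is a minimal sketch, assuming the entropy is the Shannon entropy of the empirical answer distribution with the natural logarithm (the base that matches the 1.61 on the slide); the function name `answer_entropy` is chosen here for illustration only.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (natural log) of the empirical answer distribution."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# The ten ground-truth answers listed for the bus question.
answers = ["40", "80", "100", "100", "100", "100", "200", "many", "many", "lot"]

num_unique = len(set(answers))
entropy = answer_entropy(answers)
print(num_unique, round(entropy, 2))  # 6 1.61
```

If all ten annotators gave the same answer, the entropy would be 0; it grows as the answers spread over more candidates.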
Proposed method
1. Computing 3D entropy vectors
• Model I: image only
• Model Q: question only
• Model Q+I: image and question (Q+I baseline: Pythia v0.1 [CVPR2018])
• Answer distribution of each model (dim = 3,129) → entropy; the three entropies form one vector (dim = 3)
2. K-means clustering on 3D entropy vectors
3. Analyzing accuracy of VQA models for each cluster
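Steps 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the answer distributions are toy 4-way distributions instead of 3,129-way model outputs, the k-means is a plain hand-written Lloyd's algorithm, and all function names (`entropy_vector`, `kmeans`, `toy_dist`) are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (natural log) of one predicted answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_vector(p_i, p_q, p_qi):
    """Entropies of models I, Q, and Q+I stacked into one 3D vector."""
    return (entropy(p_i), entropy(p_q), entropy(p_qi))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def toy_dist(top, n=4):
    """Toy distribution: probability `top` on one answer, the rest spread evenly."""
    rest = (1.0 - top) / (n - 1)
    return [top] + [rest] * (n - 1)


# Three confident (low-entropy) and three uncertain (high-entropy) toy questions.
tops = [0.97, 0.95, 0.90, 0.25, 0.30, 0.40]
points = [entropy_vector(toy_dist(t), toy_dist(t), toy_dist(t)) for t in tops]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Reducing each distribution to a single entropy value keeps the clustering input at three dimensions per question, whatever the size of the answer vocabulary.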
Experiments
Dataset
• VQA v2 [Goyal+, CVPR2017]
• Training set for training VQA models
• Validation set for clustering and analysis
Protocol
• Training I, Q, and Q+I models on the training set
• For each model
• predicting the answer distribution of each visual question in the validation set
• computing entropy values
• Performing k-means clustering (k=10) on the
validation set
• Computing statistics for each cluster, and sorting clusters in order of entropy
• Assigning questions in the test set to clusters
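The last two protocol steps can be sketched as follows. The slide does not say which entropy is used for sorting or how test questions are assigned, so this sketch assumes sorting by the mean of the three centroid components and assignment to the nearest centroid by Euclidean distance; the centroid values and the test vector are hypothetical, not taken from the paper.

```python
import math


def sort_clusters_by_entropy(centroids):
    """Order centroids by mean entropy so that cluster 0 has the lowest entropy."""
    return sorted(centroids, key=lambda c: sum(c) / len(c))


def assign_cluster(vec, sorted_centroids):
    """Index of the nearest centroid (Euclidean distance) for one 3D entropy vector."""
    return min(range(len(sorted_centroids)),
               key=lambda i: math.dist(vec, sorted_centroids[i]))


# Hypothetical centroids (entropy of I, Q, Q+I), standing in for k-means output.
centroids = [(4.1, 3.0, 2.5), (3.9, 0.6, 0.2), (4.0, 1.8, 1.1)]
ordered = sort_clusters_by_entropy(centroids)

# Hypothetical entropy vector of an unseen test-set question.
test_vec = (4.0, 2.9, 2.2)
cluster_id = assign_cluster(test_vec, ordered)
print(cluster_id)  # 2
```

Because the assignment needs only the three models' predicted distributions, it works for test-set questions whose ground-truth answers are not available.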
Comparisons
• Predicting with state-of-the-art VQA models (trained on the training set)
• BUTD [CVPR2018]
• MFB [EMNLP2016]
• MFH [TNNLS2018]
• BAN-4/8 [NeurIPS2018]
• MCAN-small/large [CVPR2019]
• Pythia v0.3 [CVPR2019]
Results and observations
• All methods perform poorly on the most difficult cluster (about 10% accuracy)
• Cluster entropy is strongly (negatively) correlated with cluster accuracy: entropy increases while accuracy decreases from cluster 0 to 9
• As the cluster difficulty
increases, the answers
predicted by the different
methods begin to differ
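The per-cluster statistics behind these observations can be sketched as below. This is an assumption-laden illustration: accuracy uses the commonly cited simplified VQA metric min(#annotators who gave the answer / 3, 1), disagreement is measured as the number of distinct answers predicted by the models, and the records, model names "A" and "B", and the function `per_cluster_stats` are hypothetical toy inputs, not results from the paper.

```python
from collections import defaultdict


def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy: min(#annotators who gave this answer / 3, 1)."""
    return min(human_answers.count(predicted) / 3.0, 1.0)


def per_cluster_stats(records):
    """records: (cluster_id, {model: answer}, human_answers) tuples.

    Returns {cluster_id: (mean accuracy over models and questions,
                          mean number of distinct predicted answers)}.
    """
    acc, distinct = defaultdict(list), defaultdict(list)
    for cluster_id, preds, humans in records:
        acc[cluster_id].extend(vqa_accuracy(a, humans) for a in preds.values())
        distinct[cluster_id].append(len(set(preds.values())))
    return {c: (sum(acc[c]) / len(acc[c]), sum(distinct[c]) / len(distinct[c]))
            for c in acc}


# Hypothetical toy records, not results from the paper.
records = [
    (0, {"A": "catcher", "B": "catcher"}, ["catcher"] * 10),
    (9, {"A": "50", "B": "many"},
     ["100"] * 4 + ["many"] * 2 + ["40", "80", "200", "lot"]),
]
stats = per_cluster_stats(records)
for cluster_id in sorted(stats):
    print(cluster_id, stats[cluster_id])
```

In this toy input the models agree and are correct in cluster 0, while in cluster 9 they give two different answers and score lower, which mirrors the pattern described in the bullets above.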
Examples in cluster 0
Annotations agree; VQA models agree and answer correctly (about 85% accuracy)
Examples in cluster 9
Visual questions are difficult to answer, even when annotations agree (about 10% accuracy)
Check it out!
GitHub
• Clustering results and visualization code available
Visual Question Difficulty (VQD)
https://github.com/tttamaki/vqd
Paper
• More in-depth discussions can be found on arXiv
https://arxiv.org/abs/2004.05595
You may use the difficulty in your
model for questions in the training,
validation, and test sets
