Challenging Common Assumptions
in the Unsupervised Learning of
Disentangled Representations
(ICML 2019 Best Paper)
2019.07.17.
Sangwoo Mo
Outline
• Quick Review
• What is disentangled representation (DR)?
• Prior work on the unsupervised learning of DR
• Theoretical Results
• Unsupervised learning of DR is impossible without inductive biases
• Empirical Results
• Q1. Which method should be used?
• Q2. How to choose the hyperparameters?
• Q3. How to select the best model from a set of trained models?
Quick Review
• Disentangled representation: learn a representation z from the data x s.t.
• z contains all the information of x in a compact and interpretable structure
• There is currently no single formal definition (many competing definitions of a “factor of variation”)
* Image from BetaVAE (ICLR 2017)
Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use β > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior); the objective is shown below
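For reference, the β-VAE objective is the standard ELBO with the KL term up-weighted by β (β = 1 recovers the vanilla VAE):

    L(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x|z) ] − β · D_KL( q_φ(z|x) ‖ p(z) )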
• FactorVAE (ICML 2018) & β-TCVAE (NeurIPS 2018)
• Penalize the total correlation of the representation, which is estimated¹ by
adversarial learning (FactorVAE) or a (biased) mini-batch approximation (β-TCVAE); see the formula below
1. It requires the aggregated posterior q(z)
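The total correlation being penalized is the KL divergence between the aggregated posterior and the product of its marginals; it vanishes exactly when the dimensions of z are mutually independent:

    TC(z) = D_KL( q(z) ‖ ∏_j q(z_j) )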
• DIP-VAE (ICLR 2018)
• Match q(z) to the disentangled prior p(z), where D is a (tractable) moment-matching divergence; one concrete instance is shown below
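As one concrete instance, the DIP-VAE-I variant penalizes off-diagonal and diagonal deviations of the covariance of q(z) from the identity:

    λ_od · Σ_{i≠j} [ Cov_{q(z)}[z] ]_{ij}^2 + λ_d · Σ_i ( [ Cov_{q(z)}[z] ]_{ii} − 1 )^2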
Quick Review: Evaluation Metrics
• Many heuristics have been proposed to quantitatively evaluate disentanglement
• Basic idea: factors and representation dimensions should be in 1-1 correspondence
• BetaVAE (ICLR 2017) & FactorVAE (ICML 2018) metric
• Given a factor c_k, generate two (simulated) data points x, x′ with the same c_k but different c_{-k},
then train a classifier to predict k from the difference of the representations |z − z′|
• Intuitively, the classifier maps the (near-)zero-valued dimensions of |z − z′| to the fixed factor c_k (a code sketch follows below)
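A minimal sketch of the BetaVAE-style metric in Python. The helpers sample_factors, render, and encode are hypothetical stand-ins for a ground-truth simulator and a trained encoder (not names from the papers' code), and training accuracy is used as a simple proxy for the reported score; the FactorVAE variant instead uses a majority-vote classifier on the least-varying latent index.

```python
# Hedged sketch of the BetaVAE-style metric. `sample_factors`, `render`,
# and `encode` are hypothetical stand-ins for a ground-truth simulator
# and a trained encoder; real implementations differ in details.
import numpy as np
from sklearn.linear_model import LogisticRegression

def betavae_score(num_factors, sample_factors, render, encode,
                  n_points=5000, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_points):
        k = rng.integers(num_factors)           # factor held fixed
        c1 = sample_factors(batch)              # (batch, num_factors)
        c2 = sample_factors(batch)
        c2[:, k] = c1[:, k]                     # same c_k, different c_{-k}
        diff = np.abs(encode(render(c1)) - encode(render(c2)))
        X.append(diff.mean(axis=0))             # averaged |z - z'| per point
        y.append(k)
    clf = LogisticRegression(max_iter=1000).fit(np.stack(X), np.array(y))
    return clf.score(np.stack(X), np.array(y))  # higher = more disentangled
```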
• Mutual Information Gap (NeurIPS 2018)
• Compute the mutual information I(c_k, z_i) between each factor c_k and each dimension z_i
• For the dimensions i_1 and i_2 with the highest and second-highest mutual information,
measure the gap between them: I(c_k, z_{i_1}) − I(c_k, z_{i_2}) (formalized below)
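Averaged over all K factors and normalized by the factor entropy H(c_k) (the normalization used in the original paper), the score reads:

    MIG = (1/K) · Σ_{k=1}^{K} (1/H(c_k)) · ( I(c_k, z_{i_1}) − I(c_k, z_{i_2}) ),   i_1 = argmax_i I(c_k, z_i)

A score close to 1 means each factor is captured by a single latent dimension.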
Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For p(z) = ∏_{i=1}^{d} p(z_i), there exists an infinite family of bijective functions f s.t.
• z and f(z) are completely entangled (i.e., ∂f_i(u)/∂u_j ≠ 0 a.e. for all i, j)
• z and f(z) have the same marginal distribution (i.e., P(z ≤ u) = P(f(z) ≤ u) for all u)
• Proof sketch. By construction.
• Let g: supp(z) → [0,1]^d s.t. g_i(v) = P(z_i ≤ v_i)
• Let h: [0,1]^d → ℝ^d s.t. h_i(v) = ψ^{-1}(v_i), where ψ is the c.d.f. of a standard normal distribution
• Then for any orthogonal matrix A, the following f satisfies the conditions (a numerical check follows below):
f(u) = (h ∘ g)^{-1}( A · (h ∘ g)(u) )
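A minimal numerical check of the Gaussian special case: for a standard normal prior, h ∘ g is the identity, so f reduces to f(z) = Az with A orthogonal. The sketch below (not from the paper) verifies that the rotated latents keep their N(0, 1) marginals while every output coordinate mixes every input coordinate:

```python
# Sanity check (sketch): for a standard normal prior, h∘g is the identity,
# so the theorem's f reduces to f(z) = Az for an orthogonal matrix A.
import numpy as np
from scipy.stats import ortho_group, kstest

rng = np.random.default_rng(0)
d = 4
z = rng.standard_normal((100_000, d))    # z ~ N(0, I): factorized prior
A = ortho_group.rvs(d, random_state=0)   # random orthogonal matrix
fz = z @ A.T                             # f(z) = Az, bijective

# Same marginals: each coordinate of f(z) is still N(0, 1).
for j in range(d):
    print(f"dim {j}: KS p-value = {kstest(fz[:, j], 'norm').pvalue:.3f}")

# Completely entangled: the Jacobian of f is A, and a generic random
# orthogonal matrix has no zero entries, so every output coordinate
# depends on every input coordinate.
print("min |A_ij| =", np.abs(A).min())
```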
• Corollary. One cannot identify the disentangled representation r(x) (w.r.t. the generative model G(x|z)),
as there exist two equivalent generative models G and G′ that have the same marginal distribution p(x)
while z′ = f(z) is completely entangled w.r.t. z (and hence so is r(x))
• Namely, inferring the representation z from observations x alone is not a well-defined problem
Theoretical Results
• 𝛽-VAE learns some decorrelated features, but they are not semantically decomposed
• E.g., the width is entangled with the leg style in 𝛽-VAE
* Image from BetaVAE (ICLR 2017)
Empirical Results
• Q1. Which method should be used?
• A. Hyperparameters and random seeds matter more than the choice of model
• Q2. How to choose the hyperparameters?
• A. Selecting the best hyperparameters is extremely hard due to randomness across seeds
• A. Also, there is no obvious trend across variations of the hyperparameters
• A. Good hyperparameters can often be transferred (e.g., dSprites → color-dSprites)
(Figure: rank correlation matrix)
• Q3. How to select the best model from a set of trained models?
• A. Unsupervised (training) scores do not correlate with the disentanglement metrics
(Figure: unsupervised scores vs. disentanglement metrics)
Summary
• TL;DR: Current unsupervised learning of disentangled representations has fundamental limitations!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant differences among them)
• Q2. How to choose the hyperparameters?
• A. No rule of thumb, but transfer across datasets seems to help!
• Q3. How to select the best model from a set of trained models?
• A. (Unsupervised) model selection remains a key challenge!
Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection,
even when the models are trained in a completely unsupervised manner
2. One can obtain even better results by incorporating the few labels into the learning
process itself (via a simple supervised regularizer; a hedged sketch follows below)
• Take-home message: future research should focus on “how to better utilize inductive biases,”
e.g., via a few labels, rather than on the previous total-correlation-style approaches
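To make finding 2 concrete, below is a minimal PyTorch sketch of the idea, assuming a Gaussian-posterior VAE. The specific L2 regularizer tying the first few latent means to the labeled factors, and the weight γ (gamma), are illustrative assumptions, not the follow-up paper's exact formulation:

```python
# Hedged sketch: standard beta-VAE loss plus a simple supervised
# regularizer on the few labeled examples. The L2 form, the alignment
# of factors to the first latent dimensions, and `gamma` are
# illustrative assumptions, not the follow-up paper's exact choice.
import torch
import torch.nn.functional as F

def few_label_vae_loss(x, recon, mu, logvar, beta=4.0, gamma=10.0,
                       labeled_idx=None, factors=None):
    # Reconstruction + beta-weighted KL (the usual unsupervised part).
    recon_loss = F.mse_loss(recon, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    loss = recon_loss + beta * kl
    # Supervised part: only the few labeled examples contribute.
    if labeled_idx is not None:
        num_factors = factors.size(1)
        loss = loss + gamma * F.mse_loss(mu[labeled_idx, :num_factors], factors)
    return loss
```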