Understanding Deep Learning
Requires Rethinking Generalization
Understood together with PR12
Jaejun Yoo
Ph.D. Candidate @KAIST
PR12
20th Jan, 2018
Today’s contents
Understanding deep learning requires rethinking generalization
by C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals
Nov. 2016: https://arxiv.org/abs/1611.03530
ICLR 2017 Best Paper
???
@#*($DFJLDK
…paper awards
Questions
Why do large neural networks generalize well in practice?
Can the traditional theories of generalization actually
explain the results we are seeing these days?
What is it, then, that distinguishes neural networks
that generalize well from those that don’t?
“Deep neural networks TOO easily fit random labels.”
Conventional wisdom
Small generalization error due to
• Model family
• Various regularization techniques
“Generalization error”
Strategy: find a model that minimizes the training error
with as few parameters as possible.
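For reference, a minimal formal statement of the quantity in quotes above (standard definitions, not from the slides):

```latex
% Generalization error: the gap between population risk and empirical risk.
R(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(h(x), y)\big], \qquad
\hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i), \qquad
\text{generalization error} = R(h) - \hat{R}_n(h)
```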
Effective capacity of neural networks
(Figure: test error vs. parameter count / number of training samples, p/n)
MLP 1 x 512: p/n = 24
AlexNet: p/n = 28
Inception: p/n = 33
Wide ResNet: p/n = 179
If counting the number of parameters is not a useful way to measure model complexity,
then how can we measure the effective capacity of a model?
Randomization test
Fitting random labels and random pixels
Naïve intuition: learning should be impossible, e.g., training should not
converge, or should slow down substantially.
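A minimal sketch of the randomization test in PyTorch. The dataset, architecture, and hyperparameters below are illustrative stand-ins, not the authors' exact setup (their code is linked in the references):

```python
import torch
import torchvision
import torchvision.transforms as T

# Randomization test: replace every training label with a uniformly random
# class, then train exactly as usual. If the naive intuition were right,
# training on pure label noise would fail to converge.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
num_classes = 10
train_set.targets = torch.randint(
    0, num_classes, (len(train_set.targets),)).tolist()  # corrupt all labels

loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
model = torchvision.models.resnet18(num_classes=num_classes)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(100):   # per the paper, training error still reaches ~0,
    for x, y in loader:    # only a small constant factor more slowly than
        opt.zero_grad()    # with the true labels
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```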
Implications
Rademacher complexity and VC-dimension:
since the networks fit random labels perfectly, their empirical Rademacher
complexity is essentially maximal, so the resulting uniform bounds are vacuous.
Uniform stability:
“How sensitive the algorithm is to the replacement of a single example”
Solely a property of the algorithm (nothing to do with the data), so it cannot
explain why the same algorithm generalizes on true labels but not on random ones.
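For reference, the standard definition the slide is quoting (Bousquet and Elisseeff's notion, stated here from memory rather than from the slides); note that no term in it depends on the data distribution:

```latex
% Algorithm A is beta-uniformly stable if replacing the i-th training
% example (S -> S^{i,z'}) changes the loss at any point z by at most beta.
\sup_{S,\; z',\; i,\; z} \;\Big|\, \ell\big(A_S,\, z\big) \;-\; \ell\big(A_{S^{i,z'}},\, z\big) \,\Big| \;\le\; \beta
```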
Implications
• The effective capacity of neural networks is sufficient for
memorizing the entire data set.
• Even optimization on random labels remains easy. In fact,
training time increases only by a small constant factor
compared with training on the true labels.
• Randomizing labels is solely a data transformation, leaving
all other properties of the learning problem unchanged.
Summary
The role of regularization
What about regularization techniques?
Shall we just sprinkle some regularization on it?
“When your model keeps overfitting, do this to survive”
— 위기탈출 넘버원 (Crisis Escape No. 1)
Without them: the networks still generalize quite well.
With them: the networks still (over)fit random labels just fine.
Regularization certainly helps generalization, but
it is NOT the fundamental reason for generalization.
Finite sample expressivity
Much effort has gone into studying the expressivity of NNs.
However, almost all of these results are at the “population level”,
showing which functions over the entire domain can and cannot be
represented by certain classes of NNs with the same number of
parameters.
What is more relevant in practice is the expressive power of
NNs on a finite sample of size n.
Finite sample expressivity
The expressive power of NNs on a finite sample size of n?
“As soon as the number of parameters p of a network is greater than n, even
simple two-layer neural networks can represent any function of the input sample.”
Theorem: a two-layer neural network with 2n + d weights and ReLU activations
can represent any function on n samples in d dimensions.
Finite sample expressivity
Proof)
Lower triangular matrices…
• are invertible if and only if the diagonal elements are nonzero
• have their eigenvalues taken directly from the diagonal elements
Hence $\exists\, \mathbf{A}^{-1}$ $\because$ $\operatorname{rank}(\mathbf{A}) = n$. ∎
Finite sample expressivity
Proof) For weights $w, b \in \mathbb{R}^n$ and $a \in \mathbb{R}^d$ (2n + d in total),
consider the two-layer ReLU network $c(x) = \sum_{j=1}^{n} w_j \max\{\langle a, x \rangle - b_j,\, 0\}$.
Choose $a$ so that the projections $z_i = \langle a, x_i \rangle$ are distinct, and interleave
the biases so that $b_1 < z_1 < b_2 < z_2 < \cdots < b_n < z_n$; then the matrix
$A_{ij} = \max\{z_i - b_j,\, 0\}$ is lower triangular with nonzero diagonal.
$y = \mathbf{A}w$, $\exists\, \mathbf{A}^{-1}$ $\because$ $\operatorname{rank}(\mathbf{A}) = n$ by Lemma 1.
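A numerical sketch of this construction (an illustrative script under the assumptions above, not code from the slides):

```python
import numpy as np

# Build the 2n + d parameter two-layer ReLU network that exactly fits
# arbitrary labels y on n points in d dimensions.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))            # n samples in d dimensions
y = rng.normal(size=n)                 # arbitrary real labels

a = rng.normal(size=d)                 # random projection; values distinct a.s.
z = X @ a
order = np.argsort(z)
X, y, z = X[order], y[order], z[order]

b = np.empty(n)                        # interleave: b_1 < z_1 < b_2 < ... < z_n
b[0] = z[0] - 1.0
b[1:] = (z[:-1] + z[1:]) / 2.0

A = np.maximum(z[:, None] - b[None, :], 0.0)  # lower triangular, nonzero diag
w = np.linalg.solve(A, y)                     # exact interpolation weights

pred = np.maximum(X @ a - b, 0.0) @ w         # network output on the sample
assert np.allclose(pred, y)                   # fits every label exactly
```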
Implicit regularization
We do not yet know why neural nets generalize so well, but can we
get some insight from linear models?
An appeal to linear models
Implicit regularization
An appeal to linear models
But do all global minima generalize equally well?
How can we tell one from another?
“Curvature”
In the linear case, the curvature of all optimal solutions is the same,
because the Hessian of the loss is not a function of the choice of $w$:
for least squares, $\ell(w) = \tfrac{1}{2}\|Xw - y\|^2$ has $\nabla^2 \ell(w) = X^T X$ everywhere.
Implicit regularization
If curvature doesn’t distinguish global minima, what does?
The algorithm itself acts as a constraint.
Here, implicit regularization = the SGD algorithm.
SGD converges to the minimum norm solution:
$w_{mn} = X^T (XX^T)^{-1} y$, where $XX^T \in \mathbb{R}^{n \times n}$
is a kind of kernel: the Gram matrix $K$.
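Why the minimum norm solution: starting from $w_0 = 0$, every SGD update is a linear combination of training points, so the iterates stay in the row span of $X$, and the interpolating solution in that span is exactly $w_{mn}$. A short sketch computing it (illustrative code, assuming $n < d$ with $XX^T$ invertible):

```python
import numpy as np

# Minimum-norm interpolating solution via the n x n Gram matrix.
rng = np.random.default_rng(1)
n, d = 100, 500                      # underdetermined: more features than samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

K = X @ X.T                          # Gram matrix ("a kind of kernel"), n x n
alpha = np.linalg.solve(K, y)        # solve K alpha = y
w_mn = X.T @ alpha                   # w_mn = X^T (X X^T)^{-1} y

assert np.allclose(X @ w_mn, y)      # interpolates the training data
# Agrees with the pseudoinverse solution, which is minimum-norm by definition:
assert np.allclose(w_mn, np.linalg.pinv(X) @ y)
```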
Implicit regularization
Quite surprisingly…
The solution obtained in this simple way already achieves very low error!
With an RBF kernel: solve $\mathbf{K}\boldsymbol{\alpha} = \boldsymbol{y}$,
replacing the Gram matrix $\mathbf{X}\mathbf{X}^T$ with the kernel matrix.
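A sketch of that kernel variant (illustrative; the bandwidth gamma is a hypothetical choice, not from the slides):

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=0.02):
    # Gaussian (RBF) kernel matrix between row sets A and B.
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

rng = np.random.default_rng(2)
n, d = 200, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

K = rbf_kernel(X, X)                  # n x n kernel matrix
alpha = np.linalg.solve(K, y)         # fit with no regularization: K alpha = y

X_test = rng.normal(size=(10, d))
pred = rbf_kernel(X_test, X) @ alpha  # predictions on new points
```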
• A simple experimental framework for understanding the
effective capacity of deep learning models
• Successful deep networks are able to overfit the training set
• Other formal measures of complexity for models,
algorithms, and data distributions are needed to precisely explain
the over-parameterized regime
Conclusion
We believe that …
“understanding neural networks
requires rethinking generalization.”
Conclusion
References
• https://arxiv.org/pdf/1611.03530.pdf (paper)
• https://openreview.net/forum?id=Sy8gdB9xx (open review comments)
• https://github.com/pluskid/fitting-random-labels (code)
• http://pluskid.org/slides/ICLR2017-Poster.pdf (poster)
• https://www.youtube.com/watch?v=kCj51pTQPKI (presentation, YouTube)
• https://www.slideshare.net/JungHoonSeo2/understanding-deep-learning-requires-rethinking-generalization-2017-12 (SlideShare: Kor)
• https://danieltakeshi.github.io/2017/05/19/understanding-deep-learning-requires-rethinking-generalization-my-thoughts-and-notes (blog)
Things to discuss…
• The effective capacity of neural networks is sufficient for
memorizing the entire data set.
• Even optimization on random labels remains easy. In fact,
training time increases only by a small constant factor
compared with training on the true labels.
• Randomizing labels is solely a data transformation, leaving
all other properties of the learning problem unchanged.
Summary
Really???
The conventional supervised learning setting:
assumes the training and test domains are the same.
Statistical Learning Theory
: SVM, …
Source: electronics product reviews (X) /
positive-or-negative labels (Y)
Target: video game reviews (X)
From the hypothesis space H represented by a NN…
we learn a classifier h. The target labels are unknown,
but we want to find an h that labels well
in both the source (X, Y) and target (X) domains.
Old strategy: find a model that minimizes the training error
with as few parameters as possible.
Now the training domain (source) and the testing
domain (target) differ,
so an additional strategy beyond the old one is needed.
A Computable Adaptation Bound
(Figure: the bound, annotated with a divergence-estimation term and a
complexity term that depends on the number of unlabeled samples.)
The optimal joint hypothesis is the hypothesis with minimal combined error;
λ denotes that error.
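The equation itself did not survive extraction; a hedged reconstruction, assuming the slide shows the adaptation bound of Ben-David et al. (2010), whose terms match the annotations above:

```latex
% Target error <= source error + estimable divergence between unlabeled
% samples + a complexity term shrinking in the number m' of unlabeled
% samples + the combined error lambda of the optimal joint hypothesis.
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, \hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{U}_S, \mathcal{U}_T)
  \;+\; 4\sqrt{\frac{2d\log(2m') + \log(2/\delta)}{m'}}
  \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h \in \mathcal{H}} \big[\, \epsilon_S(h) + \epsilon_T(h) \,\big]
```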