Learning to Learn by Gradient Descent by Gradient Descent
1. Learning to Learn by Gradient Descent by Gradient Descent
citations: 9 -> 38. Katy, 2016/11/25 @ DataLab
NIPS 2016
2. Background
• learn:
1. a task
2. training experience
3. a performance measure
• A computer program is said to learn if its performance at the task improves with experience. [Mitchell, 1997]
3. Background
• learning to learn:
1. a family of tasks
2. training experience for each of these tasks
3. a family of performance measures
• An algorithm is said to learn to learn if its performance at each task improves with experience and with the number of tasks.
Thrun, Sebastian, and Lorien Pratt, eds. Learning to Learn. Springer Science & Business Media, 2012.
4. Background
• Frequently, tasks in machine learning can be expressed as the problem of optimizing an objective function f(θ) defined over some domain θ ∈ Θ
• The goal is to find the minimizer θ* = arg min_{θ∈Θ} f(θ)
• The standard approach for differentiable functions is some form of gradient descent, resulting in a sequence of updates θ_{t+1} = θ_t - α_t ∇f(θ_t)
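A minimal sketch of this standard update rule in Python, on a toy quadratic objective (the function, step size, and iteration count here are illustrative, not from the paper):

```python
import numpy as np

def f(theta):
    # Toy quadratic objective; any differentiable function would do.
    return 0.5 * np.sum(theta ** 2)

def grad_f(theta):
    # Analytic gradient of the toy objective above.
    return theta

theta = np.array([3.0, -2.0])  # initial parameters
alpha = 0.1                    # hand-picked step size, as in classical gradient descent

for t in range(100):
    # The standard update: theta_{t+1} = theta_t - alpha_t * grad f(theta_t)
    theta = theta - alpha * grad_f(theta)

print(theta, f(theta))  # theta approaches the minimizer at the origin
```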
5. Motivation
• Most modern work is based around designing update rules for specific classes of problems; an update rule tuned to one class may perform poorly on other classes of problems
6. Motivation
• In this work we take a different tack and instead propose to replace hand-designed update rules with a learned update rule
9. Related Work
• C. Daniel, J. Taylor, and S. Nowozin. Learning step size controllers for robust neural network training. In Association for the Advancement of Artificial Intelligence, 2016.
12. • In this work, they propose to replace hand-designed update rules with a learned update rule, which they call the optimizer m (an LSTM), with its own parameters φ
• This results in updates to the optimizee f of the form θ_{t+1} = θ_t + g_t(∇f(θ_t), φ), where g_t is the output of the LSTM
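A hedged PyTorch sketch of such a learned update rule; the class name and sizes are assumptions for illustration (the paper itself uses a coordinatewise two-layer LSTM with 20 hidden units and a linear output):

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    # Sketch of the optimizer m: an LSTM applied coordinatewise to the
    # optimizee's gradient, producing the update g_t.
    def __init__(self, hidden_size=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden_size)  # input: one coordinate's gradient
        self.out = nn.Linear(hidden_size, 1)     # maps hidden state to the update g_t

    def forward(self, grad, state):
        h, c = self.cell(grad, state)
        return self.out(h), (h, c)

# Each coordinate of theta is one batch element, so all coordinates share
# the same optimizer parameters phi:
m = LearnedOptimizer()
theta = torch.randn(5, 1)
state = (torch.zeros(5, 20), torch.zeros(5, 20))
g, state = m(theta, state)  # for f = 0.5*||theta||^2 the gradient is theta itself
theta = theta + g           # learned update: theta_{t+1} = theta_t + g_t
```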
13. How to train the optimizer
• For training the optimizer, we have an objective that depends on the trajectory of optimization for a time horizon T: L(φ) = E_f[ Σ_{t=1}^T w_t f(θ_t) ], where θ_{t+1} = θ_t + g_t and [g_t, h_{t+1}] = m(∇_t, h_t, φ), with ∇_t = ∇_θ f(θ_t)
• θ: the optimizee parameters
• φ: the optimizer parameters
• f: the function in question
• m: the LSTM
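Under those definitions, a hedged sketch of one meta-training step, reusing the LearnedOptimizer sketch above; w_t = 1 and training φ with Adam follow the paper, while the random quadratic optimizee is an assumption for illustration:

```python
import torch

def meta_loss(m, f, theta, state, T=20):
    # Unrolled objective L(phi) = sum_t w_t f(theta_t) over a horizon T,
    # with w_t = 1, so gradients can flow back into phi.
    loss = 0.0
    for _ in range(T):
        f_t = f(theta)
        # The paper treats the optimizee gradient as independent of phi,
        # hence the detach() before feeding it to the LSTM.
        grad = torch.autograd.grad(f_t, theta, retain_graph=True)[0].detach()
        g, state = m(grad, state)
        theta = theta + g        # theta_{t+1} = theta_t + g_t
        loss = loss + f(theta)   # accumulate w_t * f(theta_t)
    return loss

W = torch.randn(5, 5)
f = lambda th: ((W @ th) ** 2).sum()          # a random quadratic optimizee
theta = torch.randn(5, 1, requires_grad=True)
state = (torch.zeros(5, 20), torch.zeros(5, 20))
opt = torch.optim.Adam(m.parameters())        # phi is trained by ordinary Adam
opt.zero_grad()
meta_loss(m, f, theta, state).backward()
opt.step()
```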
18. Information Sharing Between Coordinates
• Global averaging cells (GAC) designate a subset of the cells in each LSTM layer for communication; their outgoing activations are averaged at each step across all coordinates
• This allows the per-coordinate LSTMs to communicate with each other, as sketched below
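A hedged sketch of that averaging step; the number of designated cells k and the slicing convention are assumptions for illustration:

```python
import torch

def global_average_cells(h, k):
    # Replace the first k activations of each coordinate's hidden state
    # with their mean across all coordinates, so the otherwise independent
    # per-coordinate LSTMs can exchange information.
    avg = h[:, :k].mean(dim=0, keepdim=True)                   # mean over coordinates
    return torch.cat([avg.expand(h.size(0), k), h[:, k:]], dim=1)

h = torch.randn(5, 20)             # hidden states: 5 coordinates, 20 units each
h = global_average_cells(h, k=4)   # 4 designated cells now carry the global average
```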
26. Conclusion
• So far the learning process has been hand-crafted, but this work shows how to train a neural network with a neural network
• The learned optimizer generalizes well to different architectures, but not to different activation functions
• Execution time?
• Sometimes, when you have been confused for a long time, try emailing the authors (all of them). A typo can kill you.