The Magic of Auto Differentiation
Sanyam Kapoor, sanyam@nyu.edu
October 21, 2017
Courant Institute, NYU
Relevance to Machine Learning
The Learning Problem
Given a set of instances S : (X, Y) drawn i.i.d. from some
distribution, predict the underlying unknown distribution D.
One Approach to Learning
We define a loss function L on some hypothesis h ∈ H (hypothesis
set) and aim to minimize the loss across the sample space
$$\operatorname*{argmin}_{\theta} \; L(Y, h(X, \theta)) \tag{1}$$
where θ is the set of parameters of the hypothesis.
Minimizing the Loss
Differentiation is our tool! Compute the solution to
$$\nabla_{\theta} L = \frac{dL}{d\theta} = 0 \tag{2}$$
And we have solved the learning problem. But have we?
Techniques in Differentiation
A Sample Function
Consider a multivariate function
$$f(x_1, x_2) = x_1 \sqrt{\log\left(\frac{x_1}{\sin(x_2^2)}\right)} \tag{3}$$
I just cooked that up!
Manual Differentiation
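$$\frac{\partial f}{\partial x_1} = \sqrt{\log\left(\frac{x_1}{\sin(x_2^2)}\right)} + \frac{1}{2} \left[\log\left(\frac{x_1}{\sin(x_2^2)}\right)\right]^{-\frac{1}{2}} \tag{4}$$
$$\frac{\partial f}{\partial x_2} = -x_1 x_2 \cot(x_2^2) \left[\log\left(\frac{x_1}{\sin(x_2^2)}\right)\right]^{-\frac{1}{2}} \tag{5}$$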
Pros
• Irreducible form
• Hardcodes everything
Cons
• Time consuming
• Error prone
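As a minimal sketch of what "hardcoding everything" looks like, the hand-derived partials of Equations 4 and 5 can be written out directly. This assumes the sample function is x1 * sqrt(log(x1 / sin(x2^2))), the form implied by the primal trace later in the deck; the function names and the evaluation point are illustrative choices, not part of the original slides.

```python
import math


def f(x1, x2):
    """Sample function (Equation 3), assumed to be x1 * sqrt(log(x1 / sin(x2^2)))."""
    return x1 * math.sqrt(math.log(x1 / math.sin(x2 ** 2)))


def df_dx1(x1, x2):
    """Hand-derived partial derivative w.r.t. x1 (Equation 4)."""
    u = math.log(x1 / math.sin(x2 ** 2))
    return math.sqrt(u) + 0.5 / math.sqrt(u)


def df_dx2(x1, x2):
    """Hand-derived partial derivative w.r.t. x2 (Equation 5)."""
    u = math.log(x1 / math.sin(x2 ** 2))
    cot = math.cos(x2 ** 2) / math.sin(x2 ** 2)
    return -x1 * x2 * cot / math.sqrt(u)


# Illustrative evaluation point: sin(x2^2) > 0 and the log is positive here,
# so the square root is well defined.
x1, x2 = 2.0, 1.0
value = f(x1, x2)
grad = (df_dx1(x1, x2), df_dx2(x1, x2))
print(value, grad)
```

Any change to f means redoing the derivation and the code by hand, which is the "time consuming, error prone" cost listed above.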
Numerical Differentiation
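Method of finite differences, derived from the first-order
Taylor series approximation (higher-order methods exist as well)
[BF89].
$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \tag{6}$$
$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h) - f(x - h)}{2h} \tag{7}$$
In practice h is a small finite step, so both formulas only approximate the derivative.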
Pros
• Fair approximations
Cons
• Ill-conditioned and unstable
• Truncation and round-off errors
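The trade-off between truncation and round-off error can be seen in a small experiment. This is a sketch under assumptions: it uses the sample function in the square-root form implied by the primal trace, and the evaluation point, step sizes and helper names are chosen here for illustration.

```python
import math


def f(x1, x2):
    """Sample function, assumed form x1 * sqrt(log(x1 / sin(x2^2)))."""
    return x1 * math.sqrt(math.log(x1 / math.sin(x2 ** 2)))


def forward_diff(g, x, h):
    """Forward difference, Equation 6 with a finite step h."""
    return (g(x + h) - g(x)) / h


def central_diff(g, x, h):
    """Central difference, Equation 7 with a finite step h."""
    return (g(x + h) - g(x - h)) / (2 * h)


def exact_df_dx1(x1, x2):
    """Closed-form partial derivative w.r.t. x1, used as the reference."""
    u = math.log(x1 / math.sin(x2 ** 2))
    return math.sqrt(u) + 0.5 / math.sqrt(u)


x1, x2 = 2.0, 1.0            # illustrative evaluation point
g = lambda t: f(t, x2)       # vary x1 only, hold x2 fixed
exact = exact_df_dx1(x1, x2)

errors = {}
for h in (1e-1, 1e-4, 1e-8, 1e-13):
    errors[h] = (abs(forward_diff(g, x1, h) - exact),
                 abs(central_diff(g, x1, h) - exact))
    print(f"h={h:.0e}  forward error={errors[h][0]:.2e}  central error={errors[h][1]:.2e}")
```

Large steps suffer from truncation error, while very small steps lose digits to floating-point cancellation, so there is no step size that is safe for every function.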
Symbolic Differentiation
Compute the derivative as an actual symbolic expression by applying
a repository of basic rules, like the sum rule or the product rule,
to expressions represented as concrete data structures.
Used in computer algebra systems like Mathematica and in libraries
like Theano. A deterministic and mechanistic process, just like how
one would code!
Pros
• Insight into the structure of the problem
• Build analytical solutions (e.g. the classic Normal Equation for Linear Regression)
Cons
• Expression swell
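Expression swell can be illustrated with a toy symbolic differentiator. This is only a sketch: the nested-tuple representation and the helpers `diff`, `size` and `evaluate` are made up for this example and do not reflect how Mathematica or Theano work internally. Rules are applied mechanically with no simplification.

```python
import math

# Expressions are nested tuples (a hypothetical representation):
# ("x",), ("const", c), ("add", a, b), ("mul", a, b), ("sin", a), ("cos", a)


def diff(expr):
    """Differentiate expr w.r.t. x by rule lookup, without simplifying."""
    op = expr[0]
    if op == "x":
        return ("const", 1.0)
    if op == "const":
        return ("const", 0.0)
    if op == "add":   # sum rule
        return ("add", diff(expr[1]), diff(expr[2]))
    if op == "mul":   # product rule
        a, b = expr[1], expr[2]
        return ("add", ("mul", diff(a), b), ("mul", a, diff(b)))
    if op == "sin":   # chain rule
        return ("mul", ("cos", expr[1]), diff(expr[1]))
    if op == "cos":   # chain rule
        return ("mul", ("mul", ("const", -1.0), ("sin", expr[1])), diff(expr[1]))
    raise ValueError(f"unknown operator: {op}")


def size(expr):
    """Number of nodes in the expression tree."""
    return 1 + sum(size(e) for e in expr[1:] if isinstance(e, tuple))


def evaluate(expr, x):
    """Numerically evaluate the expression at x."""
    op = expr[0]
    if op == "x":
        return x
    if op == "const":
        return expr[1]
    if op == "add":
        return evaluate(expr[1], x) + evaluate(expr[2], x)
    if op == "mul":
        return evaluate(expr[1], x) * evaluate(expr[2], x)
    if op == "sin":
        return math.sin(evaluate(expr[1], x))
    if op == "cos":
        return math.cos(evaluate(expr[1], x))
    raise ValueError(f"unknown operator: {op}")


# Repeatedly nest e -> e * sin(e) and compare tree sizes.
expr = ("x",)
sizes = []
for depth in range(1, 4):
    expr = ("mul", expr, ("sin", expr))
    sizes.append((size(expr), size(diff(expr))))
    print(f"depth {depth}: expression nodes = {sizes[-1][0]}, derivative nodes = {sizes[-1][1]}")
```

The derivative tree grows faster than the expression itself because the product rule duplicates subexpressions; real systems reduce this with simplification and common-subexpression sharing.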
Automatic Differentiation
Problem: Calculate the sensitivity of the output w.r.t. the input (Jacobian)
Observations
1. Need the exact derivatives and not approximations
2. Don't really need the symbolic form
Solution: Chain rule (but just the smart way!)
Automatic Differentiation
Computational Graphs
Represents the flow of values across a non-trivial computation [Bau74].
Core of modern computational libraries like PyTorch and
TensorFlow.
Consider each node as a special gate. Looks familiar?
Figure 1: Computational Graph for Equation 3
Forward Mode Differentiation
Computes the sensitivity of the output w.r.t. one input parameter.
Any hypothesis $h : \mathbb{R}^m \to \mathbb{R}$ would require m forward mode
differentiations to compute the sensitivity w.r.t. each input parameter.
Forward Primal Trace is the algebraic version of the computational
graph. Read top-down.
Forward Tangent Trace calculates $\frac{\partial}{\partial x}$. Read top-down.
Forward Mode Example
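Forward Primal Trace
$$\begin{aligned}
v_{-1} &= x_1 \\
v_0 &= x_2 \\
v_1 &= v_0^2 \\
v_2 &= \sin(v_1) \\
v_3 &= \frac{v_{-1}}{v_2} \\
v_4 &= \log(v_3) \\
v_5 &= \sqrt{v_4} \\
v_6 &= v_{-1} \, v_5 \\
y &= v_6
\end{aligned}$$
Forward Tangent Trace $\frac{\partial}{\partial x_2}$
$$\begin{aligned}
\dot{v}_{-1} &= 0 \\
\dot{v}_0 &= 1 \\
\dot{v}_1 &= 2 v_0 \dot{v}_0 \\
\dot{v}_2 &= \cos(v_1) \, \dot{v}_1 \\
\dot{v}_3 &= \frac{\dot{v}_{-1} v_2 - v_{-1} \dot{v}_2}{v_2^2} \\
\dot{v}_4 &= \frac{1}{v_3} \dot{v}_3 \\
\dot{v}_5 &= \frac{1}{2} v_4^{-\frac{1}{2}} \dot{v}_4 \\
\dot{y} = \dot{v}_6 &= \dot{v}_{-1} v_5 + v_{-1} \dot{v}_5
\end{aligned}$$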
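A common way to implement a forward tangent trace is with dual numbers, where every value carries its tangent alongside it. The sketch below is one such implementation under assumptions: the `Dual` class and the helpers `dsin`, `dlog` and `dsqrt` are hypothetical names introduced here, and only the operations needed for the sample function are defined.

```python
import math


class Dual:
    """A primal value paired with its tangent w.r.t. one chosen input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    def __truediv__(self, other):
        # quotient rule
        return Dual(self.val / other.val,
                    (self.dot * other.val - self.val * other.dot) / other.val ** 2)


def dsin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)


def dlog(a):
    return Dual(math.log(a.val), a.dot / a.val)


def dsqrt(a):
    root = math.sqrt(a.val)
    return Dual(root, a.dot / (2.0 * root))


def f(x1, x2):
    """Sample function written as the primal trace v1..v6."""
    v1 = x2 * x2
    v2 = dsin(v1)
    v3 = x1 / v2
    v4 = dlog(v3)
    v5 = dsqrt(v4)
    return x1 * v5


# One forward pass per input: seed that input's tangent with 1, the other with 0.
a, b = 2.0, 1.0  # illustrative evaluation point
df_dx1 = f(Dual(a, 1.0), Dual(b, 0.0)).dot
df_dx2 = f(Dual(a, 0.0), Dual(b, 1.0)).dot
print(df_dx1, df_dx2)
```

Two inputs need two passes here, which matches the claim that a function of m inputs needs m forward mode differentiations.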
Reverse Mode Differentiation
Computes the sensitivity of the output w.r.t. all input parameters.
Any hypothesis $h : \mathbb{R}^m \to \mathbb{R}$ would require ONE reverse mode
differentiation.
Also called Reverse Mode Accumulation.
$$\bar{v}_i = \frac{\partial f}{\partial v_i} \quad \text{(adjoint of a variable)}$$
Reverse Adjoint Trace calculates $\frac{\partial f}{\partial v_i}$. Read bottom-up.
Reverse Mode Example
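Forward Primal Trace
$$\begin{aligned}
v_{-1} &= x_1 \\
v_0 &= x_2 \\
v_1 &= v_0^2 \\
v_2 &= \sin(v_1) \\
v_3 &= \frac{v_{-1}}{v_2} \\
v_4 &= \log(v_3) \\
v_5 &= \sqrt{v_4} \\
v_6 &= v_{-1} \, v_5 \\
y &= v_6
\end{aligned}$$
Reverse Adjoint Trace $\frac{\partial f}{\partial v_i}$
$$\begin{aligned}
\bar{v}_0 &= \bar{v}_1 \frac{\partial v_1}{\partial v_0} \\
\bar{v}_1 &= \bar{v}_2 \frac{\partial v_2}{\partial v_1} \\
\bar{v}_{-1} &= \bar{v}_{-1} + \bar{v}_3 \frac{\partial v_3}{\partial v_{-1}}; \quad \bar{v}_2 = \bar{v}_3 \frac{\partial v_3}{\partial v_2} \\
\bar{v}_3 &= \bar{v}_4 \frac{\partial v_4}{\partial v_3} \\
\bar{v}_4 &= \bar{v}_5 \frac{\partial v_5}{\partial v_4} \\
\bar{v}_{-1} &= \bar{v}_6 \frac{\partial v_6}{\partial v_{-1}}; \quad \bar{v}_5 = \bar{v}_6 \frac{\partial v_6}{\partial v_5} \\
\bar{y} &= \bar{v}_6 = 1
\end{aligned}$$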
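The adjoint trace can be sketched as a small graph-recording implementation: the forward pass stores each node's parents with the local partial derivatives, and a single backward sweep accumulates adjoints from the output down to the inputs. The `Var` class, the helpers `vsin`, `vlog`, `vsqrt` and the `backward` function are hypothetical names for this illustration; this is not how PyTorch or TensorFlow are implemented.

```python
import math


class Var:
    """A graph node holding a value, its parents with local partials, and an adjoint."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent_node, d(self)/d(parent))
        self.adjoint = 0.0

    def __mul__(self, other):
        return Var(self.val * other.val,
                   ((self, other.val), (other, self.val)))

    def __truediv__(self, other):
        return Var(self.val / other.val,
                   ((self, 1.0 / other.val),
                    (other, -self.val / other.val ** 2)))


def vsin(a):
    return Var(math.sin(a.val), ((a, math.cos(a.val)),))


def vlog(a):
    return Var(math.log(a.val), ((a, 1.0 / a.val),))


def vsqrt(a):
    root = math.sqrt(a.val)
    return Var(root, ((a, 0.5 / root),))


def backward(output):
    """Accumulate adjoints for every node that feeds into output."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: inputs are appended before the nodes that use them.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.adjoint = 1.0  # seed: adjoint of the output is 1
    for node in reversed(order):  # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local


# Forward primal trace at an illustrative point.
x1, x2 = Var(2.0), Var(1.0)
v1 = x2 * x2
v2 = vsin(v1)
v3 = x1 / v2
v4 = vlog(v3)
v5 = vsqrt(v4)
y = x1 * v5

backward(y)  # one reverse sweep yields both partial derivatives
grad = (x1.adjoint, x2.adjoint)
print(grad)
```

Note how x1 receives two contributions, one through v3 and one through v6, which corresponds to the two adjoint updates for the first input in the trace above.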
Reverse Mode in Practice
More commonly known as the Backpropagation algorithm.
For a generic hypothesis $h : \mathbb{R}^m \to \mathbb{R}^n$, we need n reverse mode
differentiations versus m forward mode differentiations. Helpful
when $n \ll m$.
For instance, the Dense Interpolated Embedding Model (DIEM)
[TGR15] proposed an architecture with ∼160 billion parameters and
output syntactic embeddings of size 1000+.
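One way to see where these pass counts come from is in terms of the Jacobian. The symbols $J$, $e_j$ and $e_i$ (standard basis vectors) are introduced here for illustration and do not appear in the original slides.

```latex
\[
J = \frac{\partial h}{\partial x} \in \mathbb{R}^{n \times m}, \qquad
\underbrace{J \, e_j}_{\text{forward pass, seed } \dot{x} = e_j} = \text{column } j \text{ of } J, \qquad
\underbrace{e_i^{\top} J}_{\text{reverse pass, seed } \bar{y} = e_i} = \text{row } i \text{ of } J
\]
```

Recovering the full Jacobian therefore takes m forward passes (one per column) or n reverse passes (one per row). For a scalar loss, n = 1 and a single reverse pass gives the whole gradient.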
Reverse Mode in PyTorch
import torch


def main():
    # Batch size, input dimension, hidden dimension, output dimension
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Random inputs and targets; plain tensors replace the deprecated Variable wrapper
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    # Two-layer fully connected network
    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out),
    )

    # Summed squared error; reduction="sum" replaces the deprecated size_average=False
    loss_fn = torch.nn.MSELoss(reduction="sum")
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

    for t in range(500):
        y_pred = model(x)
        loss = loss_fn(y_pred, y)

        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss.backward()        # Reverse Mode!
        optimizer.step()       # gradient descent update


if __name__ == "__main__":
    main()
References
[Bau74] F. L. Bauer, Computational graphs and rounding error, SIAM Journal on Numerical Analysis 11 (1974), no. 1, 87–96.
[BF89] Richard L. Burden and J. Douglas Faires, Numerical analysis, 4th ed., PWS Publishing Co., Boston, MA, USA, 1989.
[BPR15] Atilim Gunes Baydin, Barak A. Pearlmutter, and Alexey Andreyevich Radul, Automatic differentiation in machine learning: a survey, CoRR abs/1502.05767 (2015).
[GW08] Andreas Griewank and Andrea Walther, Evaluating derivatives: Principles and techniques of algorithmic differentiation, second ed., Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
[TGR15] Andrew Trask, David Gilmore, and Matthew Russell, Modeling order in neural word embeddings at scale, Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (David Blei and Francis Bach, eds.), JMLR Workshop and Conference Proceedings, 2015, pp. 2266–2275.

Mais conteúdo relacionado

Mais procurados

Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoostJoonyoung Yi
 
Quora questions pair duplication analysis using semantic analysis
Quora questions pair duplication analysis using semantic analysisQuora questions pair duplication analysis using semantic analysis
Quora questions pair duplication analysis using semantic analysisAkshata Talankar
 
Optimization problems and algorithms
Optimization problems and  algorithmsOptimization problems and  algorithms
Optimization problems and algorithmsAboul Ella Hassanien
 
Bag the model with bagging
Bag the model with baggingBag the model with bagging
Bag the model with baggingChode Amarnath
 
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...Ajay Kumar
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Mostafa G. M. Mostafa
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
The Mathematics of Neural Networks
The Mathematics of Neural NetworksThe Mathematics of Neural Networks
The Mathematics of Neural Networksm.a.kirn
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
support vector regression
support vector regressionsupport vector regression
support vector regressionAkhilesh Joshi
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...butest
 
Gaussian Processes: Applications in Machine Learning
Gaussian Processes: Applications in Machine LearningGaussian Processes: Applications in Machine Learning
Gaussian Processes: Applications in Machine Learningbutest
 
Collaborative writing technologies: Overleaf for institutions
Collaborative writing technologies: Overleaf for institutionsCollaborative writing technologies: Overleaf for institutions
Collaborative writing technologies: Overleaf for institutionsDigital Science
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro9xdot
 
Gaussian process in machine learning
Gaussian process in machine learningGaussian process in machine learning
Gaussian process in machine learningVARUN KUMAR
 
Statistiques descriptives [PDF].pptx
Statistiques descriptives [PDF].pptxStatistiques descriptives [PDF].pptx
Statistiques descriptives [PDF].pptxTarekDHAHRI1
 
NLP, Expert system and pattern recognition
NLP, Expert system and pattern recognitionNLP, Expert system and pattern recognition
NLP, Expert system and pattern recognitionMohammad Ilyas Malik
 

Mais procurados (20)

Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
Quora questions pair duplication analysis using semantic analysis
Quora questions pair duplication analysis using semantic analysisQuora questions pair duplication analysis using semantic analysis
Quora questions pair duplication analysis using semantic analysis
 
Optimization problems and algorithms
Optimization problems and  algorithmsOptimization problems and  algorithms
Optimization problems and algorithms
 
Bag the model with bagging
Bag the model with baggingBag the model with bagging
Bag the model with bagging
 
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...
ADVANCED OPTIMIZATION TECHNIQUES META-HEURISTIC ALGORITHMS FOR ENGINEERING AP...
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
The Mathematics of Neural Networks
The Mathematics of Neural NetworksThe Mathematics of Neural Networks
The Mathematics of Neural Networks
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
support vector regression
support vector regressionsupport vector regression
support vector regression
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Gaussian Processes: Applications in Machine Learning
Gaussian Processes: Applications in Machine LearningGaussian Processes: Applications in Machine Learning
Gaussian Processes: Applications in Machine Learning
 
PSO.ppt
PSO.pptPSO.ppt
PSO.ppt
 
Collaborative writing technologies: Overleaf for institutions
Collaborative writing technologies: Overleaf for institutionsCollaborative writing technologies: Overleaf for institutions
Collaborative writing technologies: Overleaf for institutions
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
Gaussian process in machine learning
Gaussian process in machine learningGaussian process in machine learning
Gaussian process in machine learning
 
Statistiques descriptives [PDF].pptx
Statistiques descriptives [PDF].pptxStatistiques descriptives [PDF].pptx
Statistiques descriptives [PDF].pptx
 
Midtveispresentasjon 2016
Midtveispresentasjon 2016Midtveispresentasjon 2016
Midtveispresentasjon 2016
 
NLP, Expert system and pattern recognition
NLP, Expert system and pattern recognitionNLP, Expert system and pattern recognition
NLP, Expert system and pattern recognition
 

Semelhante a The Magic of Auto Differentiation

Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Introduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationIntroduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationAmrinder Arora
 
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...Jialin LIU
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3MuhannadSaleh
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learningSteve Nouri
 
2010 3-24 cryptography stamatiou
2010 3-24 cryptography stamatiou2010 3-24 cryptography stamatiou
2010 3-24 cryptography stamatiouvafopoulos
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Charles Martin
 
DSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformDSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformAmr E. Mohamed
 
INTRODUCTION TO MATLAB presentation.pptx
INTRODUCTION TO MATLAB presentation.pptxINTRODUCTION TO MATLAB presentation.pptx
INTRODUCTION TO MATLAB presentation.pptxDevaraj Chilakala
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Languagevsssuresh
 
Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Vissarion Fisikopoulos
 
An application of the hyperfunction theory to numerical integration
An application of the hyperfunction theory to numerical integrationAn application of the hyperfunction theory to numerical integration
An application of the hyperfunction theory to numerical integrationHidenoriOgata
 
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisDetecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisSilvio Cesare
 

Semelhante a The Magic of Auto Differentiation (20)

MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Adaline and Madaline.ppt
Adaline and Madaline.pptAdaline and Madaline.ppt
Adaline and Madaline.ppt
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Introduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationIntroduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic Notation
 
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...
A Mathematically Derived Number of Resamplings for Noisy Optimization (GECCO2...
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learning
 
2010 3-24 cryptography stamatiou
2010 3-24 cryptography stamatiou2010 3-24 cryptography stamatiou
2010 3-24 cryptography stamatiou
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3
 
DSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformDSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-Transform
 
INTRODUCTION TO MATLAB presentation.pptx
INTRODUCTION TO MATLAB presentation.pptxINTRODUCTION TO MATLAB presentation.pptx
INTRODUCTION TO MATLAB presentation.pptx
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
1. linear model, inference, prediction
1. linear model, inference, prediction1. linear model, inference, prediction
1. linear model, inference, prediction
 
5 numerical analysis
5 numerical analysis5 numerical analysis
5 numerical analysis
 
Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.
 
An application of the hyperfunction theory to numerical integration
An application of the hyperfunction theory to numerical integrationAn application of the hyperfunction theory to numerical integration
An application of the hyperfunction theory to numerical integration
 
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisDetecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
 

Último

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Último (20)

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 

The Magic of Auto Differentiation

  • 1. The Magic of Auto Differentiation Sanyam Kapoor, sanyam@nyu.edu October 21, 2017 Courant Institute, NYU
  • 3. The Learning Problem Given an a set of instances S : (X, Y) drawn i.i.d from some distribution, predict the underlying unknown distribution D. 1
  • 4. One Approach to Learning We define a loss function L on some hypothesis h ∈ H (hypothesis set) and aim to minimize the loss across sample space argmin θ L(Y, h(X, θ)) (1) where θ is the set of parameters of the hypothesis. 2
  • 5. Minimizing the Loss Differentiation is our tool! Compute the solution to θL = dL dθ = 0 (2) And, we have solved the learning problem. But have we? 3
  • 7. A Sample Function Consider a multi-variate function f (x1, x2) = x1 log x1 sin(x2 2 ) (3) I just cooked that up! 4
  • 8. Manual Differentiation ∂f ∂x1 = log x1 sin(x2 2 ) + 1 2 log x1 sin(x2 2 ) −1 2 (4) ∂f ∂x2 = −x1x2cot(x2 2 ) log x1 sin(x2 2 ) −1 2 (5) Pros • Irreducable form • Hardcodes everything Cons • Time Consuming • Error Prone 5
  • 9. Numerical Differentiation Method of finite differences derived from First-Order Approximation of Taylor Series (higher order methods as well) [BF89] lim h→0 ∂f ∂x = f (x + h) − f (x) h (6) lim h→0 ∂f ∂x = f (x + h) − f (x − h) 2h (7) Pros • Fair approximations Cons • Ill-conditioned and unstable • Truncation and Round-off errors 6
  • 10. Symbolic Differentiation Compute actual symbols from a repository of basic rules like the sum rule or the product rule represented as concrete data structures. Used in algebra systems like Mathematica, Theano. A deterministic and mechanistic process just like how one would code! Pros • Insight into structure of problem • Build analytical solutions (e.g. the classic Normal Equation for Linear Regression) Cons • Expression Swell 7
  • 11. Automatic Differentiation
    Problem: Calculate the sensitivity of the output w.r.t. the input (the Jacobian).
    Observations
      1. We need the exact derivatives, not approximations.
      2. We don’t really need the symbolic form.
    Solution: The chain rule (but just the smart way!)
  • 13. Computational Graphs
    A computational graph represents the flow of values across a non-trivial computation [Bau74]. Such graphs are at the core of modern computational libraries like PyTorch and TensorFlow. Consider each node as a special gate. Looks familiar?
    Figure 1: Computational Graph for Equation 3
  • 14. Forward Mode Differentiation
    Computes the sensitivity of the output w.r.t. one input parameter. Any hypothesis $h : \mathbb{R}^m \to \mathbb{R}$ would require $m$ forward mode differentiations to compute the sensitivity w.r.t. each input parameter.
    Forward Primal Trace: the algebraic version of the computational graph. Read top-down.
    Forward Tangent Trace: calculates $\frac{\partial}{\partial x}$. Read top-down.
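  • 15. Forward Mode Example
    Forward Primal Trace
      $v_{-1} = x_1$
      $v_0 = x_2$
      $v_1 = v_0^2$
      $v_2 = \sin(v_1)$
      $v_3 = \frac{v_{-1}}{v_2}$
      $v_4 = \log(v_3)$
      $v_5 = \sqrt{v_4}$
      $v_6 = v_{-1} \cdot v_5$
      $y = v_6$
    Forward Tangent Trace for $\frac{\partial}{\partial x_2}$
      $\dot{v}_{-1} = 0$
      $\dot{v}_0 = 1$
      $\dot{v}_1 = 2 v_0 \dot{v}_0$
      $\dot{v}_2 = \cos(v_1) \dot{v}_1$
      $\dot{v}_3 = \frac{\dot{v}_{-1} v_2 - v_{-1} \dot{v}_2}{v_2^2}$
      $\dot{v}_4 = \frac{1}{v_3} \dot{v}_3$
      $\dot{v}_5 = \frac{1}{2} v_4^{-\frac{1}{2}} \dot{v}_4$
      $\dot{y} = \dot{v}_6 = \dot{v}_{-1} v_5 + v_{-1} \dot{v}_5$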
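The two traces translate almost line for line into code. The sketch below carries each primal value together with its tangent, using $\dot{v}_5 = \frac{1}{2} v_4^{-1/2} \dot{v}_4$ for the square root. The function name `forward_trace` and its seed arguments are illustrative assumptions. Seeding $(\dot{x}_1, \dot{x}_2) = (0, 1)$ gives $\partial f / \partial x_2$, and a second pass with $(1, 0)$ gives $\partial f / \partial x_1$.

```python
import math

def forward_trace(x1, x2, seed_x1=0.0, seed_x2=1.0):
    """Evaluate f and one directional derivative in a single top-down pass."""
    vm1, dvm1 = x1, seed_x1                      # v_{-1} = x1
    v0, dv0 = x2, seed_x2                        # v_0 = x2
    v1, dv1 = v0 ** 2, 2 * v0 * dv0              # v_1 = v_0^2
    v2, dv2 = math.sin(v1), math.cos(v1) * dv1   # v_2 = sin(v_1)
    v3 = vm1 / v2                                # v_3 = v_{-1} / v_2
    dv3 = (dvm1 * v2 - vm1 * dv2) / v2 ** 2      # quotient rule
    v4, dv4 = math.log(v3), dv3 / v3             # v_4 = log(v_3)
    v5 = math.sqrt(v4)                           # v_5 = sqrt(v_4)
    dv5 = 0.5 * v4 ** -0.5 * dv4
    v6 = vm1 * v5                                # v_6 = v_{-1} * v_5
    dv6 = dvm1 * v5 + vm1 * dv5                  # product rule
    return v6, dv6

# Two inputs, so two forward passes are needed for the full gradient.
y, df_dx1 = forward_trace(2.0, 1.0, seed_x1=1.0, seed_x2=0.0)
_, df_dx2 = forward_trace(2.0, 1.0, seed_x1=0.0, seed_x2=1.0)
print("f =", y, " df/dx1 =", df_dx1, " df/dx2 =", df_dx2)
```

The results agree with the manual derivatives in Equations 4 and 5 up to floating-point rounding, with no step size to tune.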
  • 16. Reverse Mode Differentiation
    Computes the sensitivity of the output w.r.t. all input parameters. Any hypothesis $h : \mathbb{R}^m \to \mathbb{R}$ would require ONE reverse mode differentiation. Also called reverse accumulation.
    $\bar{v}_i = \frac{\partial f}{\partial v_i}$ (the adjoint of a variable)
    Reverse Adjoint Trace: calculates the adjoints $\frac{\partial f}{\partial v_i}$. Read bottom-up.
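  • 17. Reverse Mode Example
    Forward Primal Trace
      $v_{-1} = x_1$
      $v_0 = x_2$
      $v_1 = v_0^2$
      $v_2 = \sin(v_1)$
      $v_3 = \frac{v_{-1}}{v_2}$
      $v_4 = \log(v_3)$
      $v_5 = \sqrt{v_4}$
      $v_6 = v_{-1} \cdot v_5$
      $y = v_6$
    Reverse Adjoint Trace (read bottom-up)
      $\bar{v}_0 = \bar{v}_1 \frac{\partial v_1}{\partial v_0}$
      $\bar{v}_1 = \bar{v}_2 \frac{\partial v_2}{\partial v_1}$
      $\bar{v}_{-1} = \bar{v}_{-1} + \bar{v}_3 \frac{\partial v_3}{\partial v_{-1}}$; $\bar{v}_2 = \bar{v}_3 \frac{\partial v_3}{\partial v_2}$
      $\bar{v}_3 = \bar{v}_4 \frac{\partial v_4}{\partial v_3}$
      $\bar{v}_4 = \bar{v}_5 \frac{\partial v_5}{\partial v_4}$
      $\bar{v}_{-1} = \bar{v}_6 \frac{\partial v_6}{\partial v_{-1}}$; $\bar{v}_5 = \bar{v}_6 \frac{\partial v_6}{\partial v_5}$
      $\bar{y} = \bar{v}_6 = 1$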
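Written out as code, the reverse pass looks as follows. This is a hand-unrolled sketch for this one function, with the local partial derivatives filled in explicitly. The function name `reverse_trace` is an illustrative assumption. Note that $v_{-1}$ feeds both $v_3$ and $v_6$, so its adjoint accumulates two contributions.

```python
import math

def reverse_trace(x1, x2):
    """One forward pass for the values, one backward pass for all adjoints."""
    # Forward primal trace (top-down).
    vm1 = x1
    v0 = x2
    v1 = v0 ** 2
    v2 = math.sin(v1)
    v3 = vm1 / v2
    v4 = math.log(v3)
    v5 = math.sqrt(v4)
    v6 = vm1 * v5

    # Reverse adjoint trace (bottom-up), starting from dy/dy = 1.
    v6_bar = 1.0
    vm1_bar = v6_bar * v5                  # dv6/dv_{-1} = v5
    v5_bar = v6_bar * vm1                  # dv6/dv5 = v_{-1}
    v4_bar = v5_bar * 0.5 / math.sqrt(v4)  # dv5/dv4 = 1 / (2 sqrt(v4))
    v3_bar = v4_bar / v3                   # dv4/dv3 = 1 / v3
    vm1_bar += v3_bar / v2                 # dv3/dv_{-1} = 1 / v2 (second use of v_{-1})
    v2_bar = v3_bar * (-vm1 / v2 ** 2)     # dv3/dv2 = -v_{-1} / v2^2
    v1_bar = v2_bar * math.cos(v1)         # dv2/dv1 = cos(v1)
    v0_bar = v1_bar * 2 * v0               # dv1/dv0 = 2 v0
    return v6, (vm1_bar, v0_bar)

y, (df_dx1, df_dx2) = reverse_trace(2.0, 1.0)
print("f =", y, " df/dx1 =", df_dx1, " df/dx2 =", df_dx2)
```

Both partial derivatives come out of a single backward sweep, which is the property that makes reverse mode attractive when there are many inputs and one scalar output.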
  • 18. Reverse Mode in Practice
    More commonly known as the backpropagation algorithm. For a generic hypothesis $h : \mathbb{R}^m \to \mathbb{R}^n$, we need $n$ reverse mode differentiations versus $m$ forward mode differentiations. Helpful when $n \ll m$. For instance, the Dense Interpolated Embedding Model (DIEM) [TGR15] proposed an architecture with ∼160 billion parameters and output syntactic embeddings of size 1000+.
  • 19. Reverse Mode in PyTorch i
      import torch
      from torch.autograd import Variable

      def main():
          # Batch size, input dimension, hidden dimension, output dimension
          N, D_in, H, D_out = 64, 1000, 100, 10

          # Random inputs and targets (since PyTorch 0.4, plain tensors work without Variable)
          x = Variable(torch.randn(N, D_in))
          y = Variable(torch.randn(N, D_out))

          # Two-layer fully connected network
          model = torch.nn.Sequential(
              torch.nn.Linear(D_in, H),
              torch.nn.ReLU(),
              torch.nn.Linear(H, D_out),
  • 20. Reverse Mode in PyTorch ii
          )
          # Sum of squared errors (newer PyTorch versions spell this reduction="sum")
          loss_fn = torch.nn.MSELoss(size_average=False)
          optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

          for t in range(500):
              y_pred = model(x)          # forward pass builds the graph
              loss = loss_fn(y_pred, y)
              optimizer.zero_grad()      # clear gradients from the previous step
              loss.backward()            # Reverse Mode!
              optimizer.step()           # gradient descent update
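To give a feel for what a call like `loss.backward()` has to do, here is a minimal tape-style sketch in plain Python. It is not PyTorch's implementation. The `Var` class, the free functions `sin`, `log`, `sqrt`, and `backward` are hypothetical names for illustration. Each operation records its inputs together with the local partial derivatives, and `backward` walks the recorded graph in reverse topological order, accumulating adjoints.

```python
import math

class Var:
    """A scalar node that remembers how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0         # adjoint, filled in by backward()

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __truediv__(self, other):
        return Var(self.value / other.value,
                   ((self, 1.0 / other.value),
                    (other, -self.value / other.value ** 2)))

def sin(a):
    return Var(math.sin(a.value), ((a, math.cos(a.value)),))

def log(a):
    return Var(math.log(a.value), ((a, 1.0 / a.value),))

def sqrt(a):
    root = math.sqrt(a.value)
    return Var(root, ((a, 0.5 / root),))

def backward(out):
    """Propagate adjoints from the output back to every input."""
    order, seen = [], set()

    def visit(node):  # depth-first post-order gives a topological order
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(out)
    out.grad = 1.0
    for node in reversed(order):  # consumers are processed before producers
        for parent, local in node.parents:
            parent.grad += node.grad * local

x1, x2 = Var(2.0), Var(1.0)
y = x1 * sqrt(log(x1 / sin(x2 * x2)))  # the sample function from Equation 3
backward(y)
print("df/dx1 =", x1.grad, " df/dx2 =", x2.grad)
```

The user only writes the forward computation. The graph and the gradient code fall out of operator overloading, which is the same division of labour the PyTorch example relies on.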
  • 21. References i
    [Bau74] F. L. Bauer, Computational graphs and rounding error, SIAM Journal on Numerical Analysis 11 (1974), no. 1, 87–96.
    [BF89] Richard L. Burden and J. Douglas Faires, Numerical analysis, 4th ed., PWS Publishing Co., Boston, MA, USA, 1989.
    [BPR15] Atilim Gunes Baydin, Barak A. Pearlmutter, and Alexey Andreyevich Radul, Automatic differentiation in machine learning: a survey, CoRR abs/1502.05767 (2015).
    [GW08] Andreas Griewank and Andrea Walther, Evaluating derivatives: Principles and techniques of algorithmic differentiation, 2nd ed., Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
  • 22. References ii
    [TGR15] Andrew Trask, David Gilmore, and Matthew Russell, Modeling order in neural word embeddings at scale, Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (David Blei and Francis Bach, eds.), JMLR Workshop and Conference Proceedings, 2015, pp. 2266–2275.