Auto-Encoders and PCA, a brief psychological background 
Self-taught Learning
•How do humans learn? And why not replicate that? 
•How do babies think? 
Long Term 
Slide 2 of 77
•“We might expect that babies would have really powerful learning mechanisms. And in fact, the baby's brain seems to be the most powerful learning computer on the planet. 
•But real computers are actually getting to be a lot better. And there's been a revolution in our understanding of machine learning recently. And it all depends on the ideas of this guy, the Reverend Thomas Bayes, who was a statistician and mathematician in the 18th century.” 
Alison Gopnik is an American professor of psychology and affiliate professor of philosophy at the University of California, Berkeley. 
How do babies think 
Slide 3 of 77
•“And essentially what Bayes did was to provide a mathematical way using probability theory to characterize, describe, the way that scientists find out about the world. 
•So what scientists do is they have a hypothesis that they think might be likely to start with. They go out and test it against the evidence. 
•The evidence makes them change that hypothesis. Then they test that new hypothesis and so on and so forth.” 
Alison Gopnik is an American professor of psychology and affiliate professor of philosophy at the University of California, Berkeley. 
How do babies think 
Slide 4 of 77
•P(ω | X) ∝ P(X | ω) · P(ω) 
•Posterior ∝ Likelihood * Prior 
•If this is how our brains work, why not continue in this way? 
Bayes’ Theorem 
Slide 5 of 77
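As a rough illustration of the update rule above, here is a minimal Python sketch of one Bayesian update; the two hypotheses and all probability values are made up for illustration.

```python
import numpy as np

# Minimal sketch of the rule on the slide: posterior ∝ likelihood * prior.
# The two hypotheses and all probabilities below are illustrative values only.
prior = np.array([0.5, 0.5])          # P(ω) for two candidate hypotheses
likelihood = np.array([0.8, 0.3])     # P(X | ω): how well each hypothesis explains the data X

unnormalized = likelihood * prior     # P(ω | X) ∝ P(X | ω) * P(ω)
posterior = unnormalized / unnormalized.sum()
print(posterior)                      # the updated beliefs after seeing X
```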
•P(ω | X) ∝ P(X | ω) · P(ω) 
•To build the likelihood, we need tons of data (The Law of Large Numbers) 
•Not just any data: labeled data! 
•We need to solve for features. 
•How should we decide which features to use? 
Bayes’ Theorem – Issues 
Slide 6 of 77
Vision Example 
Slide 11 of 77
Vision Example 
Slide 12 of 77
Vision Example 
Slide 13 of 77
Vision Example 
Slide 14 of 77
Vision Example 
Slide 15 of 77
Feature Representation – Vision 
Slide 16 of 77
Feature Representation – Audio 
Slide 17 of 77
Feature Representation – NLP 
Slide 18 of 77
The “One Learning Algorithm” Hypothesis 
Slide 19 of 77
The “One Learning Algorithm” Hypothesis 
Slide 20 of 77
The “One Learning Algorithm” Hypothesis 
Slide 21 of 77
On Computer Perception 
•The adult visual system computes an incredibly complicated function of the input. 
•We can try to implement most of this incredibly complicated function (hand-engineer features) 
•OR, we can learn this function instead. 
Slide 22 of 77
Self-taught Learning 
Slide 23 of 77
First Stage of Visual Processing – V1 
Slide 24 of 77
Feature Learning via Sparse Coding 
•Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection). 
•Input: Images X^(1), X^(2), …, X^(m) (each in R^(n×n)) 
•Learn: A dictionary of bases Φ_1, Φ_2, …, Φ_k (also in R^(n×n)), so that each input X can be approximately decomposed as: 
•X ≈ Σ_{j=1}^{k} a_j φ_j, s.t. the a_j are mostly zero (“sparse”) 
Slide 25 of 77
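The decomposition above can be sketched in a few lines of numpy. This is an illustrative stand-in rather than the deck's implementation: the dictionary is just random normalized bases instead of learned ones, the step size and sparsity weight are assumed values, and the sparse coefficients a_j are found with a few iterations of soft-thresholded gradient descent (ISTA).

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, lam, eta = 8, 64, 0.1, 0.1       # patch side, number of bases, sparsity weight, step size (assumed)
Phi = rng.standard_normal((n * n, k))  # dictionary of k bases, one per column (random stand-in, not learned)
Phi /= np.linalg.norm(Phi, axis=0)     # normalize each basis phi_j

x = rng.standard_normal(n * n)         # one flattened n*n input patch X

# ISTA: iterate a <- soft_threshold(a - eta * Phi^T (Phi a - x), eta * lam)
a = np.zeros(k)
for _ in range(200):
    grad = Phi.T @ (Phi @ a - x)
    a = a - eta * grad
    a = np.sign(a) * np.maximum(np.abs(a) - eta * lam, 0.0)   # mostly-zero a_j

x_hat = Phi @ a                         # X ≈ sum_j a_j * phi_j
print(np.count_nonzero(a), "active coefficients out of", k)
print("reconstruction error:", np.linalg.norm(x - x_hat))
```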
Feature Learning via Sparse Coding 
Slide 26 of 77
Feature Learning via Sparse Coding 
Slide 27 of 77
Sparse Coding applied to Audio 
Slide 28 of 77
Learning Features Hierarchy 
Slide 29 of 77
Learning Features Hierarchy 
Slide 30 of 77
Features Hierarchy: Trained on face images 
Slide 31 of 77
Features Hierarchy: Trained on diff. categories 
Slide 32 of 77
Applications in Machine learning 
Slide 33 of 77
Phoneme Classification (TIMIT benchmark) 
Slide 34 of 77
State-of-the-art 
Slide 35 of 77
Brain Operation Modes 
Slide 36 of 77
Brain Operation Modes 
Slide 37 of 77 
•Professor Daniel Kahneman, the hero of psychology. 
•Won the Nobel Prize in Economics in 2002. 
•He now teaches psychology at Princeton.
Brain Operation Modes 
Slide 38 of 77 
•What do you see? 
•Angry Girl.
Brain Operation Modes 
Slide 39 of 77 
•Now, What do you see? 
•Needs effort.
Slide 40 of 77 
System One 
System Two
System One 
Slide 41 of 77 
•It’s Automatic 
•Perceiving things + Skills = Answer 
•It is an intuitive process. 
•Intuition is Recognition
System One: Memory 
Slide 42 of 77
System One: Memory 
Slide 43 of 77 
•By the age of three we all learned that “Big things can’t go inside small things”. 
•All of us have tried to save our favorite movie on the computer, and we know that those two hours require gigabytes of space.
System One: Memory 
Slide 44 of 77
System One: Memory 
Slide 45 of 77 
•How do we cram the vast universe of our experience in a relatively small storage compartment between our ears? 
•We Cheat ! 
•We compress memories into a critical thread and key features. 
•Ex: “Dinner was disappointing”, “Tough Steak” 
•Later when we want to remember our experience, our brains reweave, and not retrieve, the scenes using the extracted features.
System One: Memory 
Slide 46 of 77 Daniel Todd Gilbert is Professor of Psychology at Harvard University. 
In this experiment, two groups of people sat down to watch a set of slides: the question group and the no-question group. The slides showed two cars approaching a yield sign; one car turns right and then the two cars collide.
System One: Memory 
Slide 47 of 77 
•The no-question group wasn’t asked any questions. 
•The question group was asked the following question: 
•Did another car pass by the blue car while it stopped at the Stop Sign? 
•Then both groups were asked to pick which set of slides they had seen: the one with the yield sign or the one with the stop sign.
System One: Memory 
Slide 47 of 77 
•90% of the no-question group chose the yield sign. 
•80% of the question group chose the stop sign. 
•The general finding: our brains compress experiences into key features and fill in details that were not actually stored. This is the basic idea behind auto-encoders.
Sparse Auto-encoders 
Slide 48 of 77
•An auto-encoder neural network is an unsupervised learning algorithm that applies back propagation to a set of unlabeled training examples {x^(1), x^(2), x^(3), …}, where x^(i) ∈ R^n, by setting the target values to be equal to the inputs. [6] 
•i.e. it uses y^(i) = x^(i) 
•Original contributions to back propagation were made by Hinton and colleagues in the 1980s, and more recently by Hinton, Salakhutdinov, Bengio, LeCun and Erhan (2006-2010). 
Sparse Auto-encoder 
Slide 49 of 77
•Before we get further into the details of the algorithm, we need to quickly go through neural networks. 
•To describe neural networks, we will begin by describing the simplest possible neural network. One that comprises a single "neuron." We will use the following diagram to denote a single neuron [5] 
Neural Network 
Single Neuron [8] 
Slide 50 of 77
•This "neuron" is a computational unit that takes as input x1,x2,x3 (and a +1 intercept term), and outputs 
•h_{W,b}(x) = f(W^T x) = f( Σ_{i=1}^{3} W_i x_i + b ), where f : R → R is called the activation function. [5] 
Neural Network 
Slide 51 of 77
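A minimal sketch of this single neuron in Python, using the sigmoid activation defined on the next slide; the weights, bias, and inputs are illustrative values only.

```python
import numpy as np

# A single "neuron" as on the slide: h_{W,b}(x) = f(sum_i W_i x_i + b).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
W = np.array([0.2, 0.4, -0.1])   # weights W_1..W_3 (illustrative)
b = 0.3                          # weight on the +1 intercept term

h = sigmoid(W @ x + b)           # h_{W,b}(x) = f(W^T x + b)
print(h)
```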
•The activation function can be:[8] 
1) Sigmoid function: f(z) = 1 / (1 + exp(−z)), output in the range [0, 1] 
Sigmoid Activation Function 
Sigmoid Function [8] 
Slide 52 of 77
•2) Tanh function: f(z) = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)), output in the range [−1, 1] 
Tanh Activation Function 
Tanh Function [8] 
Slide 53 of 77
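Both activation functions can be written directly from the definitions above; this short sketch simply checks them on a few values.

```python
import numpy as np

# The two activation functions from the slides, written exactly as defined there.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))              # output in [0, 1]

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))   # output in [-1, 1]

z = np.linspace(-5, 5, 5)
print(sigmoid(z))
print(np.allclose(tanh(z), np.tanh(z)))          # matches numpy's built-in tanh
```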
•Neural network parameters are: 
•(W, b) = (W^(1), b^(1), W^(2), b^(2)), where we write W_ij^(l) to denote the parameter (or weight) associated with the connection between unit j in layer l and unit i in layer l + 1. 
•b_i^(l) denotes the bias associated with unit i in layer l + 1. 
•a_i^(l) denotes the activation (meaning output value) of unit i in layer l. 
•Given a fixed setting of the parameters W, b, our neural network defines a hypothesis hW,b(x) that outputs a real number. 
Neural Network Model 
Slide 54 of 77
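A minimal sketch of the forward pass with this parameter naming; the layer sizes (3 inputs, 4 hidden units, 1 output) and random weights are assumed stand-ins.

```python
import numpy as np

# Forward pass through a one-hidden-layer network using the naming on the slide:
# W^(l) connects layer l to layer l+1, b^(l) is the bias of layer l+1.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3)) * 0.1   # W^(1): layer 1 (3 inputs) -> layer 2 (4 units)
b1 = np.zeros(4)                          # b^(1)
W2 = rng.standard_normal((1, 4)) * 0.1   # W^(2): layer 2 -> output layer
b2 = np.zeros(1)                          # b^(2)

x = np.array([1.0, 0.5, -0.5])
a2 = sigmoid(W1 @ x + b1)                 # a^(2): activations of layer 2
h = sigmoid(W2 @ a2 + b2)                 # hypothesis h_{W,b}(x), a real number
print(h)
```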
Cost Function 
Slide 55 of 77
•The auto-encoder tries to learn a function h_{W,b}(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so that the output x̂ is similar to x. 
•Placing constraints on the network, such as limiting the number of hidden units or imposing a sparsity constraint on the hidden units, leads it to discover interesting structure in the data, even if the number of hidden units is large. 
Auto-encoders and Sparsity 
Slide 56 of 77
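A minimal sketch of this setup, assuming a 64-25-64 architecture and random data in place of real inputs: the target of the forward pass is the input itself.

```python
import numpy as np

# The target is the input itself; a small hidden layer forces a compressed representation.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((100, 64))                 # 100 unlabeled examples x in [0, 1]^64

n_in, n_hidden = 64, 25                   # assumed sizes
W1 = rng.standard_normal((n_hidden, n_in)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_in, n_hidden)) * 0.01
b2 = np.zeros(n_in)

A2 = sigmoid(X @ W1.T + b1)               # compressed hidden representation a^(2)
X_hat = sigmoid(A2 @ W2.T + b2)           # reconstruction h_{W,b}(x) ≈ x
print("mean squared reconstruction error:", np.mean((X_hat - X) ** 2))
```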
•Assumptions: 
1. The neurons should be inactive most of the time (a neuron is "active" (or "firing") if its output value is close to 1, and "inactive" if its output value is close to 0), and the activation function is the sigmoid function. 
2. Recall that a_j^(2) denotes the activation of hidden unit j in layer 2 of the auto-encoder. 
3. We write a_j^(2)(x) for the activation of this hidden unit when the network is given a specific input x. 
4. Let ρ̂_j = (1/m) Σ_{i=1}^{m} [ a_j^(2)(x^(i)) ] be the average activation of hidden unit j (averaged over the training set). 
•Objective: 
•We would like to (approximately) enforce the constraint ρ̂_j = ρ, where ρ is a sparsity parameter, a small value close to zero. 
Auto-encoders and Sparsity Algorithm 
Slide 57 of 77
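A minimal sketch of computing the average activations ρ̂_j over a training set; the network weights and data here are random stand-ins.

```python
import numpy as np

# rho_hat_j = (1/m) * sum_i a_j^(2)(x^(i)), following the slide's definition.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))          # m = 1000 inputs x^(i) (stand-in data)
W1 = rng.standard_normal((25, 64)) * 0.01
b1 = np.zeros(25)

A2 = sigmoid(X @ W1.T + b1)                  # a_j^(2)(x^(i)) for every example and hidden unit
rho_hat = A2.mean(axis=0)                    # average activation of each hidden unit

rho = 0.05                                   # sparsity parameter (typical small value)
print(rho_hat[:5], "target:", rho)
```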
•To achieve this, we will add an extra penalty term to our optimization objective that penalizes ρ̂_j deviating significantly from ρ. 
•Σ_{j=1}^{s_2} [ ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j)) ], where s_2 is the number of neurons in the hidden layer and the index j sums over the hidden units in the network. [6] 
•It can also be written Σ_{j=1}^{s_2} KL(ρ || ρ̂_j), where KL(ρ || ρ̂_j) = ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j)) is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean ρ and a Bernoulli random variable with mean ρ̂_j. [6] 
•KL divergence is a standard function for measuring how different two distributions are. 
Autoencoders and Sparsity Algorithm 
Slide 58 of 77
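The penalty can be written directly from the definition above; the values of ρ̂_j below are made up to show how the penalty behaves.

```python
import numpy as np

# Sparsity penalty: sum_j KL(rho || rho_hat_j), the KL divergence between two
# Bernoulli distributions with means rho and rho_hat_j.
def kl_bernoulli(rho, rho_hat):
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

rho = 0.05
rho_hat = np.array([0.05, 0.10, 0.50, 0.90])   # example average activations rho_hat_j

print(kl_bernoulli(rho, rho_hat))        # 0 when rho_hat_j == rho, grows as it diverges
print(kl_bernoulli(rho, rho_hat).sum())  # total penalty over the s_2 hidden units
```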
•The KL penalty function has the following property: KL(ρ || ρ̂_j) = 0 if ρ̂_j = ρ, and otherwise it increases monotonically as ρ̂_j diverges from ρ. 
•For example, if we plot KL(ρ || ρ̂_j) for a range of values of ρ̂_j (with ρ = 0.2), we see that the KL divergence reaches its minimum of 0 at ρ̂_j = ρ and approaches ∞ as ρ̂_j approaches 0 or 1. 
•Thus, minimizing this penalty term has the effect of pushing ρ̂_j close to ρ. 
Auto-encoders and Sparsity Algorithm –cont’d 
KL Function 
Slide 59 of 77
Sparse Auto-encoders Cost Function to minimize 
Slide 60 of 77
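The cost-function figure itself did not survive extraction, so the sketch below follows the standard sparse auto-encoder objective this material builds on: squared reconstruction error plus weight decay plus β times the KL sparsity penalty. The hyper-parameter values, layer sizes, and data are assumed.

```python
import numpy as np

# Sparse auto-encoder cost: reconstruction + lambda * weight decay + beta * KL sparsity.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_cost(X, W1, b1, W2, b2, rho=0.05, lam=1e-4, beta=3.0):
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)                 # hidden activations a^(2)
    X_hat = sigmoid(A2 @ W2.T + b2)             # reconstruction h_{W,b}(x)

    reconstruction = 0.5 * np.sum((X_hat - X) ** 2) / m
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    rho_hat = A2.mean(axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

    return reconstruction + weight_decay + beta * kl

# Stand-in data and parameters just to show the call.
rng = np.random.default_rng(0)
X = rng.random((100, 64))
W1 = rng.standard_normal((25, 64)) * 0.01
b1 = np.zeros(25)
W2 = rng.standard_normal((64, 25)) * 0.01
b2 = np.zeros(64)
print(sparse_autoencoder_cost(X, W1, b1, W2, b2))
```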
Gradient Checking 
Slide 61 of 77 
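The formulas on this slide are not in the extracted text; the sketch below shows the standard numerical gradient check it refers to, comparing analytic derivatives against (J(θ + ε) − J(θ − ε)) / (2ε) on a toy cost function.

```python
import numpy as np

# Numerical gradient check: perturb each parameter by +/- eps and compare.
def numerical_gradient(J, theta, eps=1e-4):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

# Toy example: J(theta) = theta_0^2 + 3*theta_1, whose true gradient is (2*theta_0, 3).
J = lambda t: t[0] ** 2 + 3 * t[1]
theta = np.array([1.5, -2.0])
print(numerical_gradient(J, theta))   # should be close to [3.0, 3.0]
```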
•We implemented a sparse auto-encoder, trained on 8×8 image patches using the L-BFGS optimization algorithm. 
Auto-encoder Implementation 
A random sample of 200 patches from the dataset. 
Slide 62 of 77
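A rough sketch of the training loop described here, using scipy's L-BFGS optimizer: flatten the parameters into one vector and hand a cost-and-gradient function to the optimizer. `cost_and_grad` stands in for the sparse auto-encoder cost; a toy quadratic is used so the sketch runs end to end.

```python
import numpy as np
from scipy.optimize import minimize

# Train by handing a (cost, gradient) function over a flat parameter vector to L-BFGS.
def train(cost_and_grad, theta0, max_iter=400):
    result = minimize(cost_and_grad, theta0, jac=True,
                      method="L-BFGS-B", options={"maxiter": max_iter})
    return result.x

# Tiny stand-in problem so the sketch runs: minimize ||theta - 1||^2.
def toy_cost_and_grad(theta):
    diff = theta - 1.0
    return float(diff @ diff), 2.0 * diff

theta0 = np.zeros(8 * 8)
theta_star = train(toy_cost_and_grad, theta0)
print(np.allclose(theta_star, 1.0))
```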
Auto-encoder Implementation 
Slide 63 of 77 
•We have trained it using digits from 0 to 9
AutoEncoder Visualization 
Slide 64 of 77
Auto-encoder Implementation 
Slide 65 of 77 
•We have trained it with faces.
Auto-encoder with PCA flavor 
Slide 66 of 77 
Eigenvectors 
Percentage of variance retained
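A minimal sketch of the two PCA quantities named on this slide, eigenvectors of the data covariance and the percentage of variance retained by the top-k components; the data are random stand-ins for the image patches.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))
X = X - X.mean(axis=0)                       # zero-mean the data

cov = (X.T @ X) / X.shape[0]                 # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvectors = principal directions
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort from largest to smallest

k = 20
retained = eigvals[:k].sum() / eigvals.sum() # fraction of variance retained by top k
print(f"top {k} components retain {100 * retained:.1f}% of the variance")
```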
Autoencoder Implementation 
Slide 67 of 77 
Auto-encoder Performance 
Slide 68 of 77
In Progress Work (Future Results) 
•Given the small dataset available for facial features, 
•we train the neural network on a random dataset, in the hope that the resulting average weights will be a good starting point for the tuning phase of the neural network. 
•We then fine-tune with the smaller dataset of facial features 
Slide 69 of 77
Wrap up 
Slide 70 of 77
Slide 71 of 77 
[Andrew Ng]
Data - Now 
Slide 72 of 77 
•Twitter: 7 terabytes of data / day 
•Facebook: 500 terabytes of data / day
•The Square Kilometre Array telescope has been announced. 
Data – Tomorrow 
Slide 73 of 77 
•It will generate 700 terabytes of data every second. 
•In two days it will generate as much data as the entire internet holds today. 
•Do you know how long it would take Google, with all its resources, just to index one year of data from this beast? Three whole months, 90 days!
Slide 74 of 77 
[Andrew Ng]
Thanks! Q? 
Slide 75 of 77
