This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks, followed by a TypeScript-based code sample that replicates the Tensorflow playground. Basic knowledge of matrices is helpful for this session.
7. AI/ML/DL: How They Differ
Traditional AI (20th century):
based on collections of rules
Led to expert systems in the 1980s
The era of LISP and Prolog
8. AI/ML/DL: How They Differ
Machine Learning:
Started in the 1950s (approximate)
Alan Turing and “learning machines”
Data-driven (not rule-based)
Many types of algorithms
Involves optimization
9. AI/ML/DL: How They Differ
Deep Learning:
Started in the 1950s (approximate)
The “perceptron” (basis of NNs)
Data-driven (not rule-based)
large (even massive) data sets
Involves neural networks (CNNs: ~1970s)
Lots of heuristics
Heavily based on empirical results
10. The Rise of Deep Learning
Massive and inexpensive computing power
Huge volumes of data/Powerful algorithms
The “big bang” in 2009:
”deep-learning neural networks and NVidia GPUs"
Google Brain used NVidia GPUs (2009)
11. AI/ML/DL: Commonality
All of them involve a model
A model represents a system
Goal: a good predictive model
The model is based on:
Many rules (for AI)
data and algorithms (for ML)
large sets of data (for DL)
12. A Basic Model in Machine Learning
Let’s perform the following steps:
1) Start with a simple model (2 variables)
2) Generalize that model (n variables)
3) See how it might apply to a NN
13. Linear Regression
One of the simplest models in ML
Fits a line (y = m*x + b) to data in 2D
Finds best line by minimizing MSE:
m = average of x values (“mean”)
b also has a closed form solution
16. Linear Regression: example #1
One feature (independent variable):
X = number of square feet
Predicted value (dependent variable):
Y = cost of a house
A very “coarse grained” model
We can devise a much better model
17. Linear Regression: example #2
Multiple features:
X1 = # of square feet
X2 = # of bedrooms
X3 = # of bathrooms (dependency?)
X4 = age of house
X5 = cost of nearby houses
X6 = corner lot (or not): Boolean
a much better model (6 features)
18. Linear Multivariate Analysis
General form of multivariate equation:
Y = w1*x1 + w2*x2 + . . . + wn*xn + b
w1, w2, . . . , wn are numeric values
x1, x2, . . . , xn are variables (features)
Properties of variables:
Can be independent (Naïve Bayes)
weak/strong dependencies can exist
20. Neural Networks: equations
Node “values” in first hidden layer:
N1 = w11*x1+w21*x2+…+wn1*xn
N2 = w12*x1+w22*x2+…+wn2*xn
N3 = w13*x1+w23*x2+…+wn3*xn
. . .
Nn = w1n*x1+w2n*x2+…+wnn*xn
Similar equations for other pairs of layers
21. Neural Networks: Matrices
From inputs to first hidden layer:
Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix)
From first to second hidden layers:
Y2 = W2*X + B2 (X/Y2/B2: vectors; W2: matrix)
From second to third hidden layers:
Y3 = W3*X + B3 (X/Y3/B3: vectors; W3: matrix)
Apply an “activation function” to y values
22. Neural Networks (general)
Multiple hidden layers:
Layer composition is your decision
Activation functions: sigmoid, tanh, RELU
https://en.wikipedia.org/wiki/Activation_function
Back propagation (1980s)
https://en.wikipedia.org/wiki/Backpropagation
=> Initial weights: small random numbers
29. What’s the “Best” Activation Function?
Initially: sigmoid was popular
Then: tanh became popular
Now: RELU is preferred (better results)
Softmax: for FC (fully connected) layers
NB: sigmoid + tanh are used in LSTMs
30. Even More Activation Functions!
https://stats.stackexchange.com/questions/11525
8/comprehensive-list-of-activation-functions-in-
neural-networks-with-pros-cons
https://medium.com/towards-data-
science/activation-functions-and-its-types-which-
is-better-a9a5310cc8f
https://medium.com/towards-data-science/multi-
layer-neural-networks-with-sigmoid-function-
deep-learning-for-rookies-2-bf464f09eb7f
34. How to Select a Cost Function
1) Depends on the learning type:
=> supervised versus unsupervised/RL
2) Depends on the activation function
3) Other factors
Example:
cross-entropy cost function for supervised
learning on multiclass classification
35. GD versus SGD
SGD (Stochastic Gradient Descent):
+ involves a SUBSET of the dataset
+ aka Minibatch Stochastic Gradient Descent
GD (Gradient Descent):
+ involves the ENTIRE dataset
More details:
http://cs229.stanford.edu/notes/cs229-notes1.pdf
36. Setting up Data & the Model
Normalize the data:
Subtract the ‘mean’ and divide by stddev
[Central Limit Theorem]
Initial weight values for NNs:
Random numbers between -1 and 1
More details:
http://cs231n.github.io/neural-networks-2/#losses
37. What are Hyper Parameters?
higher level concepts about the model such as
complexity, or capacity to learn
Cannot be learned directly from the data in the
standard model training process
must be predefined
38. Hyper Parameters (examples)
# of hidden layers in a neural network
the learning rate (in many models)
the dropout rate
# of leaves or depth of a tree
# of latent factors in a matrix factorization
# of clusters in a k-means clustering
39. Hyper Parameter: dropout rate
"dropout" refers to dropping out units (both hidden
and visible) in a neural network
a regularization technique for reducing overfitting in
neural networks
prevents complex co-adaptations on training data
a very efficient way of performing model averaging
with neural networks
40. How Many Layers in a DNN?
Algorithm #1 (from Geoffrey Hinton):
1) add layers until you start overfitting your
training set
2) now add dropout or some another
regularization method
Algorithm #2 (Yoshua Bengio):
"Add layers until the test error does not improve
anymore.”
41. How Many Hidden Nodes in a DNN?
Based on a relationship between:
# of input and # of output nodes
Amount of training data available
Complexity of the cost function
The training algorithm
42. CNNs versus RNNs
CNNs (Convolutional NNs):
Good for image processing
2000: CNNs processed 10-20% of all checks
=> Approximately 60% of all NNs
RNNs (Recurrent NNs):
Good for NLP and audio
51. GANs: Generative Adversarial Networks
Make imperceptible changes to images
Can consistently defeat all NNs
Can have extremely high error rate
Some images create optical illusions
https://www.quora.com/What-are-the-pros-and-cons-
of-using-generative-adversarial-networks-a-type-of-
neural-network
52. What is TypeScript?
A superset of JavaScript (ES6): 10/01/2012
A compiled language (tsc compiler)
strong typing and also type inferencing
Type checking during compile time
“minimal” extra compile time overhead
“.ts” files are transpiled into “.js” files (via tsc)
“lib.d.ts” contains TypeScript definitions
53. What is TypeScript?
Optional type-checking system
Interfaces, classes, and constructors
Open source project (github)
Used very heavily in Angular
Supported by ReactJS and VueJS (some?)
=> see demo
54. Activations in TypeScript (nn.ts)
export class Activations {
public static TANH: ActivationFunction = {
output: x => (Math as any).tanh(x),
der: x => {
let output = Activations.TANH.output(x);
return 1 - output * output;
}
};
public static RELU: ActivationFunction = {
output: x => Math.max(0, x),
der: x => x <= 0 ? 0 : 1
};
55. Activations in TypeScript (nn.ts)
public static SIGMOID: ActivationFunction = {
output: x => 1 / (1 + Math.exp(-x)),
der: x => {
let output = Activations.SIGMOID.output(x);
return output * (1 - output);
}
};
public static LINEAR: ActivationFunction = {
output: x => x,
der: x => 1
};
}
56. Deep Learning Playground
TF playground home page:
http://playground.tensorflow.org
Demo #1:
https://github.com/tadashi-aikawa/typescript-
playground
Converts playground to TypeScript
57. Deep Learning and Art
“Convolutional Blending” images:
=> 19-layer Convolutional Neural Network
www.deepart.io
Bots created their own language:
https://www.recode.net/2017/3/23/14962182/ai-
learning-language-open-ai-research
https://www.fastcodesign.com/90124942/this-
google-engineer-taught-an-algorithm-to-make-
train-footage-and-its-hypnotic
59. About Me
I provide training for the following:
=> Deep Learning/TensorFlow/Keras
=> Android
=> Angular 4
60. Recent/Upcoming Books
1) HTML5 Canvas and CSS3 Graphics (2013)
2) jQuery, CSS3, and HTML5 for Mobile (2013)
3) HTML5 Pocket Primer (2013)
4) jQuery Pocket Primer (2013)
5) HTML5 Mobile Pocket Primer (2014)
6) D3 Pocket Primer (2015)
7) Python Pocket Primer (2015)
8) SVG Pocket Primer (2016)
9) CSS3 Pocket Primer (2016)
10) Android Pocket Primer (2017)
11) Angular Pocket Primer (2017)