3. Online Customer
Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data.
Supervised Learning
• Mainly used to solve regression and classification problems
• Labelled data is used for training
• Algorithms: Linear Regression, Support Vector Machines (SVM), Neural Networks, Decision Trees, Naive Bayes, Nearest Neighbor
Unsupervised Learning
• Used for clustering problems (grouping) and anomaly detection (for example, in banks for unusual transactions)
• Unlabeled data is used
• Algorithms: k-means clustering, association rules
• Used in descriptive modelling
Semi-supervised Learning
• Sits in between supervised and unsupervised learning
Reinforcement Learning
• The machine learns from past experience
• Modelled as a Markov Decision Process
• Algorithms: Q-Learning, Deep Adversarial Networks
• Healthcare
• Finance
• Retail
• Travel
• Media
• Virtual Personal Assistants
• Video Surveillance
• Social Media Services
• Malware Filtering
• Result Refining
• Product Recommendations
• Online Fraud Detection
• Web Search Engine
• Photo Tagging Applications
• Spam Detector
• Marketing and Sales
• Government
• Transportation
5. Decision Trees
Use text collections and structured knowledge bases
Gradient Boosting Machines
• based on ensembling weak prediction models
Kernel Methods
• group of classification algorithms
• support vector machine (SVM)
Random Forests
• involves building a large number of specialized decision trees and then ensembling their outputs
Neural Networks
question answering problems based on different types of resources, including Web, tables, images, diagrams, videos
Probabilistic modeling
• one of the earliest forms of machine learning
• Naive Bayes algorithm
• Logistic regression
6. • Advancement in speech recognition in the last 3 years
• Advancement in Computer Vision
• Advancement in Natural Language Processing
7. Architecture of DNN
https://cdn-images-1.medium.com/max/800/1*5egrX--WuyrLA7gBEXdg5A.png
A DNN uses a cascade of many layers of non-linear processing units for feature extraction and transformation.
• Features are learned using multiple levels of representation
• The multiple layers of a DNN help the machine derive a hierarchical representation
• DL can be applied to supervised as well as unsupervised datasets to develop NLP applications
8. Basic layered ANN configuration
https://cn.bing.com/images/search?view=detailV2&ccid=1%2FTR7%2Ft2&id=E83FB45C81BE2EFB2C6EA8FAF5F66A7328B7DA66&thid=OIP.1_TR7_t2HMjc4nxdR6KNuQHaD9&mediaurl=http%3A%2F%2Fwww.scielo.org.co%2Fimg%2Frevistas%2Fiei%2Fv34n2%2Fv34n2a03f2.jpg&exph=271&expw=507&q=ann+layers+weight&simid=608029997680099912&selectedindex=61&vt=0
9. • The loss function measures the quality of the network’s output.
• The loss score is used as a feedback signal to adjust the weights.
• Initially, the weights of the network are assigned random values
10. Gradient descent
• Gradient descent is used to minimize the loss (error) function step by step, for example to improve the fit of a linear regression model.
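To make the idea concrete, here is a minimal sketch of batch gradient descent fitting a straight line. The function name `fit_line`, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration, and the mean squared error is assumed as the loss.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

A learning rate that is too large would make the updates overshoot and diverge; the value here was picked to be small enough for this toy data.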
12. Activation functions
Activation functions map the input nodes to the output nodes using certain mathematical operations.
ANN Structure
• Architecture (arrangement of neurons and layers)
• Activities (activities of the neurons)
• Learning rule (update weights and optimize output)
• The transfer potential aggregates the inputs and weights
• The activation function applies a non-linear mathematical transformation to the transfer potential
• The threshold function either activates the neuron or does not activate it
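The three pieces of a neuron can be sketched as follows. This is an illustration only: the sigmoid is assumed as the activation function, and the function names, the bias term, the 0.5 threshold, and the example weights are hypothetical.

```python
import math


def transfer_potential(inputs, weights, bias):
    """Aggregate the inputs and weights into a single weighted sum."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def sigmoid(z):
    """Non-linear transformation applied to the transfer potential."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_fires(inputs, weights, bias, threshold=0.5):
    """Threshold step: the neuron is active only if its activation is high enough."""
    return sigmoid(transfer_potential(inputs, weights, bias)) >= threshold


weights = [2.0, -1.0]  # illustrative values
print(neuron_fires([1.0, 0.0], weights, bias=-1.0))
print(neuron_fires([0.0, 1.0], weights, bias=-1.0))
```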
13. Activation functions
Sigmoid function equation
https://cdn-images-1.medium.com/max/800/1*QHPXkxGmIyxn7mH4BtRJXQ.png
• Takes a number and squashes it into the range between 0 and 1
Problems
• Suffers from the vanishing gradient problem: the gradient of the network's output with respect to the parameters in the early layers becomes very small
• Has a slow convergence rate: because of the vanishing gradient problem, the sigmoid activation function converges very slowly
• Is not zero-centered: the sigmoid function's output range is (0, 1)
Hyperbolic tangent function (TanH)
Tanh activation function equation
https://cdn-images-1.medium.com/max/800/1*HJhu8BO7KxkjqRRMSaz0Gw.png
• Squashes the input into the range (-1, 1)
• Its output is zero-centered
• TanH also suffers from the vanishing gradient problem
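A small sketch of both functions and their derivatives shows why the gradient vanishes: for inputs far from zero, both derivatives shrink toward zero. The helper names `sigmoid_grad` and `tanh_grad` and the sample inputs are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, at most 1 (reached at x = 0)."""
    return 1.0 - math.tanh(x) ** 2


for x in (0.0, 2.0, 5.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x), math.tanh(x), tanh_grad(x))

# Backpropagation multiplies one such factor per layer, so even the
# best-case sigmoid gradient shrinks quickly with depth.
print(0.25 ** 10)
```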
14. Rectified Linear Unit (ReLU)
ReLU activation function equation
https://cdn-images-1.medium.com/max/800/1*JtJaS_wPTCshSvAFlCu_Wg.png
• ReLU is simple and doesn't involve any complex computation
• ReLU is less expensive to compute than sigmoid and TanH
• ReLU doesn't suffer from the vanishing gradient problem for positive inputs, where its gradient is constant
• Some units of the neural network can be fragile and die during training
• Once a unit dies, the gradient flowing through it will always be zero from that point on
Leaky ReLU
http://wangxinliu.com/images/machine_learning/leakyrelu.png
Maxout
• Generalized form of both ReLU and Leaky ReLU
• Doubles the parameters of each neuron
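The three functions can be sketched as below. The leak factor `alpha=0.01` is a commonly used value assumed here, and the maxout version is limited to two linear pieces, which is where the doubling of parameters comes from; all names are illustrative.

```python
def relu(x):
    """Pass positive values through unchanged, map negative values to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope so the unit cannot die."""
    return x if x > 0 else alpha * x


def maxout(inputs, weights_a, bias_a, weights_b, bias_b):
    """Maximum of two linear pieces, hence two weight sets per neuron."""
    z_a = sum(x * w for x, w in zip(inputs, weights_a)) + bias_a
    z_b = sum(x * w for x, w in zip(inputs, weights_b)) + bias_b
    return max(z_a, z_b)


print(relu(-3.0), relu(2.0))
print(leaky_relu(-3.0), leaky_relu(2.0))
# With the second piece fixed at zero, maxout behaves like ReLU.
print(maxout([1.0, 2.0], [1.0, -1.0], 0.0, [0.0, 0.0], 0.0))
```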
Other activation functions
• binary step function
• identity function
• ArcTan
15. Loss functions (cost functions or error functions)
• Define the error function before starting to train the ANN, then run the network to get its output
• Compare the generated output with the expected output given as part of the training data
• Calculate the gradient of this error function
• Backpropagate the error gradient through the network to update the existing weights and bias values and optimize the generated output
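The four steps above can be sketched for the smallest possible network: one sigmoid neuron with one weight and one bias. The squared error, the learning rate, the starting values, and the single training example are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, w, b, learning_rate=0.5):
    # Forward pass: generate the output.
    output = sigmoid(w * x + b)
    # Compare the output with the expected output (squared error).
    loss = (output - target) ** 2
    # Gradient of the error, propagated backwards with the chain rule.
    dloss_doutput = 2 * (output - target)
    doutput_dz = output * (1 - output)
    grad_w = dloss_doutput * doutput_dz * x
    grad_b = dloss_doutput * doutput_dz
    # Update the weight and bias against the gradient.
    return w - learning_rate * grad_w, b - learning_rate * grad_b, loss


w, b = 0.1, 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(x=1.0, target=1.0, w=w, b=b)
    losses.append(loss)
print(losses[0], losses[-1])
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back to the first hidden layer.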
16. • Quadratic cost function (mean squared error or sum squared error)
• Cross-entropy cost function (Bernoulli negative log-likelihood or binary cross-entropy)
• Kullback-Leibler divergence (information divergence, information gain, relative entropy, or KLIC)
• Exponential cost, Hellinger distance, generalized Kullback-Leibler divergence, and Itakura-Saito distance
Popular loss functions
• Loss functions for regression tasks
• Loss functions for categorical data and classification tasks
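Three of the cost functions listed above can be written in a few lines. This is a sketch using the standard textbook definitions; the function names and the `eps` clipping value (used to avoid taking the logarithm of zero) are assumptions.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Quadratic cost: average squared difference, typical for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy cost for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from zero
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions.

    Assumes q is non-zero wherever p is non-zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```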
19. Autoencoders (compression autoencoders and denoising autoencoders)
Autoencoder network architecture
• Autoencoders are used to reduce a dataset’s dimensionality.
• The output of the autoencoder network is a reconstruction of the input data in the most efficient form.
• The output layer in an autoencoder has the same number of units as the input layer does
• The autoencoder learns directly from unlabeled data
• Autoencoders rely on backpropagation to update their weights
• Autoencoders are good at powering anomaly detection systems.
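The points above can be illustrated with a deliberately tiny linear autoencoder that squeezes 2 input values through a 1-value bottleneck. Everything here is an assumption for illustration: the toy data (points on the line y = 2x), the absence of biases and non-linearities, the fixed starting weights (used instead of random ones so the run is repeatable), and the hyperparameters.

```python
def encode(x, enc_weights):
    """Compress a 2-value input into a single code value (the bottleneck)."""
    return sum(xi * wi for xi, wi in zip(x, enc_weights))


def decode(code, dec_weights):
    """Expand the code back to as many values as the input had."""
    return [code * wi for wi in dec_weights]


def reconstruction_error(x, enc_weights, dec_weights):
    recon = decode(encode(x, enc_weights), dec_weights)
    return sum((ri - xi) ** 2 for ri, xi in zip(recon, x))


def train(data, steps=500, learning_rate=0.05):
    enc, dec = [0.1, 0.1], [0.1, 0.1]  # small fixed starting weights
    n = len(data)
    for _ in range(steps):
        grad_enc, grad_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:  # no labels: the input itself is the target
            code = encode(x, enc)
            err = [ri - xi for ri, xi in zip(decode(code, dec), x)]
            # Backpropagate the squared reconstruction error.
            grad_code = sum(2 * e * wd for e, wd in zip(err, dec))
            for i in range(len(x)):
                grad_dec[i] += 2 * err[i] * code / n
                grad_enc[i] += grad_code * x[i] / n
        enc = [w - learning_rate * g for w, g in zip(enc, grad_enc)]
        dec = [w - learning_rate * g for w, g in zip(dec, grad_dec)]
    return enc, dec


data = [[x, 2 * x] for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
enc, dec = train(data)
print(reconstruction_error([0.8, 1.6], enc, dec))   # follows the learned pattern
print(reconstruction_error([1.0, -2.0], enc, dec))  # does not follow it
```

An input that does not match the pattern seen during training reconstructs poorly, which is the signal an anomaly detection system would threshold on.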
20. Deep Belief Networks (DBNs)
• DBNs are composed of layers of Restricted Boltzmann Machines (RBMs)
• An RBM is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs
DBN architecture
Generative Adversarial Networks (GANs)
• GANs use unsupervised learning to train two models in parallel
21. Convolutional Neural Networks (CNNs)
• A convolutional neural network (CNN, or ConvNet) is one of the most popular algorithms for deep learning with images and video
• CNN is composed of an input layer, an output layer, and many hidden layers in between
Feature detection layers
• Convolution: puts the input images through a set of convolutional filters
• Pooling: simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn
• Rectified linear unit (ReLU): allows for faster and more effective training by mapping negative values to zero and maintaining positive values
Classification layers
• There are one or more classification layers
• These layers produce class probabilities or scores
• The output of these layers is typically two-dimensional
• The input layer accepts three-dimensional input, generally spatial in form (image width and height) plus a depth for the color channels
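The three feature detection operations can be sketched in plain Python on a tiny image. This is a simplification under stated assumptions: a single-channel (grayscale) image, one hand-written 2x2 filter instead of learned filters, no padding, and 2x2 max pooling; all names are illustrative.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu_map(feature_map):
    """Map negative values to zero and keep positive values."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Downsample by keeping the largest value in each size x size block."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright edge

features = relu_map(convolve2d(image, vertical_edge))
pooled = max_pool(features)
print(features)
print(pooled)
```

The 5x5 image becomes a 4x4 feature map and then a 2x2 pooled map, so later layers have fewer values to process.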
22. Recurrent Neural Networks
• Recurrent Neural Networks extend feed-forward neural networks with connections that loop back on themselves
• This lets Recurrent Neural Networks send information across time-steps
• Recurrent Neural Networks are trained with a variant of the backpropagation algorithm called backpropagation through time (BPTT)
Advantages
• Possibility of processing input of any length
• Model size not increasing with size of input
• Computation takes into account historical information
• Weights are shared across time
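The advantages above follow from how the hidden state is computed. Below is a minimal sketch of the forward pass of a simple recurrent unit with scalar inputs and a scalar hidden state; the tanh non-linearity, the parameter names, and the values are assumptions for illustration.

```python
import math


def rnn_forward(sequence, w_input, w_hidden, bias, h0=0.0):
    """Run a simple recurrent unit over a sequence of any length."""
    h = h0
    states = []
    for x in sequence:
        # The same three parameters are reused at every time step, and the
        # previous state h carries the history into the current step.
        h = math.tanh(w_input * x + w_hidden * h + bias)
        states.append(h)
    return states


print(rnn_forward([1.0, 0.0, 0.0], w_input=0.5, w_hidden=0.8, bias=0.0))
print(rnn_forward([1.0, 2.0, 3.0, 4.0, 5.0], w_input=0.5, w_hidden=0.8, bias=0.0))
```

The number of parameters stays at three no matter how long the input is, and the first input still influences later states even when the later inputs are zero.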
Applications of Recurrent Neural Networks in NLP
• Language Modeling and Generating Text
• Machine Translation
• Speech Recognition
• Generating Image Descriptions
Drawbacks
• Slow computation
• Difficulty of accessing information from a long time ago
• Cannot consider any future input for the current state
23. RNN Extensions
• Bidirectional RNNs:
• the output at time t may depend not only on the previous elements in the sequence but also on future elements
• Deep (Bidirectional) RNNs
• multiple layers per time step
• LSTM networks
• LSTMs don't have a fundamentally different architecture from RNNs
• They use a different function to compute the hidden state
• LSTM cells then combine the previous state, the current memory, and the input
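How an LSTM cell combines the previous state, the current memory, and the input can be sketched with scalars. This follows the commonly used gate formulation (forget, input, and output gates plus a candidate value); the parameter dictionary and its key names are hypothetical, and real cells use weight matrices and vectors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar input, hidden state, and memory."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    expose = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    # New memory: keep part of the old memory and add part of the candidate.
    c = forget * c_prev + write * candidate
    # New hidden state: a gated view of the memory.
    h = expose * math.tanh(c)
    return h, c


keys = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {k: 0.5 for k in keys}  # illustrative values, not trained weights

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```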
24. Recursive Neural Networks
• A recursive neural network is created by applying the same set of weights recursively over a structured input
• A recursive neural network has the ability to model the hierarchical structures in the training dataset
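Applying the same weights recursively over a structured input can be sketched on a small binary parse tree. The scalar word embeddings, the tanh composition function, the weight values, and the example phrase are all assumptions for illustration; real models use vectors and weight matrices.

```python
import math


def compose(left, right, w_left, w_right, bias):
    """Merge two child representations into one parent representation."""
    return math.tanh(w_left * left + w_right * right + bias)


def encode_tree(node, embeddings, w_left, w_right, bias):
    """Encode a nested-tuple tree bottom-up with one shared set of weights."""
    if isinstance(node, str):  # leaf: look up the word
        return embeddings[node]
    left, right = node
    return compose(
        encode_tree(left, embeddings, w_left, w_right, bias),
        encode_tree(right, embeddings, w_left, w_right, bias),
        w_left, w_right, bias,
    )


embeddings = {"the": 0.1, "cat": 0.7, "sleeps": -0.4}  # toy scalar embeddings
tree = (("the", "cat"), "sleeps")  # ((the cat) sleeps)
print(encode_tree(tree, embeddings, w_left=0.5, w_right=0.5, bias=0.0))
```

The shape of the computation follows the tree, not a fixed left-to-right order.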
Applications of Recursive Neural Networks
• Image scene decomposition, NLP, audio-to-text transcription
Recursive Neural Networks vs Recurrent Neural Networks
• A recurrent neural network basically unfolds over time.
• A recursive neural network is more like a hierarchical network
25. Deep learning is a good fit when:
• Simpler models (logistic regression) don't achieve the accuracy level your use case needs
• You have complex pattern matching in images, NLP, or audio to deal with
• You have high-dimensionality data
• You have the dimension of time in your vectors (sequences)
A traditional machine learning model is a better fit when:
• You have high-quality, low-dimensional data; for example, columnar data from a database export
• You're not trying to find complex patterns in image data
You'll achieve poor results from both methods when the data is incomplete and/or of poor quality