Deep Learning and Watson Studio
Sasha Lazarevic, IBM Switzerland
https://www.linkedin.com/in/lzrvc/
LZRVC.com
Two Reasons why we want to speak
to you about Deep Learning
It is used for planet discoveries
Source: NASA.gov
Traditional techniques like:
- Robovetter,
- Random Forests,
- Probabilistic methods,
- PCA.
Since December 2017:
- CNN
.. for CERN experiments
2012: Discovery of the Higgs boson. Higgs boson identification involves the use of jet pull features, a class of
features used to characterize the superstructure of a particle decay event. The scientists used Fisher Discriminant
Analysis (FDA) for classification, gradient boosting, random forests and many other ML methods.
2014: Higgs Boson Machine Learning Challenge on Kaggle, on tau-tau decays of Higgs bosons. The winning solution (out of 1700 teams) used an
ensemble of 70 dropout neural networks, where each neural network had three hidden layers of 600 neurons:
https://github.com/melisgl/higgsml
2016: Distributed Deep Learning: CERN created Dist-Keras and several new DDL optimization algorithms
2018: “Deep Learning could potentially help in data reconstruction in the upcoming runs of LHC where particles
will generate a huge amount of hits in the detector where it would be infeasible to reconstruct the particle tracks
using traditional techniques (combination of a Kalman filter and Runge–Kutta methods).”
.. and for many other amazing achievements
Deep Learning Scientific Publications
Source: ResearchGate and ScienceDirect database
For only 6 months
Our second reason is that in March 2018
Deep Learning as a Service (DLaaS) was launched in IBM Watson Studio
Read the full story on :
https://medium.com/ibm-watson/deep-learning-now-in-ibm-watson-studio-d8ba311e4ff1
https://medium.com/ibm-watson/introducing-ibm-watson-studio-e93638f0bb47
Source: NVIDIA Deep Learning training
Let’s start from the beginning
Traditional AI algorithms
Search
Breadth-first, A*, Greedy, Beam
Monte-Carlo Tree-search
Heuristics
Planning
PDDL, Planning Graph
Multi-agent planning
Still extensively used in optimizations, logistics, manufacturing,
robotics
https://www.youtube.com/watch?time_continue=32&v=vjSohj-Iclc
https://www.researchgate.net/publication/282477851_Optimization-based_locomotion_planning_estimation_and_control_design_for_the_atlas_humanoid_robot?_sg=aJI6L_jzBn6V7qHUl8Xdt_kQ2GLcN-huoBfUD_jybf7SiPBTKvoSLfrS77CR-iOmknpIJXlUiA
Traditional ML algorithms
- Least Squares Linear Regression
- Logistic Regression
- Support Vector Machines
- (Boosted) Decision Tree
- Random Forest
- Naive Bayes Classification
- PCA, ICA
- K-means and other clustering algorithms
…
Still extensively used in Data Science for various applications
[Figures: SVM, K-means clustering, Random Forest]
ML vs DL
Source: Stanford Deep Learning and NLP training
In traditional ML, a data scientist designs the features
important for the problem and encodes them by hand. The
machine only performs numerical optimization.
Deep Learning automatically learns the
representations directly from data with deep neural
networks, and on multiple levels (hierarchically).
The network transforms the features at one level into
more abstract features at a higher level.
Far more of the intelligence is learned by the model than in
traditional ML algorithms.
AI developers can develop models with less knowledge
of the data and without manual feature engineering.
But DL was actually triggered by GPUs and innovations in ML training
Brain Artificial Neural Network
Deep Neural Networks
- Multi-Layer Perceptrons - MLPs
- Convolutional Neural Networks – CNNs
- Recurrent Neural Networks – RNNs
- Memory-augmented (LSTM/GRU, Attention) RNNs
- Tree-Recursive NNs
- Generative Adversarial Networks – GANs
- Deep Reinforcement Learning – Deep RL
Other, rarely used: Autoencoder, RBM, DBN, Liquid state machines..
We will focus in this presentation on CNNs and Memory-augmented RNNs, and discuss the derived
hybrid and complex architectures
Image is represented as matrix of numbers
Word Representations as Numerical Vectors
Inputs to an RNN are numerical vector representations of words
Earlier, words were represented by one-hot vectors. Example :
motel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
hotel = [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
But motel^T · hotel = 0, so the two words look completely unrelated and a search for one won’t return the other
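The orthogonality of one-hot vectors can be checked with a few lines of plain Python. This is only a minimal sketch: the 15-word vocabulary and the positions of the two words are taken from the example above, and the helper names `one_hot` and `dot` are made up for illustration.

```python
# Toy vocabulary size; the positions of "motel" and "hotel" follow the
# vectors shown on the slide (0-based indices 10 and 7).
VOCAB_SIZE = 15


def one_hot(index, size=VOCAB_SIZE):
    """Return a vector of zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


def dot(a, b):
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


motel = one_hot(10)
hotel = one_hot(7)

print(dot(motel, hotel))  # different words never overlap
print(dot(motel, motel))  # a word only matches itself
```

Any two different one-hot vectors have a dot product of zero, which is why this representation cannot express that "motel" and "hotel" are related.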
Nowadays, the words are expressed numerically in a way to
represent their context. These representations are trained with
frameworks like:
- Word2vec (Google), pre-trained on 100B words,
- GloVe (Stanford), pre-trained on Wikipedia with 6B words
Word2vec goes through each word of the whole corpus and learns
for each unique word a vector that predicts the surrounding
words. Vector dimensionality is between 100 and 1000
Much simpler: import the gensim library and load pre-trained word2vec
or fastText vectors
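To make "predictions of the surrounding words" more concrete, here is a small sketch of how (center word, context word) training pairs could be generated in the skip-gram variant of word2vec. The function name `skipgram_pairs`, the window size and the sample sentence are assumptions for illustration; the actual training of the vectors (or loading pre-trained ones with gensim) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word and its neighbours.

    `window` is the number of words considered on each side of the center.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical one-sentence "corpus".
sentence = "the hotel room was good".split()
pairs = skipgram_pairs(sentence, window=1)

for center, context in pairs:
    print(center, "->", context)
```

A model trained on such pairs ends up giving similar vectors to words that appear in similar contexts.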
Word2vec representation of word “good”
The CNN training process
Source: https://shafeentejani.github.io/2016-12-20/convolutional-neural-nets/
Training process:
1. Initialize filters and import the image
2. Apply convolution + ReLU + pooling
3. Chain several layers
4. Flatten the cubes (tensors)
5. Softmax classifier
6. Did we get the lion? Calculate the error
7. Backprop to tune the filters
8. Do this for all examples
9. Iterate to minimize the error
10. Test accuracy on separate data set
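Steps 5 and 6 above can be sketched in plain Python: a softmax turns the raw class scores into probabilities, and a cross-entropy error measures how far the network is from the correct label. The class names and the scores are made-up values, not output from a real network.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Error is small when the true class gets a high probability."""
    return -math.log(probs[true_index])


classes = ["lion", "tiger", "cat"]   # hypothetical labels
scores = [2.0, 1.0, 0.1]             # hypothetical output of the last layer

probs = softmax(scores)
loss = cross_entropy(probs, classes.index("lion"))

print(dict(zip(classes, probs)))
print("error:", loss)
```

Backpropagation (step 7) then adjusts the filters in the direction that reduces this error.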
The process of convolution and pooling
Image source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
http://cs231n.github.io/convolutional-networks/
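A minimal sketch of convolution, ReLU and max pooling on a tiny image, written in plain Python so that each step is visible. The 5x5 image and the 3x3 filter are toy values; in a real CNN the filter values are learned, and a framework such as TensorFlow or Keras would do this work.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)
pooled = max_pool(relu(feature_map))

print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(pooled)       # [[4]]
```

With a 2x2 pooling window on a 3x3 feature map, the last row and column are dropped, which is why only one pooled value remains.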
Convolutional Neural Networks
More complex features are captured in deeper layers
Sources: NVIDIA Deep Learning training,
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. “Convolutional deep belief networks for
scalable unsupervised learning of hierarchical representations.”
CNNs Use Cases – Cancer Diagnostic
Example of an architecture that takes as input 4 different scales of input images.
Source: DL for Magnification Independent Breast Cancer Image Classification
http://www.ee.oulu.fi/~nyalcinb/images/pub/icpr2016/presentation.pdf
CNNs Examples
Object Detection
Example: YOLO2 https://www.youtube.com/watch?v=VOC3huqHrss
- Uses one CNN for classification and localization
- Splits the input image into a grid of cells
- Output is a tensor with predictions of box positions and object classes
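The size of such an output tensor follows from simple arithmetic. The sketch below assumes a YOLOv2-style layout, where every grid cell predicts several boxes and each box carries 4 coordinates, 1 objectness score and one score per class. The concrete numbers (13x13 grid, 5 boxes, 20 classes) are assumptions for illustration, not values taken from the slide.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Return the (rows, cols, depth) shape of the prediction tensor."""
    # 4 box coordinates + 1 objectness score + class scores, per box
    values_per_box = 5 + num_classes
    return (grid_size, grid_size, boxes_per_cell * values_per_box)


shape = yolo_output_shape(grid_size=13, boxes_per_cell=5, num_classes=20)
print(shape)  # (13, 13, 125)
```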
Neural Style Transfer
Example: https://deepdreamgenerator.com/
- New image = Content image + Style image
- Generating the new image means:
  - maximizing similarity (content image, new image) and
  - maximizing similarity (style matrix of the style image, style matrix of the new image)
- Style matrix is the correlation across channels of the style image
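The style matrix is commonly computed as a Gram matrix: for every pair of channels, multiply their activations position by position and sum the result. The sketch below uses two tiny made-up channels; in practice the activations would come from a layer of a pre-trained CNN.

```python
def gram_matrix(features):
    """Correlation between channels.

    `features` is a list of channels, each a flat list of activations
    over all pixel positions.
    """
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(n)]
            for i in range(n)]


# Hypothetical activations: 2 channels, 3 pixel positions each.
features = [
    [1.0, 0.0, 2.0],  # channel 0
    [0.0, 1.0, 1.0],  # channel 1
]

style = gram_matrix(features)
print(style)  # [[5.0, 2.0], [2.0, 2.0]]
```

Because positions are summed away, the matrix describes which features occur together (texture, colour, strokes) but not where they occur in the image.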
Memory-Augmented Recurrent Neural Networks
Basic RNN Model :
- Can model unbounded text (sentences or entire documents)
- Inputs (time steps) x_{t-1}, x_t, x_{t+1} are words (vectors)
- U, V and W are trainable matrices shared across all time steps
- Their dimensions reflect the number of features that we want to extract
- In most cases we care only about the last output o
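A minimal sketch of the forward pass of such a basic RNN in plain Python. It assumes the usual roles of the matrices (U maps the input to the hidden state, W maps the previous hidden state to the new one, V maps the hidden state to the output) and a tanh activation; the weights and the three 2-dimensional "word vectors" are made-up numbers.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(inputs, U, W, V):
    """Run the same U, W, V over every time step; return the last output."""
    hidden = [0.0] * len(W)
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return matvec(V, hidden)


U = [[0.5, -0.2], [0.1, 0.3]]  # input -> hidden
W = [[0.4, 0.0], [0.0, 0.4]]   # hidden -> hidden
V = [[1.0, -1.0]]              # hidden -> output

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three "words"
output = rnn_forward(sequence, U, W, V)
print(output)
```

The same three matrices handle a sequence of any length, which is what allows the model to read unbounded text.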
RNN is augmented with mechanisms like LSTM, GRU or
Attention to memorize certain parts of the text
LSTM can add or remove information to the cell using gates:
Input gate (i) determines how much we care about the current input vector
Forget gate (f) determines how much of the past we keep. If it is 0, we forget the past
Output gate (o) determines how much of the cell is exposed vs. kept internal for later
GRU is simpler, but has similar performance:
Reset gate (r) – if it is 0, the cell ignores the past and stores only the new word information
Update gate (z) – if it is 1, the state of the cell is copied from the previous time step
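The effect of the gates can be illustrated with scalar toy versions of the update equations. This is a sketch under assumptions: real cells use vectors and learned weight matrices, the gate values here are set by hand instead of being computed by sigmoid layers, and the GRU follows the convention in which z = 1 copies the previous state (some papers swap the roles of z and 1 - z). In the common LSTM formulation used below, the forget gate multiplies the previous cell state, so f = 0 erases the past and f = 1 keeps it.

```python
import math


def lstm_cell_state(c_prev, c_candidate, i, f):
    """New cell state: keep a fraction f of the past, add a fraction i of the new."""
    return f * c_prev + i * c_candidate


def gru_state(h_prev, x_contrib, u, r, z):
    """New GRU state from the previous state and the current input contribution.

    r (reset) scales how much of the past enters the candidate state;
    z (update) decides how much of the previous state is copied unchanged.
    """
    h_candidate = math.tanh(x_contrib + r * u * h_prev)
    return z * h_prev + (1 - z) * h_candidate


# LSTM: forget everything and take only the new candidate
print(lstm_cell_state(c_prev=0.8, c_candidate=0.3, i=1.0, f=0.0))
# LSTM: keep the past untouched and ignore the new candidate
print(lstm_cell_state(c_prev=0.8, c_candidate=0.3, i=0.0, f=1.0))
# GRU: r = 0, z = 0 ignores the past completely
print(gru_state(h_prev=0.8, x_contrib=0.5, u=1.0, r=0.0, z=0.0))
# GRU: z = 1 copies the previous state
print(gru_state(h_prev=0.8, x_contrib=0.5, u=1.0, r=1.0, z=1.0))
```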
Source: “Deep Learning” article in Nature by Yann LeCun et al., 2015
LSTM and GRU
Simple RNN
Source: research paper arXiv:1412.3555, 2014
RNNs Use Cases - Neural Machine Translation
Source: University of Edinburgh
Statistical Machine Translation:
Learn the probabilistic model from texts
Apply Bayes' rule (noisy-channel model)
Use parallel training data from both languages
But word alignments also need to be learned
Neural Machine Translation:
Novel techniques: Multilingual machine translation
( One-to-many, Many-to-one, Many-to-many )
Try IBM Watson NMT on :
https://console.bluemix.net/catalog/services/language-translator
with flag -H "X-Watson-Technology-Preview:2017-07-01"
RNNs Use Cases - Sentiment Analysis
Use of character-based multiplicative LSTM-RNN for sentiment analysis – “Sentiment Neuron”
Source: https://arxiv.org/pdf/1704.01444.pdf
Hybrid and complex models - NER
Source: Research paper in 2016 “Named Entity Recognition with Bidirectional LSTM-CNNs”
Example of a hybrid model for Named Entity Recognition:
- Input is tokenized text and word embeddings
- CNN extracts feature vector from characters
(useful for capturing pre- and suffixes)
- Features of each word fed to Bi-LSTM-RNN
- Trained and tested on CoNLL-2003 dataset
- The best performance was achieved with 50d GloVe word embeddings + lexicon
- The lexicon is based on DBpedia and is looked up during tag scoring
Hybrid and complex models – music generation
Music representation with one-hot encodings
Source: Music Generation by Deep Learning,
https://www.arxiv-vanity.com/papers/1712.04371/
Example of a hybrid model for Music Generation
- Note RNN is trained on a dataset of melodies encoded as one-hot vectors
- Reward RNN is used to generate the new music
- Q Network (CNN-type) will learn to choose the next action (musical note)
- Target Q Network (CNN-type) will learn to generate the target Q-value
(gain) for that note using Reinforcement Learning mechanism of reward
- Q and Target Q are decoupled to avoid gain overestimation
- Q-value (gain) function = similarity (new note, learned note) + similarity
(new note, user-defined rules)
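The reward and the decoupled Q / Target Q update can be sketched as follows. This is a hedged illustration, not the implementation from the cited paper: the function names, the discount factor and all numbers are assumptions, and the two similarity terms are given as plain numbers. The update shown is the Double Q-learning form, where the Q network picks the next note and the Target Q network evaluates it.

```python
def combined_reward(similarity_to_learned, similarity_to_rules):
    """Reward = similarity to what the Note RNN learned + similarity to the rules."""
    return similarity_to_learned + similarity_to_rules


def double_q_target(reward, next_q_online, next_q_target, gamma=0.5):
    """Target value for the chosen note.

    The online Q network chooses the best next action; the separate
    Target Q network supplies its value, which limits overestimation.
    """
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


reward = combined_reward(0.5, 0.25)
target = double_q_target(
    reward,
    next_q_online=[0.1, 0.9, 0.3],   # hypothetical Q-values for 3 next notes
    next_q_target=[0.2, 0.5, 1.0],   # hypothetical Target Q-values
)
print(reward, target)  # 0.75 1.0
```

Taking the maximum of the Target Q-values directly would give 1.25 here; evaluating only the action chosen by the Q network gives the more conservative 1.0.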
An amazing implementation of a similar architecture
IBM Watson Beat
Watch and listen:
https://www.youtube.com/watch?v=Z5ymVzTUU6Y
And compose the AI-generated music yourself:
https://github.com/cognitive-catalyst/watson-beat
Hybrid and complex models – Blockchain AI
Source: Matrix.io ; https://www.matrix.io/html/MATRIXTechnicalWhitePaper.pdf
Two-phase approach for security hardening of Smart Contracts
1. Static verification based on CNN-extracted features
- These features help identify syntax or structural patterns
- The system is trained on manually labeled open-source smart contracts
2. Use of GANs for dynamic security testing and enhancement:
- One RNN updates the smart contract (Generator network)
- One RNN learns to generate hacks (Discriminator network)
- The updated smart contract is deployed in a sandbox where it is
exposed to the generated hacks
- The result is taken by the Generator to improve the smart contract
- This training cycle is repeated until a stable state is achieved
CNN + RNN for the automatic generation of smart contract code
1. CNN for feature extraction
- Uses natural language text as input (legal agreements, SOW, LOI etc.)
- The CNN needs to be trained on a set of similar contracts
2. The extracted features are fed to an RNN for code generation
- Different code libraries are deployed depending on the identified design pattern of
the smart contract
Hybrid and complex models – Lip Reading
Source: Research paper Lip Reading Sentences in the Wild, 2017 (https://arxiv.org/pdf/1611.05358.pdf)
“Watch” (CNN+LSTM RNN), “Listen” (LSTM-RNN), “Attend” and “Spell” (LSTM-RNN)
Demo: https://www.youtube.com/watch?v=5aogzAUPilE
State of the Art in DL
- In many cases, CNNs can compete with RNNs on downstream NLP tasks, but the performance depends on
the semantics of the problem. Some successful examples:
  - Dynamic convolutional neural network (DCNN) for semantic modeling of sentences
  - Sarcasm detection in Twitter texts based on a CNN network
  - Use of CNNs to calculate similarity between a question and entries in a knowledge base (KB) to
  determine what supporting fact in the KB to look for when answering a question
- Sentiment Analysis is still best done with Tree-LSTM Recursive NNs or GRU-RNNs
- Use of Reinforcement Learning in combination with Deep Learning (example: training chatbots)
- Use of GANs in NLP tasks like text generation, sentiment analysis and NMT. The Generator network is an RNN,
but the Discriminator network can be either a CNN or an RNN.
How are these things implemented?
- Programming language
  - Python
  - Others: C++, R, MATLAB
- Programming frameworks
  - TensorFlow
  - Keras
  - Others: Torch, PyTorch, Caffe, MXNet
- Big data
  - Spark
  - Dist-Keras
- Transfer learning and model reuse
  - GitHub source code
- GPUs
Transfer Learning accelerates the development
GitHub code and pre-trained models
Research Papers
Repurpose and retrain
Cut off the last layer(s)
Business Use Cases
Banking and Insurance:
- Natural language generation - personalized communication with the customers
- Sentiment analysis
- Automatic extraction of information from contracts and legal documents
- Creditworthiness of online identity
- Fraud detection
  DL can take into account thousands of data points, not 20-30 as with traditional ML
- Image recognition and automatic processing of insurance claims
- Automated investment
  Example: http://www.alpaca.ai/alpacaalgo/ (create your own investment algorithm)
- Platform for fintech companies and innovations
- Predictive system for financial market volatility (DL provides better accuracy)
- Pre-processing of unstructured big data sets
- Banks want to increase accuracy even by a small margin, because this usually means a lot of money
  Some consultants take the bank's data together with the predictive model and within a few weeks return with improved predictions based on DL
Blockchain:
- Smart contract generation from text written in natural human language
- Vulnerability testing of smart contracts
- Relational analysis of smart contracts to identify loopholes and malicious intent
IT services (Take request to service desk as input sequence and using LSTM-RNN produce the action),
Legal offices (information retrieval, text generations),
Augmentation of chatbots
Example: https://towardsdatascience.com/personality-for-your-chatbot-with-recurrent-neural-networks-2038f7f34636
What are the problems in DL adoption?
- The development process requires many experiments
  The ideal solution would be a deep neural network teaching other deep neural networks
  Without that, one needs a robust and efficient platform for tests and development
- Experiments need to be run in parallel, which requires investment in hardware and setup skills
  For very complicated DNNs, hundreds to thousands of runs are needed, so one has to run 8 or more in parallel to tune the hyperparameters
- Reusing an existing model takes from 1 week to 1 month. But for hybrid and complex models, it can take 6
months to generate satisfactory results.
- Companies have heard about DL. The libraries and models exist, but the companies don’t know how to
start and how to implement them
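Running experiments in parallel to tune hyperparameters can be sketched with the Python standard library. Everything in this example is a stand-in: `run_experiment` replaces a real training run with a made-up scoring formula, the grid of learning rates and layer counts is arbitrary, and threads are used only to keep the sketch short (real training jobs would typically run as separate processes or GPU jobs on a platform built for that).

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def run_experiment(config):
    """Stand-in for one training run; returns (config, validation score)."""
    learning_rate, num_layers = config
    # Made-up score surface that peaks at learning_rate=0.01, num_layers=3.
    score = 1.0 - abs(learning_rate - 0.01) * 10 - abs(num_layers - 3) * 0.05
    return config, score


learning_rates = [0.001, 0.01, 0.1]
layer_counts = [2, 3, 4]
configs = list(itertools.product(learning_rates, layer_counts))

# Run up to 8 experiments at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_experiment, configs))

best_config, best_score = max(results, key=lambda r: r[1])
print("best:", best_config, best_score)
```

Even this tiny grid produces 9 runs; with more hyperparameters the number of runs grows quickly, which is why parallel infrastructure matters.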
The Solution
Watson Studio as DLaaS platform
+ DL consulting
+ Watson APIs
+ IBM Cloud private + IBM DSX Local + IBM COS Local + IBM FfDL +
IBM MAX + Apache Spark + IBM Spectrum Scale + IBM PowerAI + ……
For your exponential steps into the future
Demo
Please follow the tutorial that I published on:
https://medium.com/ibm-watson/exploring-deep-learning-and-neural-network-modeler-with-watson-studio-ead35d21a438
For more insights into DL, contact me on
Sasha.Lazarevic@ch.ibm.com

More Related Content

What's hot

A Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationA Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationXiaohu ZHU
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RPoo Kuan Hoong
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMustafa Aldemir
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep LearningMyungjin Lee
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
Deep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT ConferenceDeep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT ConferenceBill Liu
 
Geek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeekNightHyderabad
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsRoelof Pieters
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 
HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningYan Xu
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)Thomas da Silva Paula
 
Deep learning frameworks v0.40
Deep learning frameworks v0.40Deep learning frameworks v0.40
Deep learning frameworks v0.40Jessica Willis
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningAdam Rogers
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer VisionDavid Dao
 

What's hot (20)

A Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationA Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its Application
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with R
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Deep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT ConferenceDeep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT Conference
 
Geek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine Learning
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
LSTM Tutorial
LSTM TutorialLSTM Tutorial
LSTM Tutorial
 
HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep Learning
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
Deep learning frameworks v0.40
Deep learning frameworks v0.40Deep learning frameworks v0.40
Deep learning frameworks v0.40
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
 

Similar to Deep Learning and Watson Studio

Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskSaurabh Saxena
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Tomasz Sikora
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksAmazon Web Services
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksAmazon Web Services
 
Deep Dive into Apache MXNet on AWS
Deep Dive into Apache MXNet on AWSDeep Dive into Apache MXNet on AWS
Deep Dive into Apache MXNet on AWSKristana Kane
 
D3, TypeScript, and Deep Learning
D3, TypeScript, and Deep LearningD3, TypeScript, and Deep Learning
D3, TypeScript, and Deep LearningOswald Campesato
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenPoo Kuan Hoong
 

Similar to Deep Learning and Watson Studio (20)

Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Deep learning
Deep learningDeep learning
Deep learning
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
Deep Dive into Apache MXNet on AWS
Deep Dive into Apache MXNet on AWSDeep Dive into Apache MXNet on AWS
Deep Dive into Apache MXNet on AWS
 
D3, TypeScript, and Deep Learning
D3, TypeScript, and Deep LearningD3, TypeScript, and Deep Learning
D3, TypeScript, and Deep Learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R Open
 

More from Sasha Lazarevic

Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AISasha Lazarevic
 
BMDSE v1 - Data Scientist Deck
BMDSE v1 - Data Scientist DeckBMDSE v1 - Data Scientist Deck
BMDSE v1 - Data Scientist DeckSasha Lazarevic
 
What is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantWhat is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantSasha Lazarevic
 
Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Sasha Lazarevic
 
Cognitive Urban Transport
Cognitive Urban TransportCognitive Urban Transport
Cognitive Urban TransportSasha Lazarevic
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataSasha Lazarevic
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Sasha Lazarevic
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsSasha Lazarevic
 


Deep Learning and Watson Studio

  • 1. Deep Learning and Watson Studio Sasha Lazarevic, IBM Switzerland https://www.linkedin.com/in/lzrvc/ LZRVC.com
  • 2. Two Reasons why we want to speak to you about Deep Learning
  • 3. It is used for planet discoveries Source: NASA.gov Traditional techniques like: - Robovetter, - Random Forests, - Probabilistic methods, - PCA. Since December 2017: - CNN
  • 4. .. for CERN experiments
      2012: Discovery of the Higgs boson. Higgs boson identification involves the use of jet pull features, a class of features used to characterize the superstructure of a particle decay event. The scientists used Fisher Discriminant Analysis (FDA) for classification, gradient boosting, random forests and many other ML methods.
      2014: Kaggle challenge on tau-tau decays of Higgs bosons. The winning solution (out of 1700 teams) used an ensemble of 70 dropout neural networks, each with three hidden layers of 600 neurons: https://github.com/melisgl/higgsml
      2016: Distributed Deep Learning: CERN created Dist-Keras and several new DDL optimization algorithms.
      2018: “Deep Learning could potentially help in data reconstruction in the upcoming runs of LHC where particles will generate a huge amount of hits in the detector where it would be infeasible to reconstruct the particle tracks using traditional techniques (combination of a Kalman filter and Runge–Kutta methods).”
  • 5. .. and for many other amazing achievements
  • 6. Deep Learning Scientific Publications Source: ResearchGate and ScienceDirect database For only 6 months
  • 7. Our second reason is that since March 2018.. DLaaS in IBM Watson Studio has been launched Read the full story on : https://medium.com/ibm-watson/deep-learning-now-in-ibm-watson-studio-d8ba311e4ff1 https://medium.com/ibm-watson/introducing-ibm-watson-studio-e93638f0bb47
  • 8. Source: NVIDIA Deep Learning training Let’s start from the beginning
  • 9. Traditional AI algorithms
      Search
      - Breadth-first, A*, Greedy, Beam
      - Monte-Carlo Tree-search
      - Heuristics
      Planning
      - PDDL, Planning Graph
      - Multi-agent planning
      Still extensively used in optimizations, logistics, manufacturing, robotics
      https://www.youtube.com/watch?time_continue=32&v=vjSohj-Iclc
      https://www.researchgate.net/publication/282477851_Optimization-based_locomotion_planning_estimation_and_control_design_for_the_atlas_humanoid_robot?_sg=aJI6L_jzBn6V7qHUl8Xdt_kQ2GLcN-huoBfUD_jybf7SiPBTKvoSLfrS77CR-iOmknpIJXlUiA
  • 10. Traditional ML algorithms
      - Least Squares Linear Regression
      - Logistic Regression
      - Support Vector Machines
      - (Boosted) Decision Tree
      - Random Forest
      - Naive Bayes Classification
      - PCA, ICA
      - K-means and other clustering algorithms
      - …
      Still extensively used in Data Science for various applications
      Figures: SVM, K-means Clustering, Random Forest
  • 11. ML vs DL
      Source: Stanford Deep Learning and NLP training
      In traditional ML, the data scientist designs the features that matter for the problem and encodes them by hand. The machine only does the numerical optimization.
      Deep Learning learns the representations automatically and directly from the data, using deep neural networks, and on multiple levels (hierarchically). The network transforms a feature at one level into a more abstract feature at a higher level, so more of the modeling work is done by the machine than in traditional ML algorithms.
      AI developers can develop models with less knowledge of the data and with no manual feature engineering.
  • 12. But DL was actually triggered by GPU and innovations in ML training
  • 14. Deep Neural Networks
      - Multi-Layer Perceptrons – MLPs
      - Convolutional Neural Networks – CNNs
      - Recurrent Neural Networks – RNNs
      - Memory-augmented (LSTM/GRU, Attention) RNNs
      - Tree-Recursive NNs
      - Generative Adversarial Networks – GANs
      - Deep Reinforcement Learning – RLs
      Other, rarely used: Autoencoder, RBM, DBN, Liquid state machines..
      In this presentation we will focus on CNNs and Memory-augmented RNNs, and discuss the derived hybrid and complex architectures
  • 15. Image is represented as matrix of numbers
  • 16. Word Representations as Numerical Vectors
      The inputs to an RNN are numerical vector representations of words.
      Earlier, words were represented by one-hot vectors. Example:
      motel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
      hotel = [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
      But motel^T · hotel = 0: the two vectors are orthogonal, so a search for one word finds no similarity with the other.
      Nowadays, words are expressed numerically in a way that represents their context. These representations are trained with frameworks like:
      - Word2vec (Google), pre-trained on 100B words,
      - GloVe (Stanford), pre-trained on Wikipedia with 6B words
      Word2vec goes through each word of the whole corpus and learns, for each unique word, a vector that predicts the surrounding words. Vector dimensionality is between 100 and 1000.
      Much simpler: import the gensim library and load pre-trained word2vec or fastText vectors.
      Figure: Word2vec representation of the word “good”
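The difference between one-hot and dense word vectors can be shown with a few lines of plain Python. This is a minimal sketch: the one-hot positions follow the motel/hotel example on the slide, while the 4-dimensional dense vectors are invented for illustration and are not real word2vec or GloVe values.

```python
import math

def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product normalised by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# One-hot vectors as on the slide: 15-word vocabulary, motel at 10, hotel at 7.
motel = one_hot(10, 15)
hotel = one_hot(7, 15)
one_hot_similarity = dot(motel, hotel)  # distinct words never overlap

# Hypothetical dense embeddings (illustrative numbers only).
motel_vec = [0.8, 0.1, 0.6, 0.2]
hotel_vec = [0.7, 0.2, 0.7, 0.1]
dense_similarity = cosine(motel_vec, hotel_vec)

print(one_hot_similarity, round(dense_similarity, 3))
```

With one-hot vectors every pair of distinct words has similarity zero, whereas dense vectors place related words close together, which is what makes similarity search useful.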
  • 17. The CNN training process
      Source: https://shafeentejani.github.io/2016-12-20/convolutional-neural-nets/
      Training process:
      1. Initialize filters and import the image
      2. Apply convolution + ReLU + pooling
      3. Chain several layers
      4. Flatten the cubes (tensors)
      5. Softmax classifier
      6. Did we get the lion? Calculate the error
      7. Backprop to tune the filters
      8. Do this for all examples
      9. Iterate to minimize the error
      10. Test accuracy on separate data set
  • 18. The process of convolution and pooling Image source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution http://cs231n.github.io/convolutional-networks/
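The forward part of the training process (convolution, ReLU, pooling, flattening, softmax) can be sketched in plain Python. This is a toy illustration under stated assumptions: a single hand-picked 2x2 filter and a 5x5 image invented for the example, no learned weights and no backpropagation step.

```python
import math

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as most DL frameworks compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5x5 image with a vertical edge, and a hypothetical vertical-edge filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = relu(conv2d(image, kernel))  # 4x4 map, responds at the edge
pooled = max_pool(feature_map)             # 2x2 map after pooling
flat = [v for row in pooled for v in row]  # flatten before the classifier
probs = softmax(flat)                      # stand-in for the softmax layer
```

In a real network the flattened vector would pass through trained dense layers before the softmax, and the filter values would be tuned by backpropagation rather than chosen by hand.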
  • 19. Convolutional Neural Networks More complex features are captured in deeper layers Sources: NVIDIA Deep Learning training, H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations.”
  • 20. CNNs Use Cases – Cancer Diagnostic Example of an architecture that takes as input 4 different scales of input images. Source: DL for Magnification Independent Breast Cancer Image Classification http://www.ee.oulu.fi/~nyalcinb/images/pub/icpr2016/presentation.pdf
  • 21. CNNs Examples
      Object Detection. Example: YOLO2 https://www.youtube.com/watch?v=VOC3huqHrss
      - Uses one CNN for classification and localization
      - Splits the input image into a grid of cells
      - Output is a tensor with predictions of box positions and object classes
      Neural Style Transfer. Example: https://deepdreamgenerator.com/
      - New image = Content image + Style image
      - Generating the new image means:
        - maximizing similarity (content image, new image) and
        - maximizing similarity (style matrix, new image)
      - The style matrix is the correlation across channels of the style image
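The "correlation across channels" used for style is commonly computed as a Gram matrix of the feature maps. The sketch below assumes each channel is already flattened into a list of activations; the feature values are made up for illustration, and the squared-difference distance is one common choice, not necessarily what the linked demo uses.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlation of flattened feature maps.

    feature_maps: list of channels, each a flat list of activations.
    """
    return [[sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in feature_maps]
            for ch_i in feature_maps]

def style_distance(gram_a, gram_b):
    """Sum of squared differences between two Gram matrices (lower = more similar)."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(gram_a, gram_b)
               for a, b in zip(row_a, row_b))

# Hypothetical activations: two channels, four spatial positions each.
style_features = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
new_features = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]

style_gram = gram_matrix(style_features)
new_gram = gram_matrix(new_features)
distance = style_distance(style_gram, new_gram)
```

Maximizing style similarity then amounts to changing the new image so that this distance shrinks, while a separate content term keeps the new image close to the content image.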
  • 22. Memory-Augmented Recurrent Neural Networks
      Basic RNN model:
      - Can model unbounded text (sentences or entire documents)
      - Inputs (time steps) x(t-1), x(t), x(t+1) are words (vectors)
      - U, V and W are the matrices that we train; they are shared across time steps
      - Their dimensions reflect the number of features that we want to extract
      - In most cases we care only about the last output o
      The RNN is augmented with mechanisms like LSTM, GRU or Attention to memorize certain parts of the text.
      LSTM can add or remove information to the cell using gates:
      - Input gate (i) determines how much we care about the current vector
      - Forget gate (f) says how much of the past cell state we keep. If it is 0, we forget the past
      - Output gate (o) says how much of the cell is exposed vs. kept for later
      GRU is simpler, but has similar performance:
      - Reset gate (r): if it is 0, the past is ignored and only the new word information is stored
      - Update gate (z): if it is 1, the state of the cell is copied from the previous time step
      Figures: Simple RNN (source: “Deep Learning” article in Nature by Yann LeCun et al., 2015); LSTM and GRU (source: research paper arXiv:1412.3555, 2014)
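The gate mechanics can be made concrete with a single LSTM step on scalars. This is a minimal sketch using the common formulation c = f * c_prev + i * g, in which a forget gate near 1 keeps the previous cell state and a forget gate near 0 erases it. The weight layout and the numbers are assumptions chosen to push the gates to their extremes.

```python
import math

def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step on scalars. `w` maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much of the new candidate to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell to expose
    g = math.tanh(pre("candidate"))  # candidate content from the current input
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights: large biases saturate the gates at ~0 or ~1.
keep_weights = {"input": (0.0, 0.0, -20.0), "forget": (0.0, 0.0, 20.0),
                "output": (0.0, 0.0, 20.0), "candidate": (1.0, 0.0, 0.0)}
forget_weights = dict(keep_weights, forget=(0.0, 0.0, -20.0))

_, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.5, w=keep_weights)
_, c_forget = lstm_step(x=1.0, h_prev=0.0, c_prev=0.5, w=forget_weights)
```

With the forget gate saturated near 1 the cell state stays at about 0.5, and with it near 0 the stored value is wiped. In a trained network these gate values depend on the input and the previous hidden state rather than on fixed biases.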
  • 23. RNNs Use Cases - Neural Machine Translation
      Source: University of Edinburgh
      Statistical Machine Translation:
      - Learns a probabilistic model from texts
      - Uses a Bayes classifier
      - Uses parallel training data from both languages
      - But also needs to learn word alignments
      Neural Machine Translation:
      - Novel techniques: multilingual machine translation (one-to-many, many-to-one, many-to-many)
      Try IBM Watson NMT on: https://console.bluemix.net/catalog/services/language-translator with the flag -H "X-Watson-Technology-Preview:2017-07-01"
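The Bayes step in statistical machine translation is usually written as the noisy-channel model. The formula below is the standard textbook formulation, not taken from the slide; f denotes the source sentence and e a candidate translation.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```

Here P(f | e) is the translation model estimated from parallel data, which is where word alignments are needed, and P(e) is the language model of the target language.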
  • 24. RNNs Use Cases - Sentiment Analysis Use of character-based multiplicative LSTM-RNN for sentiment analysis – “Sentiment Neuron” Source: https://arxiv.org/pdf/1704.01444.pdf
  • 25. Hybrid and complex models - NER
      Source: Research paper in 2016 “Named Entity Recognition with Bidirectional LSTM-CNNs”
      Example of a hybrid model for Named Entity Recognition:
      - Input is tokenized text and word embeddings
      - A CNN extracts a feature vector from the characters (useful for capturing prefixes and suffixes)
      - The features of each word are fed to a Bi-LSTM-RNN
      - Trained and tested on the CoNLL-2003 dataset
      - The best performance was obtained with 50d GloVe word embeddings + lexicon
      - The lexicon is based on DBpedia and is looked up during tag scoring
  • 26. Hybrid and complex models – music generation
      Figure: Music representation with one-hot encodings
      Source: Music Generation by Deep Learning, https://www.arxiv-vanity.com/papers/1712.04371/
      Example of a hybrid model for Music Generation:
      - Note RNN is trained on a dataset of melodies encoded as one-hot vectors
      - Reward RNN is used to generate the new music
      - Q Network (CNN-type) learns to choose the next action (musical note)
      - Target Q Network (CNN-type) learns to generate the target Q-value (gain) for that note, using the Reinforcement Learning mechanism of reward
      - Q and Target Q are decoupled to avoid gain overestimation
      - Q-value (gain) function = similarity (new note, learned note) + similarity (new note, user-defined rules)
      An amazing implementation of a similar architecture: IBM Watson Beat
      Watch and listen: https://www.youtube.com/watch?v=Z5ymVzTUU6Y
      And compose the AI-generated music yourself: https://github.com/cognitive-catalyst/watson-beat
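The decoupling of the Q and Target Q networks can be illustrated with the double Q-learning target, a common way to limit overestimation. This is a sketch under assumptions: the Q-values are plain lists invented for three candidate notes, and the reward is just the sum of two hypothetical similarity scores, in the spirit of the gain function described on the slide.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.9):
    """Double-Q target: the online network picks the next action,
    the separate target network scores that action."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Hypothetical Q-values for three candidate next notes.
next_q_online = [1.0, 3.0, 2.0]   # from the Q network
next_q_target = [1.5, 2.0, 4.0]   # from the Target Q network

# Hypothetical reward: similarity to the learned melody + rule-based score.
reward = 0.5 + 0.25

decoupled = double_q_target(reward, next_q_online, next_q_target)
naive = reward + 0.9 * max(next_q_target)  # single-network style target
```

The naive target takes the maximum of one network's noisy estimates and therefore tends to be too optimistic. Letting one network choose and the other evaluate gives a lower, more conservative value in this example.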
  • 27. Hybrid and complex models – Blockchain AI
      Source: Matrix.io ; https://www.matrix.io/html/MATRIXTechnicalWhitePaper.pdf
      Two-phase approach for security hardening of Smart Contracts
      1. Static verification based on CNN-extracted features
         - These features help identify syntax or structural patterns
         - The system is trained on manually labeled open-source smart contracts
      2. Use of GANs for the dynamic security testing and enhancement:
         - One RNN updates the smart contract (Generator network)
         - One RNN learns to generate hacks (Discriminator network)
         - The updated smart contract is deployed in a sandbox where it is exposed to the generated hacks
         - The result is taken by the Generator to improve the smart contract
         - This training cycle is repeated until a stable state is achieved
      CNN + RNN for the automatic generation of smart contract code
      1. CNN for the feature extraction
         - Uses natural language text as input (legal agreements, SOW, LOI etc.)
         - The CNN needs to be trained on a set of similar contracts
      2. The extracted features are fed to an RNN for the code generation
         - Different code libraries are deployed depending on the identified design pattern of the smart contracts
  • 28. Hybrid and complex models – Lip Reading Source: Research paper Lip Reading Sentences in the Wild, 2017 (https://arxiv.org/pdf/1611.05358.pdf) “Watch” (CNN+LSTM RNN), “Listen” (LSTM-RNN), “Attend” and “Spell” (LSTM-RNN) Demo: https://www.youtube.com/watch?v=5aogzAUPilE
  • 29. State of the Art in DL
      - In many cases, CNNs can compete with RNNs for downstream NLP tasks, but the performance depends on the semantics of the problem. Some successful examples:
        - Dynamic convolutional neural network (DCNN) for semantic modeling of sentences
        - Sarcasm detection in Twitter texts based on a CNN network
        - Use of CNNs to calculate the similarity between a question and entries in a knowledge base (KB), to determine which supporting fact in the KB to look for when answering a question
      - Sentiment Analysis is still best done with Tree-LSTM Recursive NNs or GRU-RNNs
      - Use of Reinforcement Learning in combination with Deep Learning (example: training chatbots)
      - Use of GANs in NLP tasks like text generation, sentiment analysis and NMT. The Generator network is an RNN, but the Discriminator network can be either a CNN or an RNN.
  • 30. How are these things implemented?
      - Programming language
        - Python
        - Others: C++, R, Matlab
      - Programming frameworks
        - TensorFlow
        - Keras
        - Others: Torch, PyTorch, Caffe, MXNet
      - Big data
        - Spark
        - Dist-Keras
      - Transfer learning and model reuse
        - GitHub source code
      - GPUs
  • 31. Transfer Learning accelerates the development GitHub code and pre-trained models Research Papers Repurpose and retrain Cut off the last layer(s)
  • 32. Business Use Cases
      Banking and Insurance:
      - Natural language generation: personalized communication with the customers
      - Sentiment analysis
      - Automatic extraction of information from contracts and legal documents
      - Creditworthiness of online identity
      - Fraud detection: DL can take into account thousands of data points, not 20-30 as with traditional ML
      - Image recognition and automatic processing of insurance claims
      - Automated investment. Example: http://www.alpaca.ai/alpacaalgo/ (create your own investment algorithm)
      - Platform for fintech companies and innovations
      - Predictive system for financial markets volatility (DL provides better accuracy)
      - Pre-processing of unstructured big data sets
      - Banks want to increase accuracy even by a small amount, because this usually means a lot of money. Some consultants take the bank's data together with the predictive model and return in a few weeks with improved predictions based on DL
      Blockchain:
      - Smart Contract generation based on text written in natural human language
      - Vulnerability testing of smart contracts
      - Relational analysis of smart contracts to identify loopholes and malicious intent
      IT services (take a request to the service desk as the input sequence and produce the action using an LSTM-RNN), Legal offices (information retrieval, text generation), Augmentation of chatbots
      Example: https://towardsdatascience.com/personality-for-your-chatbot-with-recurrent-neural-networks-2038f7f34636
  • 33. What are the problems in DL adoption?
      - The development process requires many experiments. The ideal solution would be a deep neural network teaching other deep neural networks. Without that, one needs a robust and efficient platform for tests and development
      - Experiments need to be run in parallel, which requires investment in HW and setup skills. For very complicated DNNs, hundreds to thousands of runs are needed, so one has to run 8 or more in parallel to tune the hyperparameters
      - Reusing one existing model takes from 1 week to 1 month. But for hybrid and complex models, it can take 6 months to generate satisfactory results
      - Companies have heard about DL. The libraries and models exist, but the companies don’t know how to start and how to implement them
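Running many experiments with 8 in parallel can be sketched with the Python standard library. This is an illustration only: `run_experiment` is a hypothetical stand-in with an invented loss surface instead of a real training run, and threads stand in for what would normally be separate GPU jobs on a cluster or a managed service.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_experiment(params):
    """Stand-in for one training run; returns (validation_loss, params)."""
    lr, dropout = params["lr"], params["dropout"]
    # Invented loss surface with its minimum at lr=0.01, dropout=0.5.
    loss = (lr - 0.01) ** 2 * 1e4 + (dropout - 0.5) ** 2
    return loss, params

def random_search(n_runs=100, max_parallel=8, seed=0):
    """Sample n_runs hyperparameter sets and evaluate max_parallel at a time."""
    rng = random.Random(seed)
    candidates = [{"lr": 10 ** rng.uniform(-4, -1),     # log-uniform learning rate
                   "dropout": rng.uniform(0.0, 0.9)}
                  for _ in range(n_runs)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_experiment, candidates))
    return min(results, key=lambda result: result[0])

best_loss, best_params = random_search()
```

Random search is used here because it is simple to parallelize; each run is independent, so the wall-clock time drops roughly with the number of parallel workers as long as hardware is available for each of them.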
  • 34. The Solution: Watson Studio as DLaaS platform + DL consulting + Watson APIs + IBM Cloud Private + IBM DSX Local + IBM COS Local + IBM FfDL + IBM MAX + Apache Spark + IBM Spectrum Scale + IBM PowerAI + …… For your exponential steps into the future
  • 35. Demo: Please follow the tutorial that I published on: https://medium.com/ibm-watson/exploring-deep-learning-and-neural-network-modeler-with-watson-studio-ead35d21a438
  • 36. For more insights into DL, contact me at Sasha.Lazarevic@ch.ibm.com Sasha Lazarevic, IBM Switzerland https://www.linkedin.com/in/lzrvc/ LZRVC.com

Editor's Notes

  1. LiveSlide Site https://www.youtube.com/watch?time_continue=2&v=VOC3huqHrss