Sequence Learning and
modern RNNs
Grigory Sapunov
source{d} tech talks Second Series | 2017
Moscow, June 03, 2017
gs@inten.to
Recap and a tiny intro into RNNs
Artificial neuron
An artificial neuron is a mathematical function inspired by a biological
neuron (though rather far from the real biology).
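A minimal sketch (NumPy; the numbers are purely illustrative): a single artificial neuron is just a weighted sum of its inputs plus a bias, passed through a nonlinearity.

```python
import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, squashed by a sigmoid nonlinearity.
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative example: a neuron with three inputs.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, b=0.3))  # a single activation in (0, 1)
```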
Artificial Neural Network (ANN)
An artificial neuron is an elementary unit in an artificial neural network.
Feedforward NN vs. Recurrent NN
Recurrent neural networks (RNNs) allow cyclical connections.
Unfolding the RNN and training using BPTT
Can do backprop on the unfolded network: Backpropagation through time (BPTT)
http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf
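A minimal sketch of the unfolding (NumPy; parameter names are illustrative): the loop below is the unrolled graph, an ordinary feed-forward net with weights shared across time, so plain backprop on it is exactly BPTT.

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, bh, h0):
    """Unfold a simple RNN: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh).
    Backpropagating through this unrolled loop is BPTT."""
    h, hs = h0, []
    for x in xs:                 # one step per sequence element
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        hs.append(h)
    return hs                    # hidden states for every time step
```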
Neural Network properties
Feedforward NN (FFNN):
● FFNN is a universal approximator: a feed-forward network with a single hidden
layer containing a finite number of hidden neurons can approximate continuous
functions on compact subsets of R^n, under mild assumptions on the activation
function.
● Typical FFNNs have no inherent notion of order in time; they only remember what
they saw during training.
Recurrent NN (RNN):
● RNNs are Turing-complete: they can compute anything that can be computed and
have the capacity to simulate arbitrary procedures.
● RNNs possess a certain type of memory. They are much better suited to dealing with
sequences, context modeling and time dependencies.
RNN problem: Vanishing gradients
Solution: Long short-term memory (LSTM, Hochreiter, Schmidhuber, 1997)
LSTM cell
LSTM network
LSTM: Fixing vanishing gradient problem
Comparing LSTM and Simple RNN
More on LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
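A sketch of a single LSTM step (NumPy; the stacked parameter layout is one common convention, not the only one). The key point for the vanishing-gradient fix is the additive cell update c = f * c_prev + i * g, which lets gradients flow over long spans.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*dim, in_dim), U: (4*dim, dim), b: (4*dim,)
    hold the input, forget, output and candidate parameters stacked."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
    g = np.tanh(g)                                # candidate cell values
    c = f * c_prev + i * g                        # additive memory update
    h = o * np.tanh(c)                            # gated output
    return h, c
```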
Another solution: Gated Recurrent Unit (GRU)
GRU (Cho et al., 2014) is a bit simpler than LSTM (fewer weights)
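For comparison, a sketch of one GRU step (NumPy; parameter names are illustrative): two gates instead of three and no separate cell state, hence fewer weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_tilde          # interpolated new state
```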
Bidirectional RNN/LSTM
There are many situations when you
see the whole sequence at once
(OCR, speech recognition,
translation, caption generation, …).
So you can scan the [1D] sequence
in both directions, forward and
backward.
Here comes BRNN/BLSTM (Graves,
Schmidhuber, 2005).
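In practice a BLSTM is usually one flag away; a sketch in PyTorch (sizes illustrative): the forward and backward passes are run for you and their outputs concatenated.

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: forward and backward passes over the sequence,
# outputs concatenated, so the feature size doubles to 2 * hidden_size.
blstm = nn.LSTM(input_size=32, hidden_size=64,
                bidirectional=True, batch_first=True)
x = torch.randn(8, 100, 32)   # (batch, time, features)
out, (h, c) = blstm(x)
print(out.shape)              # torch.Size([8, 100, 128])
```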
Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
Example: BLSTM classifying the utterance “one oh five”
Multidirectional Multidimensional RNN/LSTM
Standard RNNs are inherently one dimensional, and therefore poorly suited to
multidimensional data (e.g. images).
The basic idea of MDRNNs (Graves, Fernandez, Schmidhuber, 2007) is to replace
the single recurrent connection found in standard RNNs with as many recurrent
connections as there are dimensions in the data.
It assumes some ordering on the multidimensional data. BRNNs can be extended to
n-dimensional data by using 2^n separate hidden layers.
Multi-dimensionality (MDRNN)
The basic idea of MDRNNs is to replace the single recurrent connection found
in standard RNNs with as many recurrent connections as there are dimensions
in the data.
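A sketch of the 2-D case (NumPy; names and shapes are illustrative): each cell (i, j) gets recurrent input from its predecessors along both axes, (i-1, j) and (i, j-1).

```python
import numpy as np

def mdrnn_2d(X, Wx, Wh1, Wh2, b):
    """2-D MDRNN sketch: one recurrent connection per data dimension.
    X: (H, W, in_dim); Wx: (dim, in_dim); Wh1, Wh2: (dim, dim); b: (dim,)."""
    H, Wd = X.shape[:2]
    dim = b.shape[0]
    Hs = np.zeros((H, Wd, dim))
    for i in range(H):
        for j in range(Wd):
            h_up   = Hs[i - 1, j] if i > 0 else np.zeros(dim)
            h_left = Hs[i, j - 1] if j > 0 else np.zeros(dim)
            Hs[i, j] = np.tanh(Wx @ X[i, j] + Wh1 @ h_up + Wh2 @ h_left + b)
    return Hs
```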
Multi-directionality (MDMDRNN?)
Multidirectional multidimensional RNN (MDMDRNN?)
The previously mentioned ordering is not the only possible one. It might be OK for
some tasks, but it is usually preferable for the network to have access to the
surrounding context in all directions. This is particularly true for tasks where precise
localisation is required, such as image segmentation.
For one dimensional RNNs, the problem of multidirectional context was solved by
the introduction of bidirectional recurrent neural networks (BRNNs). BRNNs contain
two separate hidden layers that process the input sequence in the forward and
reverse directions.
BRNNs can be extended to n-dimensional data by using 2^n separate hidden layers,
each of which processes the sequence using the ordering defined above, but with a
different choice of axes.
ReNet (2015)
PyraMiD-LSTM (2015)
http://arxiv.org/abs/1505.00393
http://arxiv.org/abs/1506.07452
Tree-LSTM (2015)
Interesting LSTM generalisation: Tree-LSTM
“However, natural language exhibits syntactic
properties that would naturally combine
words to phrases. We introduce the
Tree-LSTM, a generalization of LSTMs to
tree-structured network topologies.
Tree-LSTMs outperform all existing systems
and strong LSTM baselines on two tasks:
predicting the semantic relatedness of two
sentences and sentiment classification.”
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks,
https://arxiv.org/abs/1503.00075
Grid LSTM (2016)
Another interesting LSTM generalisation: Grid LSTM
“This paper introduces Grid Long Short-Term Memory, a network of LSTM cells
arranged in a multidimensional grid that can be applied to vectors, sequences
or higher dimensional data such as images. The network differs from existing
deep LSTM architectures in that the cells are connected between network layers
as well as along the spatiotemporal dimensions of the data. The network provides
a unified way of using LSTM for both deep and sequential computation.”
Grid LSTM (2016)
https://arxiv.org/abs/1507.01526
Grid LSTM (2016)
One-dimensional Grid LSTM corresponds to a feed-forward network that uses
LSTM cells in place of transfer functions such as tanh and ReLU. These networks
are related to Highway Networks (Srivastava et al., 2015) where a gated transfer
function is used to successfully train feed-forward networks with up to 900 layers
of depth.
Grid LSTM with two dimensions is analogous to the Stacked LSTM, but it adds
cells along the depth dimension too.
Grid LSTM with three or more dimensions is analogous to Multidimensional
LSTM, but differs from it not just by having the cells along the depth dimension,
but also by using the proposed mechanism for modulating the N-way interaction
that is not prone to the instability present in Multidimensional LSTM.
Grid LSTM (2016)
End of Intro
From here on we will not distinguish between RNN/GRU/LSTM and will usually use
the word RNN for any kind of recurrent block. Most RNNs in practice are now
actually LSTMs.
Representation Learning
Encoding semantics
Using word2vec instead of word indexes allows you to better deal with word
meanings (e.g. no need to enumerate all synonyms because their vectors are
already close to each other).
But the naive way to work with word2vec vectors still gives you a “bag of words”
model, where phrases “The man killed the tiger” and “The tiger killed the man” are
equal.
We need models that pay attention to word ordering: paragraph2vec, sentence
embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
https://code.google.com/archive/p/word2vec/
Example: Semantic Spaces (word2vec, GloVe)
vector('king') - vector('man') + vector('woman') = vector('queen')
http://nlp.stanford.edu/projects/glove/
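A sketch of the analogy query with gensim, assuming the standard pretrained GoogleNews word2vec vectors have been downloaded (the file name below is an assumption, not shipped with this deck).

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)
# king - man + woman ≈ queen in the embedding space
print(kv.most_similar(positive=['king', 'woman'],
                      negative=['man'], topn=1))
```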
Example: Semantic Spaces (word2vec, GloVe)
Case: Sentiment analysis
https://blog.openai.com/unsupervised-sentiment-neuron/
“Our research implies that simply training large unsupervised next-step-prediction
models on large amounts of data may be a good approach to use when creating
systems with good representation learning capabilities.”
Multi-modal Learning
Deep Learning models are becoming multi-modal: they use 2+ modalities
simultaneously, e.g.:
● Image caption generation: images + text
● Searching the Web by an image: images + text
● Video description: the same, but with an added time dimension
● Visual question answering: images + text
● Speech recognition: audio + video (lip motion)
● Image classification and navigation: RGB-D (color + depth)
Where is this heading?
● A common metric space for all concepts, a “thought vector”. It will then be
possible to match different modalities easily.
Multi-modal Learning
http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Example: More multi-modal learning
Example: Text generation by image
http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
Example: Image generation by text
StackGAN: Text to Photo-realistic Image Synthesis with
Stacked Generative Adversarial Networks, https://arxiv.org/abs/1612.03242
Example: Code generation by image
pix2code: Generating Code from a Graphical User Interface Screenshot,
https://arxiv.org/abs/1705.07962
Sequence Learning / Theory
Sequence to Sequence Learning (seq2seq)
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Another useful thing: CTC Output Layer
CTC (Connectionist Temporal Classification; Graves, Fernández, Gomez,
Schmidhuber, 2006) was specifically designed for temporal classification tasks; that
is, for sequence labelling problems where the alignment between the inputs and the
target labels is unknown.
CTC models all aspects of the sequence with a single neural network, and does not
require the network to be combined with a hidden Markov model. It also does not
require presegmented training data, or external post-processing to extract the
label sequence from the network outputs.
The CTC network predicts only the sequence of phonemes (typically as a series
of spikes, separated by ‘blanks’, or null predictions), while the framewise network
attempts to align them with the manual segmentation.
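A minimal sketch of training with the CTC loss in PyTorch (nn.CTCLoss; sizes illustrative, blank index 0 by default):

```python
import torch
import torch.nn as nn

# T=50 output frames, N=4 sequences in the batch, C=20 classes incl. blank.
log_probs = torch.randn(50, 4, 20, requires_grad=True).log_softmax(dim=2)
targets = torch.randint(1, 20, (4, 12))            # label sequences, no blanks
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

# CTC sums over all alignments of targets to frames; no presegmentation needed.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```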
Example: CTC vs. Framewise classification
CTC (Connectionist Temporal Classification)
https://github.com/baidu-research/warp-ctc
Encoder-Decoder architecture
https://github.com/farizrahman4u/seq2seq
Encoder-Decoder with Attention
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
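The core of soft attention fits in a few lines; a dot-product sketch (PyTorch; names are illustrative): score each encoder state against the current decoder state, normalize with softmax, and take the weighted sum as the context.

```python
import torch
import torch.nn.functional as F

def attend(query, enc_outputs):
    """query: (batch, dim) decoder state; enc_outputs: (batch, time, dim)."""
    scores = torch.bmm(enc_outputs, query.unsqueeze(2)).squeeze(2)  # (batch, time)
    weights = F.softmax(scores, dim=1)                # attention distribution
    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
    return context, weights                           # (batch, dim), (batch, time)
```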
CNN+RNN with Attention
http://kelvinxu.github.io/projects/capgen.html
CNN+RNN with Attention
http://kelvinxu.github.io/projects/capgen.html
More augmented RNNs
● Attentional Interfaces (Hard attention, Soft
attention)
● Differentiable Memory (Neural Turing
Machines, Differentiable neural computer,
Hierarchical Attentive Memory, Memory
Networks, ...)
● Adaptive Computation Time
● Differentiable Data Structures (structured
memory: stack, list, queue, …)
● Differentiable Programming (Neural
Programmer, Differentiable Functional
Program Interpreters, …)
● ...
https://deepmind.com/blog/differentiable-neural-computers/
Sequence Learning / Practice
Encoder-Decoder: original architecture
Sequence to Sequence Learning with Neural Networks, https://arxiv.org/abs/1409.3215
Recurrent Encoder / Recurrent Decoder
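A minimal sketch of the architecture in PyTorch (no attention, sizes illustrative; in the spirit of Sutskever et al., 2014, not the exact original setup): the encoder compresses the source into its final state, which initializes the decoder.

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Recurrent encoder / recurrent decoder sketch."""
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))       # source -> fixed state
        dec, _ = self.decoder(self.tgt_emb(tgt), state)  # condition decoder on it
        return self.out(dec)                             # per-step vocab logits
```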
Case: Machine Translation
Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
Encoder-Decoder: modern architecture
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,
https://arxiv.org/abs/1609.08144
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
https://arxiv.org/abs/1611.04558
Encoder-Decoder: character-level models
Fully Character-Level Neural Machine Translation without Explicit Segmentation,
https://arxiv.org/abs/1610.03017
The Problem:
RNNs are slow.
The solution #1: CNN encoder
A Convolutional Encoder Model for Neural Machine Translation, https://arxiv.org/abs/1611.02344
Convolutional Encoder / Recurrent Decoder
The solution #1.5: CNN encoder + decoder
Convolutional Sequence to Sequence Learning, https://arxiv.org/abs/1705.03122
Actually no RNN here (Facebook AI Research loves CNNs).
The solution #2: Optimizing RNNs
Exploring Sparsity in Recurrent Neural Networks, https://arxiv.org/abs/1704.05119
“Pruning RNNs reduces the size of the model and can also help achieve significant
inference time speed-up using sparse matrix multiply. Benchmarks show that using
our technique model size can be reduced by 90% and speed-up is around 2× to
7×.”
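The simplest variant of the idea is magnitude pruning; a sketch (NumPy; the 90% figure echoes the quote above, the function itself is illustrative): zero out the smallest weights, leaving a sparse matrix.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(W), sparsity)
    return W * (np.abs(W) >= threshold)

W = np.random.randn(512, 512)
W_sparse = magnitude_prune(W, sparsity=0.9)
print((W_sparse == 0).mean())   # ≈ 0.9: a candidate for sparse matrix multiply
```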
The solution #3: Better hardware
● Google TPU gen.2
○ 180 TFLOPS?
● NVIDIA DGX-1 (8*P100) ($129,000)
○ 170 TFLOPS (FP16)
○ 85 TFLOPS (FP32)
● NVIDIA Tesla V100
○ 15 TFLOPS (FP32)
○ 120 TFLOPS (Tensor Core)
● NVIDIA Tesla P100
○ 10.6 TFLOPS (FP32)
● NVIDIA GTX Titan X ($1000)
○ 11 TFLOPS (FP32)
● NVIDIA GTX 1080/1080 Ti ($700)
○ 8/11.3 TFLOPS (FP32)
The solution #3: Better hardware
Why could this solution be among the most interesting ones?
The current success of NNs (especially CNNs) is backed by large amounts of
available data _AND_ more powerful hardware (running decades-old algorithms). We
could potentially have achieved the same performance in the past, but the learning
process was just too slow (and we were too impatient).
Processor performance grows exponentially, and in 5-10 years the available
computing power could increase 1000x. Computing units better suited to RNN
computations may appear as well.
The situation could repeat itself. Once hardware allows fast training of RNNs, we
could achieve a new kind of result. Remember, RNNs are Turing-complete. They
are (potentially) much more powerful than feed-forward NNs.
https://ru.linkedin.com/in/grigorysapunov
gs@inten.to
Thanks!