Character (Byte)-Level Language Model
Sequence-to-Sequence Language Model for Transfer Learning
Firas Obeid
Agenda
• Why Unstructured Data?
• Character2Vector using Byte Level Encoding
• Embeddings
• Tips for LSTM Layer
• Transfer Learning
• Show code implementation with the model
Why the hype on Unstructured Data?
• Natural language processing (NLP) has become mainstream through a focus
on creating value from unstructured data.
• The share of firms that use only unstructured data rose from 2% in 2018 to 17% in
2020, and only 3% of the firms surveyed report that they do not use alternative
data sources, down from 30% in 2018.
• Last week, Refinitiv's 2020 survey showed that 72% of firms' models were
negatively impacted by COVID-19. Some 12% of firms declared their models
obsolete, and 15% are building new ones. The main problem was a lack of
agility to adapt quickly and include new data sets in models as
circumstances changed.
Benefits of Char2Vec
• With character embeddings, a vector can be formed for every word, even an out-of-vocabulary
one (no bag-of-words needed). Word embeddings, by contrast, can only handle words seen
during training.
• A good fit for misspelled words.
• Handles infrequent words better than word2vec embeddings, since the latter get too few
training examples for rare words.
• Reduces model complexity and improves performance (in terms of speed).
• All this comes at the cost of training on longer, sparser sequences, and thus a longer time
to train and optimize!
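The out-of-vocabulary point above can be illustrated with a minimal sketch. The tiny corpus and the helper names `word_covered` and `char_covered` are hypothetical and chosen only for illustration; they are not taken from the presentation's code.

```python
# Hypothetical tiny training corpus (illustrative only).
train_words = ["stock", "price", "rises"]

# Word-level vocabulary: one entry per whole word seen in training.
word_vocab = {w: i for i, w in enumerate(train_words)}
# Character-level vocabulary: one entry per distinct character seen in training.
char_vocab = {c: i for i, c in enumerate(sorted(set("".join(train_words))))}


def word_covered(word: str) -> bool:
    """True if the whole word has an entry in the word-level vocabulary."""
    return word in word_vocab


def char_covered(word: str) -> bool:
    """True if every character of the word has an entry in the character vocabulary."""
    return all(ch in char_vocab for ch in word)


# "prices" is unseen and "stokc" is misspelled, yet both are covered character by character.
for word in ["price", "prices", "stokc"]:
    print(word, word_covered(word), char_covered(word))
```

A word-level model would need a special unknown token for the last two inputs, while a character-level model can still encode them.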
Why not Byte Level Even?
• When ASCII encoding is used, there is no difference
between reading characters and reading bytes: each
character is stored in one byte. A byte has 8 bits
and can hold 256 values, but standard ASCII defines
only 128 of them (codes 0-127, 7 bits); the upper 128
values belong to extended 8-bit encodings.
• I will use only the 128 standard ASCII codes, which
cover the characters common to the English language
(7 bits).
• 0-31, 127: control characters (nonprintable)
• 32-126: letters (upper/lower case), digits,
symbols and signs
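As a minimal sketch of the byte-level encoding described above, Python's built-in `str.encode` turns text into integer byte values. The function names `encode_ascii` and `is_printable` and the sample headline are illustrative assumptions, not part of the original code.

```python
def encode_ascii(text: str) -> list[int]:
    """Map text to 7-bit ASCII codes; non-ASCII characters become '?' (code 63)."""
    return list(text.encode("ascii", errors="replace"))


def is_printable(code: int) -> bool:
    """Codes 32-126 are printable; 0-31 and 127 are control characters."""
    return 32 <= code <= 126


headline = "Fed cuts rates\n"  # hypothetical sample input
codes = encode_ascii(headline)
n_control = sum(1 for c in codes if not is_printable(c))

print(codes)
print(n_control, "control character(s)")
```

Replacing non-ASCII characters with `?` is one possible policy; dropping them or raising an error are equally valid choices depending on the data.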
Embeddings vs One-hot Encoding
Binary mode returns an array denoting which tokens exist at least once
in the input (so token order is lost), while int mode replaces each token
by an integer, thus preserving token order.
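The difference between the two modes can be sketched in plain Python. The helpers `int_encode` and `binary_encode` and the 128-entry vocabulary are assumptions made for this illustration; they mimic the behavior described above rather than reproduce any particular library's implementation.

```python
VOCAB_SIZE = 128  # assumed: 7-bit ASCII vocabulary


def int_encode(text: str) -> list[int]:
    """Int mode: one integer per character, in order of appearance."""
    return [ord(ch) for ch in text if ord(ch) < VOCAB_SIZE]


def binary_encode(text: str) -> list[int]:
    """Binary mode: fixed-length 0/1 vector, 1 where a character occurs at least once."""
    vec = [0] * VOCAB_SIZE
    for code in int_encode(text):
        vec[code] = 1
    return vec


# Two anagrams look identical in binary mode but differ in int mode.
print(binary_encode("tops") == binary_encode("spot"))
print(int_encode("tops") == int_encode("spot"))
```

Because a sequence model depends on order, the integer sequence is the form that would feed an embedding layer.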
Embeddings Layer
• Captures relationships between characters, based on how
characters accompany each other.
• A dense, n-dimensional vector representation of floating-point
values: each character/byte maps to a dense vector.
• Embeddings are trainable parameters of the model, equivalent
to the weights learned by a dense layer applied to one-hot input.
• In our case each unique character/byte is represented by an
N-dimensional vector of floating-point values. The learned
embedding forms a lookup table: a character is encoded by
looking up its dense vector in the table.
• A simple integer encoding of our characters is not efficient for
the model to interpret, since a linear classifier learns only a
single weight per feature and not the relationship (probability
distribution) between features (characters) or their
encodings.
• A higher-dimensional embedding can capture “fine-grained”
relationships between characters, but takes more data to
learn (256 dimensions in our case).
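The lookup-table view of an embedding can be sketched with NumPy. The random table stands in for trained weights, and the names `embedding_table` and `embed` plus the 128-entry vocabulary are assumptions for illustration only.

```python
import numpy as np

VOCAB_SIZE = 128   # assumed: one row per 7-bit ASCII code
EMBED_DIM = 256    # embedding width mentioned on the slide

rng = np.random.default_rng(0)
# In a real model these weights are learned; here they are random placeholders.
embedding_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype(np.float32)


def embed(text: str) -> np.ndarray:
    """Look up one dense vector per character by indexing rows of the table."""
    codes = np.frombuffer(text.encode("ascii"), dtype=np.uint8)
    return embedding_table[codes]


vectors = embed("GDP")
print(vectors.shape)

# The same result via a dense layer on one-hot input: one_hot @ table.
codes = np.frombuffer(b"GDP", dtype=np.uint8)
one_hot = np.eye(VOCAB_SIZE, dtype=np.float32)[codes]
print(np.allclose(one_hot @ embedding_table, vectors))
```

The second half shows why the lookup is equivalent to a dense layer on one-hot vectors, while avoiding the cost of building the one-hot matrix.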
Tips for LSTM Inputs
• The LSTM input must be 3D, i.e. batch_input_shape=(batch_size, n_timesteps, n_features).
• The three input dimensions are samples, time steps, and features (here: sequences,
sequence_length, characters).
• The LSTM input is defined by the input_shape argument on the first hidden layer.
• The input_shape argument takes a tuple of two values: the number of time steps and the number of
features.
• The number of samples is not part of input_shape and is assumed to be 1 or more. To leave the batch size
or the sequence length unspecified, pass None for that entry of batch_input_shape or for input_length.
• The reshape() function on NumPy arrays can be used to reshape 1D or 2D data to 3D.
• reshape() takes a tuple as an argument that defines the new shape.
• The LSTM returns the entire sequence of outputs for each sample (one vector per time step per sample) if
you set return_sequences=True.
• A stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding
sequence in the previous batch left off. Our RNN model is stateless, since the samples are independent
headlines rather than one continuous text corpus.
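The reshaping step mentioned above can be sketched as follows. The sizes (4 headlines of 10 encoded characters each) are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical data: 4 headlines, each already encoded as 10 integer codes.
n_samples, n_timesteps = 4, 10
flat = np.arange(n_samples * n_timesteps)            # 1D, shape (40,)

# reshape() takes the new shape as a tuple.
two_d = flat.reshape((n_samples, n_timesteps))        # 2D: (samples, time steps)
three_d = two_d.reshape((n_samples, n_timesteps, 1))  # 3D: one feature per time step

print(flat.shape, two_d.shape, three_d.shape)
```

If an embedding layer precedes the LSTM, it produces the feature axis itself, so integer input of shape (samples, time steps) is typically enough and the explicit 3D reshape is not needed.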
Semi-Supervised (Transfer Learning)
• Previously, Word2Vec used two embedding layers
per token to predict the probability of the words
before and after it. An LSTM can handle that without
random sampling: just take the max logit or
probability of the output to predict the next word
or character.
• Unlabeled data can compensate for scarce labeled
data in asset pricing.
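Taking the max logit to predict the next character can be sketched as below. The logits are made-up numbers standing in for a trained model's output at one time step, and the helper names `softmax` and `greedy_next_char` are illustrative assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def greedy_next_char(logits: np.ndarray) -> str:
    """Pick the character whose logit (equivalently, probability) is largest."""
    return chr(int(np.argmax(logits)))


# Hypothetical logits over a 128-code ASCII vocabulary for one time step.
logits = np.zeros(128)
logits[ord("e")] = 3.0
logits[ord("a")] = 1.5

probs = softmax(logits)
print(greedy_next_char(logits), float(probs[ord("e")]))
```

Since softmax preserves ordering, the argmax over logits and the argmax over probabilities select the same character.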
Tips for Choosing Learning Rate
