NLP with Deep Learning Guest Lecture slides by Fatih Mehmet Güler, PragmaCraft. Covers my background on the subject, our projects, the stages of the NLP pipeline, and the latest developments.
22. SRL with LSTM Paper
• End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
• Jie Zhou and Wei Xu, 2015 (Baidu Research)
23. Word Vectors
• Word2Vec (see the sketch after this list)
• CBOW: predicts the word from its context
• several times faster to train than skip-gram; slightly better accuracy for frequent words
• Skip-Gram: predicts the context from the word
• works well with small amounts of training data; represents even rare words and phrases well
• GloVe: a count-based model that learns vectors by essentially doing dimensionality reduction on the co-occurrence count matrix
• ELMo
• BERT
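The CBOW/skip-gram split above is just one flag in common toolkits. A minimal sketch using the gensim library (not mentioned on the slides; the toy corpus and hyperparameters are purely illustrative):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real training needs millions of sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]

# sg=0 selects CBOW (predict the word from its context);
# sg=1 selects skip-gram (predict the context from the word).
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"][:5])               # a static 50-dim vector for "cat"
print(skipgram.wv.most_similar("cat"))  # nearest neighbors in vector space
```

Note that both variants produce one fixed vector per word; that limitation is exactly what ELMo and BERT address below.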
26. ELMo
• Bidirectional LSTM language model
• Dynamic word embeddings
• The embedding changes according to the context (illustrated in the sketch below)
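To make "the embedding changes according to the context" concrete, here is a toy PyTorch sketch. This is not ELMo's real architecture (which uses character convolutions and two large biLSTM layers trained as a language model); it only shows how a bidirectional LSTM gives the same word a different vector in different sentences:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy vocabulary, for illustration only.
vocab = {"<pad>": 0, "the": 1, "bank": 2, "river": 3, "money": 4}
embed = nn.Embedding(len(vocab), 8)
bilstm = nn.LSTM(input_size=8, hidden_size=8, bidirectional=True, batch_first=True)

def contextual(tokens):
    ids = torch.tensor([[vocab[t] for t in tokens]])
    out, _ = bilstm(embed(ids))   # shape (1, seq_len, 16): context-dependent vectors
    return out[0]

a = contextual(["the", "river", "bank"])[2]   # "bank" next to "river"
b = contextual(["the", "money", "bank"])[2]   # "bank" next to "money"
print(torch.cosine_similarity(a, b, dim=0))   # below 1.0: same word, different vectors
```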
27. BERT
• Replaces language modeling with “masked language modeling”
• Words in a sentence are randomly erased and replaced with a special [MASK] token with a small probability (15%)
• A Transformer then predicts each masked word from the unmasked words surrounding it, both to the left and to the right (a sketch of the masking step follows)
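A sketch of the masking step in plain Python (BERT's full recipe also leaves 10% of the selected tokens unchanged and swaps another 10% for random words, omitted here for brevity):

```python
import random

random.seed(0)

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Replace roughly 15% of tokens with [MASK]; the model must
    recover the originals using context from both directions."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)      # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)     # no loss computed at this position
    return masked, targets

print(mask_tokens("the quick brown fox jumps over the lazy dog".split()))
```

With the Hugging Face transformers library, a pre-trained BERT can be queried the same way via pipeline("fill-mask", model="bert-base-uncased").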
30. Practical Applications
• Frameworks
• PyTorch
• TensorFlow
• Keras
• Higher level
• AllenNLP
• spaCy (quick example after this list)
• Flair, PyText, torchtext
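As a taste of the higher-level libraries, a short spaCy sketch (assumes the small English model has been installed with python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # tokenizer, tagger, parser, NER in one pipeline
doc = nlp("Baidu Research published an SRL paper in 2015.")

for token in doc:
    print(token.text, token.pos_, token.dep_)   # part-of-speech tags and dependency labels
for ent in doc.ents:
    print(ent.text, ent.label_)                 # named entities, e.g. ORG and DATE
```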
• Problems
• Unknown words: subword units such as Byte Pair Encoding and SentencePiece (sketch below)
• LSTMs degrade on long sequences (long-range dependencies are hard to carry)
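Byte Pair Encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair, so an unknown word decomposes into known pieces. A sketch following the classic Sennrich et al. (2016) procedure (the word counts are illustrative):

```python
import re
from collections import Counter

def get_stats(vocab):
    """Count how often each adjacent symbol pair occurs."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Fuse the chosen pair into one symbol throughout the vocabulary."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(5):
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
    print(best)   # learned merges, e.g. ('e', 's'), then ('es', 't'), ...
```

SentencePiece packages the same idea (plus a unigram LM variant) as a library that works directly on raw text without pre-tokenization.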
31. What’s Next?
• More Variants of ELMo/BERT - Transfer Learning
• More NLP Applications - Embeddings all the way
• My Unsolicited Advice :)
• deeplearning.ai (course 5 - sequence models)
• read lots of papers (http://arxiv-sanity.com)
• Twitter & Facebook (!)
• Andrew Ng, Yann LeCun, Andrej Karpathy