Natural Language Processing (NLP)
and Transformer Models
Ding Li 2021.11
2
Use 100 ~ 1K dimensions to represent each word
Basic word embedding methods
• Word2vec (Google, 2013)
• Glove (Stanford, 2014)
• FastText (Facebook, 2016)
Continuous bag-of-words method (CBOW)
• Sliding window to select context words and center word
• Average context words as input to predict center word
• Self-supervised learning, with a massive corpus as training data
Python code
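A minimal CBOW training sketch (an assumption using gensim, not the deck's linked code); sg=0 selects CBOW, which averages the words in the sliding window to predict the center word:

from gensim.models import Word2Vec

# Toy corpus; in practice the training data is a massive, self-supervised corpus.
sentences = [
    ["the", "puppy", "chased", "the", "ball"],
    ["the", "dog", "chased", "the", "cat"],
    ["a", "puppy", "is", "a", "young", "dog"],
]

# sg=0 = CBOW: average the context words inside the window and predict the center word.
model = Word2Vec(sentences, vector_size=100, window=2, sg=0, min_count=1, epochs=50)

print(model.wv["puppy"].shape)         # (100,) dense embedding vector
print(model.wv.most_similar("puppy"))  # nearest words by cosine similarity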
[Figure: the input word "puppy" as a one-hot vector over a ~1M-word vocabulary is mapped to a dense word embedding of ~100 dimensions, e.g. (0.98, 0.57, -0.31, …, 1.62); the embedding matrix has size one-hot vector size × embedding dimensions.]
3
Recurrent Neural Networks (RNN): keep information across time steps
Python code
GRUs (Gated Recurrent Units) help preserve important information
Long Short-Term Memory (LSTM): same purpose
Named Entity Recognition
B: Token begins an entity I: Token is inside an entity O: Others
Sharon Floyd flew to Miami on Friday
B-per I-per O O B-geo O B-tim
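A minimal sketch (my own illustration, not from the deck) of running a GRU over a token sequence in PyTorch; the gates decide which information is preserved across steps, and a per-token classifier on top would produce the B/I/O tags above:

import torch
import torch.nn as nn

# Hypothetical sizes: 8 tokens, 100-dim word embeddings, 64-dim hidden state.
gru = nn.GRU(input_size=100, hidden_size=64, batch_first=True)

x = torch.randn(1, 8, 100)       # (batch, sequence length, embedding dim)
outputs, last_hidden = gru(x)    # outputs holds the hidden state at every step

print(outputs.shape)             # torch.Size([1, 8, 64])
print(last_hidden.shape)         # torch.Size([1, 1, 64])
# For Named Entity Recognition, nn.Linear(64, num_tags) applied to each step's
# output would score the B/I/O tag of each token.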
4
Encoder and Decoder Structure
encoder
decoder
How are the results?
Wie sind die Ergebnisse?
Problem: as sequence size increases, performance decreases
Attention: Word Alignment
The fixed-length encoder output is a bottleneck and loses information; attention instead retrieves information step by step, with disambiguation, and scores it.
Encoder/decoder attention: which key word is most relevant to the query?
For languages with different grammar structures, attention still attends to the correct token between them.
Sampling for next word
Greedy decoding: select the most probable word at each step
Beam search: a broader, more exploratory decoding alternative
Minimum Bayes Risk: compare many samples against each other and select the sample with the highest similarity
Python code
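A minimal greedy-decoding sketch (my own illustration; next_word_probs is a hypothetical stand-in for a trained decoder) that picks the most probable word at each step:

import numpy as np

vocab = ["<eos>", "wie", "sind", "die", "ergebnisse"]

def next_word_probs(prefix):
    # Hypothetical stand-in for the decoder: a probability distribution over
    # the vocabulary given the words generated so far.
    logits = np.random.randn(len(vocab))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy_decode(max_len=10):
    prefix = []
    for _ in range(max_len):
        probs = next_word_probs(prefix)
        word = vocab[int(np.argmax(probs))]   # greedy: take the most probable word
        if word == "<eos>":
            break
        prefix.append(word)
    return prefix

print(greedy_decode())
# Beam search would instead keep the k best partial sequences at every step.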
Key (K), Query (Q), Value (V)
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{T})\,V$
Q: linearly transformed from the output; K, V: linearly transformed from the input
5
RNN: calculation must happen in sequence
Positional Encoding: add positional info to words
Transformer: parallel computation for all words
Causal Attention (Self-Attention)
• Queries and Keys are words from the same sentence
• Queries should only be allowed to attend to earlier words
• Find which words deserve more attention
Multi-headed Attention
• Each head uses a different linear transformation to represent words
• Different heads can learn different relationships between words
Transformer Decoder
Python code
Online Summarization Tool
transformers GitHub
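A minimal sketch of sinusoidal positional encoding (an assumption: this is the scheme from "Attention Is All You Need", which the deck cites, not code from the deck itself):

import numpy as np

def positional_encoding(seq_len, d_model):
    # Even dimensions use sine, odd dimensions use cosine, with wavelengths
    # forming a geometric progression over the embedding dimensions.
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                   # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

embeddings = np.random.randn(8, 512)                 # 8 words, 512-dim embeddings
inputs = embeddings + positional_encoding(8, 512)    # positional info added to words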
6
Self-Attention: the meaning of a word can come from other words in the sentence.
Create the query Q, key K, and value V by multiplying the input matrix X with the weight matrices Wq, Wk, and Wv.
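A minimal numpy sketch of this computation with toy sizes (my own illustration): project X into Q, K, V and apply the softmax attention from the previous page (the original paper additionally scales the scores by sqrt(d_k)):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, d_k = 4, 8, 8        # 4 words, toy dimensions
X = np.random.randn(seq_len, d_model)  # one row per word

Wq = np.random.randn(d_model, d_k)     # learned weight matrices
Wk = np.random.randn(d_model, d_k)
Wv = np.random.randn(d_model, d_k)

Q, K, V = X @ Wq, X @ Wk, X @ Wv       # queries, keys, values

scores = Q @ K.T                       # relevance of every key to every query
weights = softmax(scores)              # attention weights, one row per word
attended = weights @ V                 # each word becomes a mixture of all words

print(attended.shape)                  # (4, 8)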
7
BERT (Bidirectional Encoder Representations from Transformers)
Transfer Learning
Pre-training (base model 110M parameters, large model 340M)
Pre-training basic model with massive data
Fine-tuning models for different applications
Masked Language Modeling (MLM)
Next Sentence Prediction (NSP)
NSP example: "The legislators believed that they were on the right side of history." Candidate next sentences: "So they changed the law." vs. "Then the bunny ate the carrot."
Pre-training data
• Books Corpus (800M words)
• English Wikipedia (2,500M words, ~13G)
Fine-tuning and Data Input
Task | Sentence A | Sentence B | Result
Pre-training | Sentence A | Sentence B | MLM, NSP
Classification | Text | None | Sentiment pos/neg? Grammar correct?
Question Answering | Question | Passage | Answer or its location in the passage
Summarization | Article | Summary | Summary of the article
Natural Language Inference | Hypothesis | Premise | Entailment, contradiction, or neutral?
Named Entity Recognition | Sentence | Entities | Entities and tags
Paraphrase | Sentence | Paraphrase | Paraphrase of the sentence
Natural language inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
Bert GitHub Python
Paper 2019
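A minimal illustration of the MLM objective with a pretrained BERT (an assumption: this uses the Hugging Face pipeline rather than the deck's linked code):

from transformers import pipeline

# BERT predicts the masked token from both left and right context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for pred in unmasker("The legislators believed they were on the right side of [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))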
8
The paper uses the Medical Information Mart
for Intensive Care III (MIMIC-III) dataset.
MIMIC-III consists of the electronic health records
of 58,976 unique hospital admissions from 38,597
patients in the intensive care unit of the Beth
Israel Deaconess Medical Center between 2001
and 2012. There are 2,083,180 de-identified notes
associated with the admissions.
ClinicalBERT accurately predicts 30-day
readmission using discharge summaries.
AUROC: Area under the receiver operating characteristic curve
AUPRC: Area under the precision-recall curve
PR80: Recall at precision of 80%
ClinicalBert paper
BioBert: trained with PubMed abstracts (PubMed) and/or PubMed Central full-text articles (PMC) GitHub
9
T5 (Text-to-Text Transfer Transformer)
Unified Multi-Task Framework: Text as Input, Text as Output
Cola: Corpus of Linguistic Acceptability
STSB: Semantic Textual Similarity Benchmark
RTE: Recognizing Textual Entailment
MNLI: Multi-Genre Natural Language Inference
MRPC: Microsoft Research Paraphrase Corpus
SQuAD: Stanford Question Answering Dataset
WMT English to German
COPA: Choice of Plausible Alternatives, causal reasoning
MultiRC: Multi-Sentence Reading Comprehension
WiC: Word in Context
WSC: Winograd Schema Challenge, resolve ambiguity
The city councilmen refused the demonstrators a permit
because they [feared/advocated] violence.
Question: “they” refers to?
Transfer Learning with C4 (Colossal Clean Crawled Corpus, ~800G); base model with 220M parameters, large model 770M, largest 11B
T5 GitHub Paper 2020 Python
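A minimal sketch of T5's text-in/text-out interface (an assumption; not the deck's linked code), where a task prefix selects the task:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Every task is phrased as text-to-text; the prefix selects the task.
inputs = tokenizer("translate English to German: How are the results?",
                   return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))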
10
Language Model Meta-Learning
Larger Models Make Increasingly Efficient Use of In-Context Information
paper
Datasets Used to Train GPT-3
Model size vs. TriviaQA performance
SAT Analogies: 65%, vs. 57% for the average college applicant
11
paper
12
Gu 2021: models ordered from less to more domain-specific vocabulary
13
Model | Model Full Name | Vocabulary | Training Size
BERT | bert-base-uncased | Wiki + Books | 16G
RoBERTa | roberta-base | Web Crawl | 160G
PubMedBert | microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext | PubMed | 21G
14
Self-Alignment Pretraining for Biomedical Entity Representations SapBert GitHub Model
Liu 2021
Figure 1: The t-SNE visualization of UMLS entities under PUBMEDBERT (BERT pretrained on
PubMed papers) & PUBMEDBERT+ SAPBERT (PUBMEDBERT further pretrained on UMLS
synonyms). The biomedical names of different concepts are hard to separate in the
heterogeneous embedding space (left). After the self-alignment pretraining, the same
concept’s entity names are drawn closer to form compact clusters (right).
Pretraining with UMLS (Unified Medical Language System)
4M+ concepts & 10M+ synonyms (MeSH, SNOMED, RxNorm, Gene Ontology, & OMIM)
Hard Pairs Mining $(x_a, x_p, x_n)$
$x_a$: anchor; $x_p$: positive synonym match; $x_n$: negative synonym match
Only consider triplets with the negative sample closer to the positive sample by a margin of λ.
Loss Function
S: similarity matrix among the items $\chi_b$ in batch b
Negative pair similarity should be small; positive pair similarity should be large
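A rough sketch of the hard-triplet selection rule (my own illustration under assumptions, not the paper's implementation): among all (anchor, positive, negative) triplets in a batch, keep only those where the negative's similarity is within the margin λ of the positive's:

import numpy as np

np.random.seed(0)
emb = np.random.randn(6, 8)                        # 6 entity names, 8-dim embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # cosine similarity = dot product
labels = np.array([0, 0, 1, 1, 2, 2])              # UMLS concept id of each name

S = emb @ emb.T                                    # similarity matrix for the batch
lam = 0.2                                          # margin λ

hard_triplets = []
for a in range(len(labels)):
    for p in np.where(labels == labels[a])[0]:
        if p == a:
            continue
        for n in np.where(labels != labels[a])[0]:
            # Keep only informative (hard) triplets: the negative is almost as
            # similar to the anchor as the positive is (within the margin).
            if S[a, n] > S[a, p] - lam:
                hard_triplets.append((a, p, n))

print(len(hard_triplets), "hard triplets selected for the loss")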
15
Radford 2021 GitHub
16
Colab
• We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
• After pre-training, natural language is used to reference learned visual concepts (or describe new ones), enabling zero-shot transfer of the model to downstream tasks.
• We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification.
• The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset-specific training.
Blog
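A minimal zero-shot classification sketch (an assumption: the standard Hugging Face CLIP usage, not code from the deck), scoring candidate captions against an image:

from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"   # example image
image = Image.open(requests.get(url, stream=True).raw)

captions = ["a photo of a cat", "a photo of a dog", "a photo of a puppy"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)   # which caption goes with the image?
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.2f}")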
17
Masked Autoencoders (MAE) Are Scalable Vision Learners He 2021
Figure 1. Our MAE architecture. During pre-training, a large random
subset of image patches (e.g., 75%) is masked out. The encoder is
applied to the small subset of visible patches. Mask tokens are
introduced after the encoder, and the full set of encoded patches and
mask tokens is processed by a small decoder that reconstructs the
original image in pixels. After pre-training, the decoder is discarded,
and the encoder is applied to uncorrupted images to produce
representations for recognition tasks.
Figure 4. Reconstructions of ImageNet validation images using an MAE
pre-trained with a masking ratio of 75% but applied on inputs with
higher masking ratios. The predictions differ plausibly from the original
images, showing that the method can generalize.
18
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Baevski 2022
GitHub
19
 Self-supervised learning turns essentially all human-written text into potential training data for machines.
 Models are trained not only on the meaning and semantics of text, but also pick up reasoning.
 Models with billions of parameters are rapidly gaining more sophisticated capabilities.
20
 Coursera
Natural Language Processing Specialization
Applied Text Mining in Python
 Books
Getting Started with Google BERT
 Papers
Attention Is All You Need
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Pretrained Transformers for Text Ranking: BERT and Beyond
Passage Re-ranking with BERT
 Blogs
Illustrated: Self-Attention
Natural language inference
Keyword Extraction: from TF-IDF to BERT
Understanding searches better than ever before
 Projects
NLP-progress
Bert Extractive Summarizer
 Colab
A Visual Notebook to Using BERT for the First Time (blog)
21
1. Count word frequency in all training tweets
Word | Counts in positive tweets | Counts in negative tweets
Happy | 305 | 87
Hard | 66 | 217
NLP | 34 | 29
Learning | 18 | 13
2. Sum the frequencies for each tweet
Tweet | X1 (sum of positive counts) | X2 (sum of negative counts)
Happy learning | 323 (305 + 18) | 100 (87 + 13)
NLP hard | 101 (35 + 66) | 246 (29 + 217)
3. Regression and Sigmoid
$z = \theta_0 + \theta_1 X_1 + \theta_2 X_2$, $h(z) = \dfrac{1}{1 + e^{-z}}$
Update θ to minimize the difference between h and the label
4. Predict results (positive vs. negative) with the optimized parameters
Python code
Issue: information from individual words is partially lost in the summation
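A minimal sketch of the steps above (my own illustration, using the two toy tweets as training data) with scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per tweet: X1 = summed positive counts, X2 = summed negative counts.
X = np.array([
    [323, 100],   # "Happy learning"
    [101, 246],   # "NLP hard"
])
y = np.array([1, 0])   # 1 = positive sentiment, 0 = negative

clf = LogisticRegression().fit(X, y)   # learns θ for z = θ0 + θ1·X1 + θ2·X2

# Predict a new tweet, e.g. "happy NLP": X1 = 305 + 34, X2 = 87 + 29.
print(clf.predict([[339, 116]]))         # [1] -> positive
print(clf.predict_proba([[339, 116]]))   # sigmoid probabilities per class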
22
1. Bayes' Rule
$P(\mathrm{Positive} \cap happy) = P(happy) \times P(\mathrm{Positive} \mid happy) = P(\mathrm{Positive}) \times P(happy \mid \mathrm{Positive})$
$P(\mathrm{Positive} \mid happy) = P(happy \mid \mathrm{Positive}) \times \dfrac{P(\mathrm{Positive})}{P(happy)}$
2. $P(happy \mid \mathrm{Positive}) = \dfrac{freq(w_i, \mathrm{Class})}{N_{\mathrm{Class}}} = \dfrac{2}{13}$
3. Laplacian Smoothing to handle zero values
$P(w_i \mid \mathrm{Class}) = \dfrac{freq(w_i, \mathrm{Class}) + 1}{N_{\mathrm{Class}} + V}$
$N_{\mathrm{Class}}$: frequency of all words of a class; $V$: number of unique words in the vocabulary
4. Log Likelihood
$\dfrac{P(\mathrm{Positive} \mid happy)}{P(\mathrm{Negative} \mid happy)} = \underbrace{\dfrac{P(\mathrm{Positive})}{P(\mathrm{Negative})}}_{\text{prior ratio}} \times \dfrac{P(happy \mid \mathrm{Positive})}{P(happy \mid \mathrm{Negative})}$
With smoothing: $P(happy \mid \mathrm{Positive}) = \dfrac{2 + 1}{13 + 8} = 0.14$
Doc: "I am happy learning NLP"
$\log \text{likelihood} = \log\dfrac{0.50}{0.50} + \log\dfrac{0.19}{0.19} + \log\dfrac{0.19}{0.19} + \log\dfrac{0.14}{0.10} + \log\dfrac{0.10}{0.10} + \log\dfrac{0.10}{0.10} = 0 + 0 + 0 + 0.146 + 0 + 0 = 0.146 > 0 \Rightarrow$ positive
Python code
Word | Pos counts | Neg counts | p(w|pos) | p(w|neg)
I | 3 | 3 | 0.19 | 0.19
am | 3 | 3 | 0.19 | 0.19
happy | 2 | 1 | 0.14 | 0.10
because | 1 | 0 | 0.10 | 0.05
learning | 1 | 1 | 0.10 | 0.10
NLP | 1 | 1 | 0.10 | 0.10
sad | 1 | 2 | 0.10 | 0.14
not | 1 | 2 | 0.10 | 0.14
N_Class | 13 | 13 | |
Issue: word distributions are calculated without context
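A minimal sketch of the steps above (my own illustration, reusing the toy counts from the table) with Laplacian smoothing and a log-likelihood score:

import math

pos_counts = {"I": 3, "am": 3, "happy": 2, "because": 1,
              "learning": 1, "NLP": 1, "sad": 1, "not": 1}
neg_counts = {"I": 3, "am": 3, "happy": 1, "because": 0,
              "learning": 1, "NLP": 1, "sad": 2, "not": 2}

V = len(pos_counts)                 # 8 unique words in the vocabulary
n_pos = sum(pos_counts.values())    # 13
n_neg = sum(neg_counts.values())    # 13

def p(word, counts, n_class):
    # Laplacian smoothing: (freq + 1) / (N_class + V)
    return (counts.get(word, 0) + 1) / (n_class + V)

def log_likelihood(doc, prior_pos=0.5, prior_neg=0.5):
    score = math.log10(prior_pos / prior_neg)   # prior ratio
    for word in doc.split():
        score += math.log10(p(word, pos_counts, n_pos) / p(word, neg_counts, n_neg))
    return score

score = log_likelihood("I am happy learning NLP")
print(score, "-> positive" if score > 0 else "-> negative")   # > 0, close to the slide's 0.146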
23
N-Gram and Probability [Corpus: I am happy because I am learning]
• Unigram: {I, am, happy, because, learning}; $P(I) = \dfrac{C(I)}{C(\mathrm{all})} = \dfrac{2}{7}$
• Bigram: {I am, am happy, happy because, …}; $P(happy \mid am) = \dfrac{C(\mathrm{am\ happy})}{C(\mathrm{am})} = \dfrac{1}{2}$
• Trigram: {I am happy, am happy because, …}; $P(happy \mid \mathrm{I\ am}) = \dfrac{C(\mathrm{I\ am\ happy})}{C(\mathrm{I\ am})} = \dfrac{1}{2}$
Approximation of Sequence Probability
• Use N-grams for approximation, since long sequences are rare.
  Bigram: $P(\mathrm{the\ teacher\ drinks\ tea}) \approx P(\mathrm{the})\, P(\mathrm{teacher} \mid \mathrm{the})\, P(\mathrm{drinks} \mid \mathrm{teacher})\, P(\mathrm{tea} \mid \mathrm{drinks})$
• Interpolation to handle missing terms.
  Trigram: $P(\mathrm{tea} \mid \mathrm{teacher\ drinks}) \approx 0.7\, P(\mathrm{tea} \mid \mathrm{teacher\ drinks}) + 0.2\, P(\mathrm{tea} \mid \mathrm{drinks}) + 0.1\, P(\mathrm{tea})$
• Add start and end tokens to the sentence: <s> the teacher drinks tea </s>
  $P(\mathrm{the\ teacher\ drinks\ tea}) \approx P(\mathrm{the} \mid {<}s{>})\, P(\mathrm{teacher} \mid \mathrm{the})\, P(\mathrm{drinks} \mid \mathrm{teacher})\, P(\mathrm{tea} \mid \mathrm{drinks})\, P({<}/s{>} \mid \mathrm{tea})$
Probability Matrix [Corpus: I study I learn]
Applications: auto-complete, generative text
Python code
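A minimal bigram-counting sketch (my own illustration) over the toy corpus, with start and end tokens:

from collections import Counter

corpus = ["<s>", "I", "am", "happy", "because", "I", "am", "learning", "</s>"]

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word, prev):
    # P(word | prev) = C(prev word) / C(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p("happy", "am"))                              # 0.5

# Sequence probability under the bigram approximation:
sentence = ["<s>", "I", "am", "learning", "</s>"]
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= p(word, prev)
print(prob)                                          # 0.5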
24
TF-IDF (Term Frequency – Inverse Document Frequency)
tf = frequency of a term in a document
$idf = \log \dfrac{N_{\mathrm{all}}}{N_t}$;  $\text{tf-idf} = tf \times idf = tf \times \log \dfrac{N_{\mathrm{all}}}{N_t}$
$N_{\mathrm{all}}$: total number of articles; $N_t$: number of articles containing term $t$
Wikipedia TF-IDF Dataset Release
Term | Nt | Nall | idf
the | 5,457,533 | 5,989,879 | 0.09
disease | 67,085 | 5,989,879 | 4.49
encephalitis | 904 | 5,989,879 | 8.80
TextRank (based on a graph of co-occurring words)
Important words are surrounded by other important words.
Word distance: 2 ~ 10
Similar to PageRank
Python Lib: Summa
YAKE (Yet Another Keyword Extractor)
Paper published in 2020
Jellyfish package is used to calculate word distance
KeyBERT (Keyword Extraction with BERT)
SentenceTransformer: word embeddings for the article and candidate keywords
Supported pretrained models:
• stsb-roberta-large 1.31G
• nli-roberta-large 1.31G
• distilbert-base-nli-mean-tokens 244M
[Table: ~1024-dimensional embedding vectors for the article and each keyword]
Then compute the similarity between the article and keyword embeddings
Sentence meaning can be pooled from:
• [CLS]
• Mean of all words
• Max of all words
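A minimal TF-IDF sketch (my own illustration with scikit-learn, not a tool named in the deck); rare terms get a high idf and common terms a low one:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "encephalitis is a rare disease",
    "the disease spread quickly",
    "the results of the study were published",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # (n_docs, n_terms) sparse tf-idf matrix

for term, idx in sorted(vectorizer.vocabulary_.items()):
    print(term, round(vectorizer.idf_[idx], 2))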
25
 File Operation
f = open(filename, mode) f.close()
f.readline() f.read(n) f.write(message)
for line in f: do_something(line)
df = pd.read_csv(filename) df.to_csv(filename)
 Extract Text from HTML File
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(html_str, 'html.parser')
html_text = html_soup.get_text()
 Contraction Expansion
“can’t” → “cannot”; “We’re” → “We are”
Regular expression pattern substitution
 Word Comparison
s.startswith(t) s.endswith(t)
t in s
s.isupper() s.islower() s.istitle()
s.isalpha() s.isdigit() s.isalnum()
 String Operations
s.lower() s.upper() s.title()
s.split(t) s.splitlines() s.join(t)
s.strip() s.rstrip()
s.find(t) s.rfind(t) s.replace(u,v)
 Regular Expression
import re
Remove punctuation: re.sub(r'[^\w\s]', '', s)    \w: word characters, \s: whitespace
Find call-outs: re.search(r'@[A-Za-z0-9_]+', s) or re.search(r'@\w+', s)
 Remove Stop Words [NLTK: Natural Language Toolkit]
from nltk.corpus import stopwords
nltk.download('stopwords')
stop = stopwords.words('english')
" ".join(x for x in s.split() if x not in stop)
 Tokenization
nltk.word_tokenize(text)
nltk.sent_tokenize(text)
 Stemming
“fish”, “fishing”, “fishes” → “fish”, “leaves” → “leav”
porter = nltk.PorterStemmer()
porter.stem('fishing')
 Lemmatization
“good”, “better”, “best” → “good”, “leaves” → “leaf”
lemma = nltk.WordNetLemmatizer()
lemma.lemmatize('leaves')
 Part of Speech (POS) Tagging
nltk.pos_tag(nltk.word_tokenize(text))
26
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import torch
import transformers as ppb
# Load the SST-2 sentence/label pairs (tab-separated, no header).
df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv',
                 delimiter='\t', header=None)

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

# Tokenize every sentence, pad to the longest one, and build the attention mask.
tokenized = df[0].apply(lambda x: tokenizer.encode(x, add_special_tokens=True))
max_len = max(len(x) for x in tokenized.values)
padded = np.array([i + [0] * (max_len - len(i)) for i in tokenized.values])
attention_mask = np.where(padded != 0, 1, 0)
input_ids = torch.tensor(padded).to(torch.int64)
attention_mask = torch.tensor(attention_mask).to(torch.int64)

# Run BERT once (no gradients) and keep the [CLS] embedding of each sentence.
with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)
features = last_hidden_states[0][:, 0, :].numpy()

# Train a logistic regression classifier on the 768-dim [CLS] features.
labels = df[1]
train_features, test_features, train_labels, test_labels = train_test_split(features, labels)
lr_clf = LogisticRegression()
lr_clf.fit(train_features, train_labels)
lr_clf.score(test_features, test_labels)
The sentence embedding is taken from [CLS]. Logistic regression is applied to the 768 embedding values of each sentence to decide its sentiment classification. Result: 0.86
Colab
27
Context-based Embedding
Sentence A: He got bit by Python.
Sentence B: Python is my favorite programming language.
BERT Configurations
L (# of encoders) A (attention heads) H (hidden units)
Bert-base 12 12 768
Bert-large 24 16 1024
BERT uses the WordPiece tokenizer
"Let us start pretraining the model."
tokens = [let, us, start, pre, ##train, ##ing, the, model]
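A minimal check of this tokenization (an assumption: using the Hugging Face tokenizer rather than code from the deck):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("Let us start pretraining the model."))
# roughly as on the slide: ['let', 'us', 'start', 'pre', '##train', '##ing', 'the', 'model', '.']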
Masked Language Model
The feedforward network takes the representation of the masked token as input and returns, for every word in the vocabulary, the probability of it being the masked word.
28
Sentiment Analysis, Natural Language Inference, Named Entity Recognition
Hugging Face transformers documentation
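A minimal example with the Hugging Face pipeline API (an assumption; the deck only links to the documentation). The other tasks follow the same pattern with a different task name:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Transformer models make NLP tasks much easier."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

ner = pipeline("ner")
print(ner("Sharon Floyd flew to Miami on Friday."))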
29
Paragraph = "The immune system is a system of many
biological structures and processes within an organism
that protects against disease. To function properly, an
immune system must detect a wide variety of agents,
known as pathogens, from viruses to parasitic worms, and
distinguish them from the organism's own healthy tissue."
Question = "What is the immune system?"
Answer = "a system of many biological structures and
processes within an organism that protects against disease"
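A minimal question-answering sketch over the paragraph above (an assumption: the generic Hugging Face pipeline, not the deck's code):

from transformers import pipeline

qa = pipeline("question-answering")
result = qa(question="What is the immune system?",
            context="The immune system is a system of many biological structures and "
                    "processes within an organism that protects against disease.")
print(result["answer"], round(result["score"], 3))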
30
Extractive summarization
• Pick important sentences from a text.
• Add [CLS] to represent each sentence and judge whether the sentence should be included.
Abstractive summarization
• Paraphrase the given text while retaining its essential meaning.
Fine-tune BERT for Extractive Summarization Text Summarization with Pretrained Encoders
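A minimal usage sketch of the Bert Extractive Summarizer project listed in the resources (the summarizer package; the exact arguments are an assumption from its README):

from summarizer import Summarizer   # pip install bert-extractive-summarizer

body = ("The immune system is a system of many biological structures and processes "
        "within an organism that protects against disease. To function properly, an "
        "immune system must detect a wide variety of agents, known as pathogens, and "
        "distinguish them from the organism's own healthy tissue.")

model = Summarizer()                  # BERT sentence embeddings + clustering
print(model(body, num_sentences=1))   # pick the most representative sentence(s)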
31
monoBERT
The final representation of the [CLS] token is fed to a fully-connected layer that produces the relevance score s of that text with respect to the query.
Birch
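A rough monoBERT-style re-ranking sketch (my own illustration under assumptions: a BERT sequence classifier scoring query/passage pairs; it would need fine-tuning, e.g. on MS MARCO, to produce meaningful relevance scores):

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

query = "what protects the body against disease"
passages = [
    "The immune system protects an organism against disease.",
    "The teacher drinks tea in the morning.",
]

# Each (query, passage) pair is encoded together; the [CLS] representation is fed
# through a classification head to produce the relevance score s.
inputs = tokenizer([query] * len(passages), passages,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
scores = logits.softmax(dim=-1)[:, 1]    # probability of "relevant" per passage
print(scores.tolist())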