O'Reilly Artificial Intelligence Conference San Francisco 2018
How to use transfer learning to bootstrap image classification and question answering (QA)
Danielle Dean PhD, Wee Hyong Tok PhD
Principal Data Scientist Lead
Microsoft
@danielleodean | @weehyong
Inspired by “Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud Defense”, Mark Russinovich, RSA Conference 2018
Textbook ML development
• Choosing the Learning Task
• Defining Data Input
• Applying Data Transforms
• Choosing the Learner
• Choosing Output
• Choosing Run Options
• View Results
• Debug and Visualize Errors
• Analyze Model Predictions
Fact | Industry grade ML solutions are highly exploratory
Diagram: the workflow (Choosing the Learning Task, Defining Data Input, Applying Data Transforms, Choosing the Learner, Choosing Output, Choosing Run Options, View Results, Debug and Visualize Errors, Analyze Model Predictions) repeated for Attempt 1, Attempt 2, Attempt 3, Attempt 4, ..., Attempt n
Traditional versus Transfer learning
Traditional Machine Learning: different tasks, each with its own learning system
Transfer Learning: source tasks feed the learning system for the target task
Source: "A survey on transfer learning", Sinno Jialin Pan and Qiang Yang, IEEE Transactions on Knowledge and Data Engineering
Why are we talking about transfer learning?
Chart: Drivers of ML success in industry. Commercial success plotted against time (2016 marked) for supervised learning, transfer learning, unsupervised learning, and reinforcement learning.
Source: “Transfer Learning - Machine Learning's Next Frontier”, Sebastian Ruder
Transfer Learning in Computer Vision
Can we leverage knowledge of processing images to help with new tasks?
• What’s in the picture?
• Where is the bike located?
• Can you find a similar bike?
• How many bikes are there?
Before Deep Learning
• Researchers took a traditional machine learning approach
• Manual creation of a variety of different visual feature extractors
• Followed by traditional ML classifiers
• Features not very generalizable to other vision tasks – not easy to transfer
• Example: HoG Detectors
- Histogram of oriented gradients (HoG) features
- Sliding window detector
- SVM Classifier
- Very fast OpenCV implementation (<100ms)
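As a rough illustration of the first ingredient above, the sketch below computes a magnitude-weighted histogram of gradient orientations for a single image cell with NumPy. It is a simplified stand-in, not the OpenCV implementation the slide refers to: the function name `cell_hog`, the 9-bin unsigned-orientation layout, and the toy 8x8 ramp image are assumptions made for the example, and block normalization, the sliding window, and the SVM are left out.

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Histogram of oriented gradients for one grayscale cell (simplified)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Each pixel votes for its orientation bin, weighted by gradient magnitude.
    hist, _ = np.histogram(orientation, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)
    return hist

# Toy cell: intensity increases left to right, so every gradient points along x.
ramp = np.tile(np.arange(8.0), (8, 1))
hist = cell_hog(ramp)
print(hist)  # all of the gradient energy falls in the first (0 degree) bin
```

A full detector would concatenate such histograms over many cells and feed the resulting vector to the classifier.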
Deep Neural Networks
ImageNet: 14,197,122 images, 21,841 synsets
Diverse images, lots of labels!
Transfer Learning for Computer Vision
Diagram: Train a model using data from ImageNet → Deep Learning Model for Computer Vision → Apply the model to other domains (Retail, Manufacturing)
Example – Visualizing the different layers
Source: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
Another fun site:
https://deepart.io/nips/submissions/random/
http://cs231n.stanford.edu/
Clothing texture dataset:
• 1716 images from Bing which were manually annotated
• Example labels: Striped, Argyle, Dotted
Transfer Learning – How to get started?
Type | How to Initialize Featurization Layers | Output Layer Initialization | How is Transfer Learning used? | How to Train?
Standard DNN | Random | Random | None | Train featurization and output jointly
Headless DNN | Learn using another task | Separate ML algorithm | Use the features learned on a related task | Use the features to train a separate classifier
Fine Tune DNN | Learn using another task | Random | Use and fine tune features learned on a related task | Train featurization and output jointly with a small learning rate
Multi-Task DNN | Random | Random | Learned features need to solve many related tasks | Share a featurization network across both tasks. Train all networks jointly with a loss function (sum of the individual task loss functions)
Pre-Built CNN from General Task on Millions of Images
Diagram: layers of the pre-built CNN, from input to output:
• Low-Level Features (lines, edges, color fields, etc.)
• High-Level Features (corners, contours, simple shapes)
• Object Parts (wheels, faces, windows, etc.)
• Complex Objects & Scenes (people, animals, cars, beach scene, etc.)
• Output Layer (cat? YES, dog? NO, car? NO): stripped
The stripped network feeds a classifier (e.g. SVM) that answers the new question: dotted?
Outputs of penultimate layer of ImageNet Trained CNN provide excellent general purpose image features
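The headless-DNN idea (frozen features, separate classifier) can be sketched without any deep learning framework. Everything in the block below is a stand-in chosen for illustration: `W_frozen` plays the role of a pre-trained network with its output layer stripped, the two point clusters play the role of the new labeled images, and a logistic regression trained by gradient descent is swapped in for the SVM named on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained network with its output layer stripped:
# a fixed projection followed by ReLU. These weights are never updated.
W_frozen = rng.normal(size=(2, 16))

def featurize(x):
    """Return unit-length 'penultimate layer' features for each input row."""
    h = np.maximum(x @ W_frozen, 0.0) + 1e-8
    return h / np.linalg.norm(h, axis=1, keepdims=True)

# Toy target task: two small clusters of 2-D points with binary labels.
x = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
features = featurize(x)

def log_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Separate classifier trained only on the frozen features (logistic
# regression by gradient descent, used here in place of an SVM).
w, b = np.zeros(features.shape[1]), 0.0
initial_loss = log_loss(w, b)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    w -= 1.0 * features.T @ (p - y) / len(y)
    b -= 1.0 * np.mean(p - y)
final_loss = log_loss(w, b)
print(initial_loss, final_loss)
```

Only `w` and `b` change during training; the feature extractor stays exactly as it was, which is what distinguishes this setup from fine-tuning.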
Pre-Built CNN from General Task on Millions of Images
Diagram: the original output layer (cat? YES, dog? NO, car? NO) is stripped, and one or more layers are trained in a new network to answer: dotted?
Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions
Transfer Learning Results - Texture Dataset
DNN featurization
Input Image Size: 224x224 pixels
Area Under Curve: 0.59
Classification Accuracy: 69.0%
Fine-tuning (full CNN)
Input Image Size: 224x224 pixels
Area Under Curve: 0.76
Classification Accuracy: 77.4%
Fine-tuning (full CNN)
Input Image Size: 896x886 pixels
Area Under Curve: 0.83
Classification Accuracy: 88.2%
Transfer Learning for Similarity
Full code:
https://github.com/miguelgfierro/sciblog_support/blob/master/A_Gentle_Introduction_to_Transfer_Learning/Intro_Transfer_Learning.ipynb
•Hymenoptera, 2 classes and 397 images.
•Simpsons, 20 classes (subset of total) and 19548 images.
•Dogs vs Cats, 2 classes and 25000 images.
•Caltech 256, 257 classes and 30607 images.
Chart: validation accuracy with fine-tuning, validation accuracy with freezing, and number of images, per dataset (Hymenoptera, Hymenoptera gray, Simpsons, Simpsons gray, Dogs vs Cats, Dogs vs Cats gray, Caltech256, Caltech256 gray). Axis ranges: 0 to 35000 images and 0.00 to 1.00 accuracy.
Example Applications in Computer Vision
• Aerial Use Classification
• ESmart – Connected Drone
• Jabil – Defect Inspection
• Lung Cancer Detection
• Distributed deep domain adaptation for automated poacher detection
• https://github.com/MattKleinsmith/void-detector
Label Noise
Traditional Method: Manual Verification
Applying Transfer Learning
Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
Computer Vision is not a “solved problem”
The knowledge being “transferred” can be very useful, but it is not the same as how humans learn to see
Recap: Transfer Learning for Image Classification
• Define the Learning Task
• Identify a pre-trained model
• Decide whether to further fine-tune or use it as a headless DNN
• Freeze top layers, re-train the classifier
• Validate the model
• Deploy the model
Deep Learning on Different Types of Data
• Images: rich, high-dimensional datasets
• Audio spectrograms: rich, high-dimensional datasets
• Text: sparse data (depends on the encoding), e.g. "I see a big cat"
How do we apply Transfer Learning to NLP?
Different Types of NLP Tasks
And many more...
Transfer Learning for Text
• Define the Learning Task
• Identify a pre-trained model
• Decide whether to further fine-tune
• Freeze top layers, re-train the classifier
• Validate the model
• Deploy the model
Questions to ask:
• What kind of pre-trained model?
• What does the top layer encode?
Word Embeddings
Example relationships: Male - Female, Verb Tense, Country - Capital
Source: Tensorflow Tutorial - https://www.tensorflow.org/tutorials/representation/word2vec
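The relationships on this slide are usually demonstrated with vector arithmetic on the embeddings. The sketch below shows the idea on hand-made 2-D vectors; the vocabulary, the vector values, and the `analogy` helper are all hypothetical and chosen so the arithmetic is easy to follow, whereas real embeddings are learned and have hundreds of dimensions.

```python
import math

# Hypothetical 2-D embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
embeddings = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("man", "king", "woman"))  # prints "queen"
```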
Word Embeddings
2013 2014-2015 2017
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
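Compute an index mapping words to known embeddings:
import os
import numpy as np

# GLOVE_DIR is assumed to point at the folder holding the unzipped GloVe files.
embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]  # first token on each line is the word
        coefs = np.asarray(values[1:], dtype='float32')  # the rest is its vector
        embeddings_index[word] = coefs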
Compute the embedding matrix:
# word_index (word -> integer id) and EMBEDDING_DIM are assumed to be defined earlier.
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
Load the embedding matrix into an Embedding layer:
from keras.layers import Embedding

embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)  # prevent weights from being updated during training
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
Build a small 1D convnet to solve the classification problem:
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from keras.models import Model

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
# global max pooling: a pool size of 35 covers the whole remaining
# sequence when MAX_SEQUENCE_LENGTH is 1000
x = MaxPooling1D(35)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(len(labels_index), activation='softmax')(x)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=2, batch_size=128)
From initializing the first layers to pre-training the entire model (and learning higher level semantic concepts)
Transfer Learning for NLP - ULMFiT
Source: Universal Language Model Fine-tuning for Text Classification, Jeremy Howard, Sebastian Ruder, ACL 2018
• Train a Language Model using Large General Domain Corpus
• Fine-tune the Language Model
• Fine-tune Classifier
Transfer Learning for NLP - ELMo
Source: Deep contextualized word representations, Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, NAACL 2018
Diagram: train biLMs on a corpus, then enhance the usual inputs (e.g. the tokens “have a nice”) with ELMo representations
ELMo Pre-trained Models
Source: https://allennlp.org/elmo
Using ELMo with TensorFlow Hub
Source: https://www.tensorflow.org/hub/modules/google/elmo/2
Untokenized sentences as input:
import tensorflow_hub as hub  # hub.Module is the TensorFlow 1.x Hub API

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
embeddings = elmo(
    ["the cat is on the mat", "dogs are in the fog"],
    signature="default",
    as_dict=True)["elmo"]

Tokens as input:
# Reuses the elmo module loaded above; shorter sentences are padded with "".
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                ["dogs", "are", "in", "the", "fog", ""]]
tokens_length = [6, 5]
embeddings = elmo(
    inputs={
        "tokens": tokens_input,
        "sequence_len": tokens_length
    },
    signature="tokens",
    as_dict=True)["elmo"]
ELMo input: untokenized sentences or tokens
ELMo output dictionary:
• Character-based word representation
• First LSTM Hidden State
• Second LSTM Hidden State
• elmo (weighted sum of 3 layers)
• Fixed mean-pooling of contextualized word representation
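To make the "weighted sum of 3 layers" concrete, here is a minimal NumPy sketch of the combination described in the ELMo paper: softmax-normalized per-layer weights times the layer outputs, scaled by a single factor. The function name `elmo_combine`, the toy layer values, and the tiny dimensions are assumptions for illustration; in the real module the weights are learned and each vector has 1024 dimensions.

```python
import numpy as np

def elmo_combine(layers, scalar_logits, gamma=1.0):
    """Collapse per-layer representations into one vector per token.

    layers: array of shape (num_layers, num_tokens, dim)
    scalar_logits: one unnormalized weight per layer (softmax-normalized here)
    gamma: overall scale factor
    """
    logits = np.asarray(scalar_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    # Weighted sum over the layer axis -> shape (num_tokens, dim).
    return gamma * np.tensordot(weights, layers, axes=1)

# Toy stand-ins for the three layers (character-based word representation,
# first LSTM hidden state, second LSTM hidden state): 4 tokens, 3 dimensions.
layers = np.stack([np.full((4, 3), v) for v in (0.0, 3.0, 6.0)])
token_vectors = elmo_combine(layers, scalar_logits=[0.0, 0.0, 0.0])
sentence_vector = token_vectors.mean(axis=0)  # mean-pooling over tokens
print(token_vectors.shape, sentence_vector)
```

With equal logits each layer gets weight 1/3, so every entry of the combined output is the average of the three layer values.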
Transfer Learning for MRC tasks
Source: Transfer Learning for Machine Reading Comprehension - https://bit.ly/2Cmiffy
Transfer Learning for MRC
Diagram: Train a model using data from Wikipedia → MRC Model → Apply the model to other domains (News Articles, Customer Support Data)
Stanford Question Answering Dataset (SQuAD)
• Reading comprehension dataset
• Based on Wikipedia articles
• Crowdsourced questions
• The answer is a text segment, or span, from the corresponding reading passage, or the question is marked as having no answer
• Question-answer pairs
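Because the answer is a span of the passage, a question-answer pair can be stored as the answer text plus the character offset where it starts. The sketch below uses a hypothetical, flattened record with SQuAD-style field names (`context`, `answers`, `answer_start`, `is_impossible`); the passage, the question, and the `extract_answer` helper are made up for illustration, and the real dataset files nest these fields under articles and paragraphs.

```python
# A hypothetical record laid out like a SQuAD entry: the answer is stored as
# its text plus the character offset where it starts in the context.
context = "Transfer learning reuses knowledge from a source task on a target task."
answer_text = "a source task"
record = {
    "context": context,
    "question": "Where does the reused knowledge come from?",
    "answers": [{"text": answer_text,
                 "answer_start": context.index(answer_text)}],
    "is_impossible": False,  # flag for questions with no answer in the passage
}

def extract_answer(rec):
    """Return the answer span from the passage, or None if unanswerable."""
    if rec.get("is_impossible") or not rec["answers"]:
        return None
    ans = rec["answers"][0]
    start = ans["answer_start"]
    return rec["context"][start:start + len(ans["text"])]

print(extract_answer(record))
```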
MRC Datasets
Transfer Learning for MRC using SynNet
• Train using a large MRC Dataset (e.g. SQuAD)
• Apply the pre-trained model to a new domain (e.g. NewsQA)
• Validate the model
• Deploy the model
More comparisons between different MRC approaches: Transfer Learning for MRC – Survey - https://bit.ly/2JAt1h0
SynNet
Stage 1 – Answer Synthesis module: uses a bi-directional LSTM to predict IOB tags on the input paragraph, marking out semantic concepts that are likely answers
Stage 2 – Question Synthesis module: uses a uni-directional LSTM to generate the questions
Source: ACL 2017, https://www.microsoft.com/en-us/research/publication/two-stage-synthesis-networks-transfer-learning-machine-comprehension/
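IOB tagging labels each token as the Beginning of a span, Inside a span, or Outside any span. The sketch below shows only how such tags are turned back into candidate answer spans; it does not reproduce SynNet's LSTM tagger. The helper name `iob_to_spans`, the example sentence, and its tags are assumptions written by hand for illustration.

```python
def iob_to_spans(tokens, tags):
    """Group tokens tagged B (begin) / I (inside) into candidate answer spans."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                 # a new span starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:   # continue the open span
            current.append(token)
        else:                          # "O" (or a stray "I") closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["SynNet", "was", "presented", "at", "ACL", "2017", "by", "Microsoft", "Research"]
tags   = ["B",      "O",   "O",         "O",  "B",   "I",    "O",  "B",         "I"]
print(iob_to_spans(tokens, tags))  # ['SynNet', 'ACL 2017', 'Microsoft Research']
```

In the two-stage setup, spans like these would be the candidate answers handed to the question synthesis stage.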
SynNet – Question/Answer Generation Example
Summary
1. Transfer Learning and Applications
2. How to use Transfer Learning for Image Classification
3. How to use Transfer Learning for NLP tasks
Thank You!

Mais conteúdo relacionado

Mais procurados

林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning台灣資料科學年會
 
Our research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringOur research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringJordi Cabot
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...Edge AI and Vision Alliance
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature surveyAkshay Hegde
 
Academia to Data Science - A Hitchhiker's Guide
Academia to Data Science - A Hitchhiker's GuideAcademia to Data Science - A Hitchhiker's Guide
Academia to Data Science - A Hitchhiker's GuideSudeep Das, Ph.D.
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
 
Please don't make me draw (eKnow 2010)
Please don't make me draw (eKnow 2010)Please don't make me draw (eKnow 2010)
Please don't make me draw (eKnow 2010)Andrea Valente
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General AudiencesSangwoo Mo
 
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialBuilding Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialXavier Amatriain
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning SystemsAnuj Gupta
 

Mais procurados (11)

林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
Our research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringOur research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software Engineering
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Academia to Data Science - A Hitchhiker's Guide
Academia to Data Science - A Hitchhiker's GuideAcademia to Data Science - A Hitchhiker's Guide
Academia to Data Science - A Hitchhiker's Guide
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Please don't make me draw (eKnow 2010)
Please don't make me draw (eKnow 2010)Please don't make me draw (eKnow 2010)
Please don't make me draw (eKnow 2010)
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialBuilding Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 

Semelhante a OReilly AI Transfer Learning

Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Turi, Inc.
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxChun-Hao Chang
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakPyData
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMeetupDataScienceRoma
 
DotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NETDotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NETAlberto Diaz Martin
 
Deep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfDeep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfAubainYro1
 
Easy path to machine learning (2023-2024)
Easy path to machine learning (2023-2024)Easy path to machine learning (2023-2024)
Easy path to machine learning (2023-2024)wesley chun
 
Production ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google CloudProduction ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google Cloudgdgsurrey
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Designing Artificial Intelligence
Designing Artificial IntelligenceDesigning Artificial Intelligence
Designing Artificial IntelligenceDavid Chou
 
data-science-pdf-16588.pdf
data-science-pdf-16588.pdfdata-science-pdf-16588.pdf
data-science-pdf-16588.pdfvkharish18
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneIvo Andreev
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTKAshish Jaiman
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningAli Alkan
 
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5gdgsurrey
 

Semelhante a OReilly AI Transfer Learning (20)

Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
NEXiDA at OMG June 2009
NEXiDA at OMG June 2009NEXiDA at OMG June 2009
NEXiDA at OMG June 2009
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image Processing
 
DotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NETDotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NET
 
Deep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfDeep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdf
 
Easy path to machine learning (2023-2024)
Easy path to machine learning (2023-2024)Easy path to machine learning (2023-2024)
Easy path to machine learning (2023-2024)
 
Production ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google CloudProduction ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google Cloud
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Designing Artificial Intelligence
Designing Artificial IntelligenceDesigning Artificial Intelligence
Designing Artificial Intelligence
 
data-science-pdf-16588.pdf
data-science-pdf-16588.pdfdata-science-pdf-16588.pdf
data-science-pdf-16588.pdf
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTK
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
 
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
 

Último

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

OReilly AI Transfer Learning

  • 1. O'Reilly Artificial Intelligence Conference San Francisco 2018 How to use transfer learning to bootstrap image classification and question answering (QA) Danielle Dean PhD, Wee Hyong Tok PhD Principal Data Scientist Lead Microsoft @danielleodean | @weehyong Inspired by “Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud Defense” , Mark Russinovich, RSA Conference 2018
  • 2. Textbook ML development Choosing the Learning Task Defining Data Input Applying Data Transforms Choosing the Learner Choosing Output Choosing Run Options View Results Debug and Visualize Errors Analyze Model Predictions
  • 3. Fact | Industry grade ML solutions are highly exploratory. The textbook workflow (Choosing the Learning Task, Defining Data Input, Applying Data Transforms, Choosing the Learner, Choosing Output, Choosing Run Options, View Results, Debug and Visualize Errors, Analyze Model Predictions) repeats for every attempt: Attempt 1, Attempt 2, Attempt 3, Attempt 4, ... Attempt n
  • 4. Traditional versus Transfer learning Learning system Learning system Learning system Different tasks Traditional Machine Learning Transfer Learning Source tasks Learning system Target task Source: "A survey on transfer learning." , Pan, Sinno Jialin, and Qiang Yang. IEEE Transactions on knowledge and data engineering
  • 5. Why are we talking about transfer learning ? Commercial success Time 2016 Supervised learning Transfer learning Unsupervised learning Reinforcement learning Drivers of ML success in industry Source: “Transfer Learning - Machine Learning's Next Frontier” , Ruder, Sebastian,
  • 6. Transfer Learning in Computer Vision Can we leverage knowledge of processing images to help with new tasks? • What’s in the picture? • Where is the bike located? • Can you find a similar bike? • How many bikes are there?
  • 7. Before Deep Learning • Researchers took a traditional machine learning approach • Manual creation of a variety of different visual feature extractors • Followed by traditional ML classifiers • Features not very generalizable to other vision tasks – not easy to transfer • Example: HoG Detectors - Histogram of oriented gradients (HoG) features - Sliding window detector - SVM Classifier - Very fast OpenCV implementation (<100ms)
  • 9. 14,197,122 images 21841 synsets Diverse images, Lots of labels!
  • 10. Transfer Learning for Computer Vision Train a model using data from ImageNet Retail Manufacturing Deep Learning Model for Computer Vision Apply the model to other domains
  • 11. Example – Visualizing the different layers Source: Olah, et al., "Feature Visualization", Distill, 2017 https://distill.pub/2017/feature-visualization/ Another fun site: https://deepart.io/nips/submissions/random/ http://cs231n.stanford.edu/
  • 12. Example – Visualizing the different layers Source: Olah, et al., "Feature Visualization", Distill, 2017 https://distill.pub/2017/feature-visualization/ Check out these sites - https://deepart.io/nips/submissions/random/ http://cs231n.stanford.edu/
  • 13. Clothing texture dataset: • 1716 images from Bing which were manually annotated Striped Argyle Dotted
  • 14.
  • 15. Transfer Learning – How to get started?
      Type | How to Initialize Featurization Layers | Output Layer Initialization | How is Transfer Learning used? | How to Train?
      Standard DNN | Random | Random | None | Train featurization and output jointly
      Headless DNN | Learn using another task | Separate ML algorithm | Use the features learned on a related task | Use the features to train a separate classifier
      Fine Tune DNN | Learn using another task | Random | Use and fine tune features learned on a related task | Train featurization and output jointly with a small learning rate
      Multi-Task DNN | Random | Random | Learned features need to solve many related tasks | Share a featurization network across both tasks. Train all networks jointly with a loss function (sum of individual task loss function)
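The "Headless DNN" row can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `FROZEN_W` is a hypothetical stand-in for featurization layers learned on another task, and the toy data and logistic-regression head are not from the talk. The point is only that the featurizer is never updated while a separate classifier is trained on its outputs.

```python
import math

# Hypothetical stand-in for pre-trained featurization layers: fixed weights
# mapping a 2-D input to 3 ReLU features. These are never updated below.
FROZEN_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

def featurize(x):
    """Frozen featurization: one ReLU unit per row of FROZEN_W."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only a separate classifier (logistic regression) on the features."""
    feats = [featurize(x) for x in samples]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(x, w, b):
    f = featurize(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0

# Toy target task on a fixed grid: label 1 when the first coordinate is larger.
samples = [[i / 4, j / 4] for i in range(-4, 5) for j in range(-4, 5) if i != j]
labels = [1 if x[0] > x[1] else 0 for x in samples]
w, b = train_head(samples, labels)
accuracy = sum(predict(x, w, b) == y for x, y in zip(samples, labels)) / len(samples)
```

Fine-tuning would differ in that `FROZEN_W` would also receive gradient updates, typically with a small learning rate as the table notes.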
  • 16. Pre-Built CNN from General Task on Millions of Images. Output Layer Stripped (cat? YES, dog? NO, car? NO). Classifier, e.g. SVM: dotted? Complex Objects & Scenes (people, animals, cars, beach scene, etc.); Low-Level Features (lines, edges, color fields, etc.); High-Level Features (corners, contours, simple shapes); Object Parts (wheels, faces, windows, etc.). Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features
  • 17. Pre-Built CNN from General Task on Millions of Images. Output Layer Stripped (cat? YES, dog? NO, car? NO). Train one or more layers in the new network (dotted?). Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions
  • 18. Transfer Learning Results - Texture Dataset
      Approach | Input Image Size | Area Under Curve | Classification Accuracy
      DNN featurization | 224x224 pixels | 0.59 | 69.0%
      Fine-tuning (full CNN) | 224x224 pixels | 0.76 | 77.4%
      Fine-tuning (full CNN) | 896x886 pixels | 0.83 | 88.2%
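For readers unfamiliar with the two metrics reported above, here is a small sketch of how they can be computed in the binary case. The labels and scores are made up, and the slides do not say how the metrics were computed for the three-class texture dataset, so this is only an illustration of the definitions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and classifier scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two can move differently between experiments.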
  • 19. Transfer Learning for Similarity
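One common way to use transferred features for similarity is to compare feature vectors with cosine similarity and rank catalog items against a query. The sketch below assumes that approach; the slide itself does not state the method, and the 3-dimensional vectors and item names are hypothetical stand-ins for features from the penultimate layer of a pre-trained CNN.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_by_similarity(query, catalog):
    """Return catalog item names, most similar to the query first."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )

# Hypothetical feature vectors; real ones would have hundreds of dimensions.
catalog = {
    "striped_shirt": [0.9, 0.1, 0.0],
    "dotted_dress": [0.1, 0.9, 0.2],
    "argyle_socks": [0.2, 0.1, 0.9],
}
query = [0.8, 0.3, 0.1]  # features of a new, mostly "striped" image
ranking = rank_by_similarity(query, catalog)
```

Because no classifier is trained, this works even when the target task has no labels at all.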
  • 20.
  • 21. Full code: https://github.com/miguelgfierro/sciblog_support/blob/master/A_Gentle_Introduction_to_Transfer_Learning/Intro_Transfer_Learning.ipynb
      • Hymenoptera, 2 classes and 397 images.
      • Simpsons, 20 classes (subset of total) and 19548 images.
      • Dogs vs Cats, 2 classes and 25000 images.
      • Caltech 256, 257 classes and 30607 images.
  • 25. Example Applications in Computer Vision: Aerial Use Classification; ESmart – Connected Drone; Jabil – Defect Inspection; Lung Cancer Detection; Distributed deep domain adaptation for automated poacher detection
  • 27.
  • 28. Label Noise. Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
  • 29. Traditional Method: Manual Verification. Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
  • 30. Applying Transfer Learning. Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
  • 31. Computer Vision is not a “solved problem” The knowledge being “transferred” can be very useful but not the same as how humans learn to see
  • 32. Recap: Transfer Learning for Image Classification. Define the Learning Task; Identify a pre-trained model; Decide whether to further fine-tune or use it as a headless DNN; Freeze top layers, re-train the classifier; Validate the model; Deploy the model
  • 33. Deep Learning on Different Types of Data. Audio (spectrograms): rich, high-dimensional datasets. Images: rich, high-dimensional datasets. Text: sparse data (depends on the encoding), e.g. "I see a big cat"
  • 34. How do we apply Transfer Learning to NLP?
  • 35. Different Type of NLP Tasks And many more….
  • 36. Transfer Learning for Text. Define the Learning Task; Identify a pre-trained model (What kind of pre-trained model?); Decide whether to further fine-tune; Freeze top layers, re-train the classifier (What does the top layer encode?); Validate the model; Deploy the model
  • 37. Word Embeddings Male - Female Verb Tense Country - Capital Source: Tensorflow Tutorial - https://www.tensorflow.org/tutorials/representation/word2vec
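The Male - Female, Verb Tense and Country - Capital relations on this slide are usually demonstrated with vector arithmetic. The sketch below uses tiny hand-made vectors (an assumption for illustration; real embeddings such as word2vec or GloVe have 100 or more dimensions and are learned from text) to show the classic "king - man + woman" query.

```python
import math

# Hand-made toy embeddings, roughly [royalty, male, female]. Not real data.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, embeddings):
    """Solve "a is to b as c is to ?" via the offset b - a + c.

    The three query words are excluded from the candidates, as is customary.
    """
    target = [vb - va + vc for va, vb, vc in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

# "man is to king as woman is to ?"
answer = analogy("man", "king", "woman", EMBEDDINGS)
```

With only four words the search is trivial; the same nearest-neighbour lookup over a full vocabulary is what produces the relations plotted in the tutorial.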
  • 39. Using Pre-trained Embeddings. Text Classification using 20 Newsgroup dataset. Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
      Compute an index mapping words to known embeddings:
          import os
          import numpy as np

          # GLOVE_DIR is the directory holding the downloaded GloVe files.
          embeddings_index = {}
          with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
              for line in f:
                  values = line.split()
                  word = values[0]  # first field is the word
                  coefs = np.asarray(values[1:], dtype='float32')  # the rest is its vector
                  embeddings_index[word] = coefs
      Compute Embedding Matrix:
          # word_index maps each word to its integer id (from the tokenizer).
          embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
          for word, i in word_index.items():
              embedding_vector = embeddings_index.get(word)
              if embedding_vector is not None:
                  # words not found in embedding index will be all-zeros.
                  embedding_matrix[i] = embedding_vector
  • 40. Using Pre-trained Embeddings. Text Classification using 20 Newsgroup dataset. Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
      Load the Embedding Matrix into an Embedding Layer:
          from keras.layers import Embedding

          embedding_layer = Embedding(len(word_index) + 1,
                                      EMBEDDING_DIM,
                                      weights=[embedding_matrix],
                                      input_length=MAX_SEQUENCE_LENGTH,
                                      trainable=False)
      trainable=False: Prevent weights from being updated during training
  • 41. Using Pre-trained Embeddings. Text Classification using 20 Newsgroup dataset. Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
      Build a small 1D convnet to solve the classification problem:
          from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
          from keras.models import Model

          sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
          embedded_sequences = embedding_layer(sequence_input)
          x = Conv1D(128, 5, activation='relu')(embedded_sequences)
          x = MaxPooling1D(5)(x)
          x = Conv1D(128, 5, activation='relu')(x)
          x = MaxPooling1D(5)(x)
          x = Conv1D(128, 5, activation='relu')(x)
          x = MaxPooling1D(35)(x)  # global max pooling
          x = Flatten()(x)
          x = Dense(128, activation='relu')(x)
          preds = Dense(len(labels_index), activation='softmax')(x)

          model = Model(sequence_input, preds)
          model.compile(loss='categorical_crossentropy',
                        optimizer='rmsprop',
                        metrics=['acc'])
          model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=2, batch_size=128)
  • 42. From initializing the first layers to pre-training the entire model (and learning higher level semantic concepts)
  • 43. Transfer Learning for NLP - ULMFiT Source: Universal Language Model Fine-tuning for Text Classification, Jeremy Howard, Sebastian Ruder, ACL 2018 Train a Language Model using Large General Domain Corpus Fine-tune the Language Model Fine-tune Classifier
  • 44. Transfer Learning for NLP - ELMo. Source: Deep contextualized word representations, Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, NAACL 2018. [Figure: Corpus; Train biLMs; Usual Inputs (e.g. "have a nice"); Enhancing Inputs with ELMos]
  • 45. ELMo Pre-trained Models Source: https://allennlp.org/elmo
  • 46. Using ELMo with TensorFlow Hub. Source: https://www.tensorflow.org/hub/modules/google/elmo/2
      Untokenized Sentences:
          import tensorflow_hub as hub

          elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
          embeddings = elmo(
              ["the cat is on the mat", "dogs are in the fog"],
              signature="default",
              as_dict=True)["elmo"]
      Or Tokens:
          elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
          # Shorter sentences are padded with "" up to the longest one.
          tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                          ["dogs", "are", "in", "the", "fog", ""]]
          tokens_length = [6, 5]
          embeddings = elmo(
              inputs={
                  "tokens": tokens_input,
                  "sequence_len": tokens_length
              },
              signature="tokens",
              as_dict=True)["elmo"]
      ELMo output Dictionary:
      • Character-based word representation
      • First LSTM Hidden State
      • Second LSTM Hidden State
      • elmo (weighted sum of 3 layers)
      • Fixed mean-pooling of contextualized word representation
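The "weighted sum of 3 layers" and "fixed mean-pooling" outputs listed above can be sketched in plain Python. In the ELMo paper the layer weights are softmax-normalized and scaled by a scalar gamma; the tiny layer values, the zero logits and the helper names below are illustrative assumptions, not the module's internals.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of layer logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_combine(layers, layer_logits, gamma=1.0):
    """Per-token weighted sum of layers: gamma * sum_j softmax(logits)[j] * layer_j.

    layers: list of layers, each a list of token vectors of equal shape.
    """
    s = softmax(layer_logits)
    n_tokens, dim = len(layers[0]), len(layers[0][0])
    return [
        [gamma * sum(s[j] * layers[j][t][d] for j in range(len(layers)))
         for d in range(dim)]
        for t in range(n_tokens)
    ]

def mean_pool(token_vectors, length):
    """Average the first `length` token vectors, ignoring trailing padding."""
    dim = len(token_vectors[0])
    return [sum(token_vectors[t][d] for t in range(length)) / length
            for d in range(dim)]

# Three toy layers (character-based, first LSTM, second LSTM), 2 tokens, dim 2.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
combined = elmo_combine(layers, layer_logits=[0.0, 0.0, 0.0])
sentence_vector = mean_pool(combined, length=2)
```

Setting `trainable=True` in the snippets above is what allows the layer weights and scale to be learned for the downstream task rather than fixed.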
  • 47. Transfer Learning for MRC tasks Source: Transfer Learning for Machine Reading Comprehension - https://bit.ly/2Cmiffy
  • 48. Transfer Learning for MRC. Train a model using data from Wikipedia News Articles Customer Support Data MRC Model Apply the model to other domains
  • 49. SQuAD: Stanford Question Answering Dataset. A reading comprehension dataset based on Wikipedia articles, with crowdsourced questions. The answer is a text segment, or span, from the corresponding reading passage, or the question has no answer. Question-Answer Pairs
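To make "the answer is a span of the passage" concrete, here is a small sketch. The field names loosely mirror the public SQuAD JSON layout (`context`, `question`, `answers` with `text` and `answer_start`, and `is_impossible` for unanswerable questions), but the records themselves are made up for illustration.

```python
def answer_for(example):
    """Return the answer span sliced out of the passage, or None if unanswerable."""
    if example["is_impossible"]:
        return None
    answer = example["answers"][0]
    start = answer["answer_start"]  # character offset into the context
    span = example["context"][start:start + len(answer["text"])]
    # The stored text should match the slice of the passage exactly.
    if span != answer["text"]:
        raise ValueError("answer_start does not point at the answer text")
    return span

context = ("Transfer learning reuses knowledge from a source task "
           "to improve learning on a target task.")
answer_text = "knowledge from a source task"

answerable = {
    "context": context,
    "question": "What does transfer learning reuse?",
    "answers": [{"text": answer_text, "answer_start": context.index(answer_text)}],
    "is_impossible": False,
}
unanswerable = {
    "context": context,
    "question": "Who invented transfer learning?",
    "answers": [],
    "is_impossible": True,
}

span = answer_for(answerable)
no_span = answer_for(unanswerable)
```

An MRC model trained on such data predicts the start and end positions of the span rather than generating free text.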
  • 51. Transfer Learning for MRC using SynNet. Train using a large MRC Dataset (e.g. SQuAD); Apply the pre-trained model to a new domain (e.g. NewsQA); Validate the model; Deploy the model. Transfer Learning for MRC – Survey - https://bit.ly/2JAt1h0 More comparisons between different MRC Approaches
  • 52. SynNet. Stage 1 – Answer Synthesis module uses a bi-directional LSTM to predict IOB tags on the input paragraph. It marks out semantic concepts that are likely answers. Stage 2 – Question Synthesis module uses a uni-directional LSTM to generate the questions. Source: ACL 2017, https://www.microsoft.com/en-us/research/publication/two-stage-synthesis-networks-transfer-learning-machine-comprehension/
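Stage 1 outputs one IOB tag per token (B = beginning of a span, I = inside, O = outside). The sketch below shows only how such tags could be decoded into candidate answer spans; the sentence and tags are made up, and the LSTM that would predict them is not modelled here.

```python
def iob_to_spans(tokens, tags):
    """Group tokens tagged B/I into spans; O (or a stray I) closes any open span."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # a new span starts right after another one
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:  # flush a span that runs to the end of the paragraph
        spans.append(" ".join(current))
    return spans

# Made-up paragraph and tags, standing in for the tagger's predictions.
tokens = "The bike shop opened in San Francisco in 2018".split()
tags = ["O", "B", "I", "O", "O", "B", "I", "O", "B"]
candidate_answers = iob_to_spans(tokens, tags)
```

Each decoded span would then be passed, together with the paragraph, to the Stage 2 module that generates a matching question.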
  • 53. SynNet – Question/Answer Generation Example
  • 54. Summary: 1. Transfer Learning and Applications 2. How to use Transfer Learning for Image Classification 3. How to use Transfer Learning for NLP tasks
  • 55. Thank You! Danielle Dean PhD, Wee Hyong Tok PhD, Principal Data Scientist Lead, Microsoft. @danielleodean | @weehyong