Deep Learning Applications
(in industries and elsewhere)
Abhishek Thakur
@abhi1thakur
About me
● Chief Data Scientist @ Boost AI
● Machine learning enthusiast
● Kaggle junkie (highest world rank #3)
● Interested in:
○ Automatic machine learning
○ Large scale classification of text data
○ Chatbots
I like big data
and
I cannot lie
Agenda
● Brief introduction to deep learning
● Implementation of deepnets
● Fine-tuning of pre-trained networks
● 4 different industrial use cases
● No maths!!!!
What is deep learning?
● A buzzword
● Neural networks
● Removes manual feature extraction steps
● Not a black box
How have convnets evolved?
● 1989
● 2012
● 2014
What can deep learning do?
● Natural language processing
● Speech processing
● Computer vision
● And more and more
How can I implement my own DeepNets?
● Implement them on your own
○ Decompose into smaller parts
○ Implement layers
○ Start training
● Save yourself some time and finetune
○ Convert data
○ Define net
○ Define solver
○ Train
● Caffe (caffe.berkeleyvision.org)
● Keras (www.keras.io)
Caffe
● Speed
● Openness
● Modularity
● Expression - No coding knowledge? No problem!
● Community
What do you need for Caffe?
● Convert data
● Define a network (prototxt)
● Define a solver (prototxt)
● Train the network (with or without pre-trained weights)
Prototxt
● solver.prototxt
● train.prototxt
● train_val.prototxt
Training a net using Caffe
/PATH_TO_CAFFE/caffe train --solver=solver.prototxt
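The same run can also be started from Python through pycaffe. A minimal sketch, assuming Caffe was built with its Python bindings (everything else is read from solver.prototxt):

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.solve()  # trains until max_iter from solver.prototxt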
Fine Tuning!
● Fine-tuning using GoogLeNet
● Why?
○ It has Google in its name
○ It won ILSVRC 2014
○ It’s complicated and I wanted to play with it
● The Caffe Model Zoo offers a lot of pretrained nets, including GoogLeNet
● Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
Honey Bee vs. Bumble Bee
The Metis Challenge: Naive Bees Classifier @ DrivenData.org
An initial model
Steps to finetune
● Create training and test files
● Get the prototxt files from model zoo
● Modify them
● Run the caffe solver
Generating training and validation sets
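A minimal sketch of this step, assuming one folder of images per class (folder names and the 80/20 split are illustrative). Caffe's ImageData layer reads plain text files with one "<image path> <label>" pair per line:

import os, random

paths = [(os.path.join('images/honey_bee', f), 0) for f in os.listdir('images/honey_bee')]
paths += [(os.path.join('images/bumble_bee', f), 1) for f in os.listdir('images/bumble_bee')]
random.shuffle(paths)

split = int(0.8 * len(paths))
with open('train.txt', 'w') as out:
    out.writelines('%s %d\n' % p for p in paths[:split])
with open('val.txt', 'w') as out:
    out.writelines('%s %d\n' % p for p in paths[split:])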
Changes in train_val.prototxt
The usual finetuning edits: point the data layers at the new train/val list files, rename the final classifier layers so the pretrained weights are not copied into them, and set their num_output to the new number of classes (two, for honey bee vs. bumble bee).
Changes in solver.prototxt
Typical finetuning edits: point net at the modified train_val.prototxt, lower base_lr (the pretrained weights only need small updates), reduce max_iter, and set a new snapshot_prefix.
And that’s all
Finetune your network
/PATH_TO_CAFFE/caffe train -solver ./solver.prototxt -weights ./models/bvlc_googlenet.caffemodel
Did the net learn something new?
Breaking down the various layers of GoogLeNet
[Feature visualizations, random vs. pretrained vs. finetuned, for each inception module: inception_3a, inception_3b, inception_4a, inception_4b, inception_4c, inception_4d, inception_4e, inception_5a, inception_5b]
Why finetune?
● It is faster
● It is better (most of the time)
● Why reinvent the wheel?
Tell me how to train a deepnet in Python!
● Caffe has a Python interface
● TensorFlow
● Theano
● Lasagne
● Keras
● Neon
● And lots more…
Classifying Search Queries
Why classify search queries?
● For businesses
○ Find out user intent
○ Track keywords along the user’s transactional buying cycle
○ Optimize website content and focus on a smaller keyword set
● For data scientists
○ 100s of millions of unlabeled keywords to play with
○ Why not!
Word2Vec in Search Queries
Feeding Data to LSTMs
the white house
Sequence for LSTM:
❖ United States
❖ President
❖ Politician
❖ Washington
❖ Lawyer
❖ Secretary
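A minimal sketch of this step, assuming a loaded word2vec model (the model variable, max_len and the 300-dimensional vectors are illustrative assumptions):

import numpy as np

def query_to_sequence(query, model, max_len=10, dim=300):
    # one word2vec vector per in-vocabulary token, left-padded with zeros
    vecs = [model[w] for w in query.lower().split() if w in model][:max_len]
    seq = np.zeros((max_len, dim), dtype='float32')
    if vecs:
        seq[-len(vecs):] = np.array(vecs)
    return seq

# query_to_sequence('the white house', model).shape == (10, 300)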
Performance of the Network
[Results for navigational, transactional, and informational queries, mapped to the buying-cycle stages: awareness, decision, evaluation, retention]
Representing Queries as Images
Word2Vec representations of the top search result titles, e.g. for the queries “David Villa”, “Apple juice”, “Irish”
I don’t see much difference! Guild Wars or Apple Juice?
Machine Learning Models
● Boosted trees
○ Word2vec embeddings
○ Titles from top results
○ Additional features of the SERP (search engine results page)
○ TF-IDF
○ XGBoost!!!! (https://github.com/dmlc/xgboost)
Machine Learning Models
● Convolutional Neural Networks:
○ Using images directly
○ Using random crops from the image
Neural Networks with Keras
https://github.com/fchollet/keras
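A minimal Keras 1-style sketch of a small convnet over such query images (not the original slide code; the input shape assumes channels-last ordering, and the three output classes follow the navigational/transactional/informational split above):

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# assumes tf dim ordering: (rows, cols, channels)
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(10, 300, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])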
Approaching “any” ML problem
AutoCompete: A Framework for Machine Learning Competitions, A. Thakur and A. Krohn-Grimberghe, ICML AutoML Workshop, 2015
Optimizing neural networks
AutoML Challenge: Rules for tuning Neural Networks, A. Thakur, ICML AutoML Workshop, System Desc Track, 2016
Selecting NNet Architecture
● Always use SGD or Adam (for fast convergence)
● Start low:
○ Single layer with 120-500 neurons
○ Batch normalization + ReLU
○ Dropout: 10-20%
● Add new layer:
○ 1200-1500 neurons
○ High dropout: 40-50%
● Very big network:
○ 8000-10000 neurons in each layer
○ 60-80% dropout
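A minimal Keras sketch of the "start low, then grow" recipe above (input dimension, exact layer sizes and the binary output are illustrative assumptions):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Dense(256, input_dim=100))   # start low: 120-500 neurons
model.add(BatchNormalization())        # batch normalization + ReLU
model.add(Activation('relu'))
model.add(Dropout(0.2))                # dropout: 10-20%

model.add(Dense(1200))                 # new layer: 1200-1500 neurons
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))                # high dropout: 40-50%

model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')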
The AutoML Challenge
Some Results
[Leaderboard screenshots: AutoML Final1 Results, AutoML Final4 Results, AutoML GPU Track Results]
10 Things You Didn’t Know About Clickbaits!
What are clickbaits?
● 10 things Apple didn’t tell you about the new iPhone
● What happened next will surprise you
● This is what the actor/actress from 90s looks like now
● What did Donald Trump just say about Obama and Clinton
● 9 things you must have to be a good data scientist
What are clickbaits?
● Interesting titles
● Frustrating titles
● Content that is seldom good enough
● Google penalizes clickbait content
● Facebook does the same
The data
● Crawl BuzzFeed, ClickHole
● Crawl The New York Times, CNN
● ~10000 titles
○ Clickbaits: BuzzFeed, ClickHole
○ Non-clickbaits: The New York Times, CNN
○ ~5000 from each category
Good old TF-IDF
● Very powerful
● Used both character and word analyzers
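A minimal sketch of combining word- and character-level analyzers with scikit-learn (parameters are illustrative assumptions, not the original code; titles is the list of crawled headlines):

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

word_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2))
char_tfidf = TfidfVectorizer(analyzer='char', ngram_range=(2, 5))

X = hstack([word_tfidf.fit_transform(titles),
            char_tfidf.fit_transform(titles)])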
Some interesting words
Let’s build some models
Logistic Regression
● ROC AUC Score = 0.987319021551
● Precision Score = 0.950326797386
● Recall Score = 0.939276485788
● F1 Score = 0.944769330734
XGBoost
● ROC AUC Score = 0.969700677962
● Precision Score = 0.95756718529
● Recall Score = 0.874677002584
● F1 Score = 0.914247130317
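A minimal sketch of the kind of pipeline behind these numbers (assumed, not the original code; X is the TF-IDF matrix from the sketch above and labels is the 0/1 clickbait array):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = LogisticRegression()
clf.fit(X_train, y_train)

probs = clf.predict_proba(X_test)[:, 1]
preds = clf.predict(X_test)
print(roc_auc_score(y_test, probs))
print(precision_score(y_test, preds), recall_score(y_test, preds), f1_score(y_test, preds))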
Is that it?
● No!
● Model predictions:
○ “Donald Trump”: 15% clickbait
○ “Barack Obama”: 80% clickbait
● Something was very wrong!
● TF-IDF didn’t capture the meaning of the words
Word2Vec
● Shallow neural networks
● Generates a high-dimensional vector for every word
● Every word gets a position in space
● Similar words cluster together
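A minimal sketch of loading pretrained vectors with gensim and inspecting neighbours (assumed, not from the slides; the Google News binary is a separate download):

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)
print(model.most_similar('obama', topn=5))  # nearby words cluster by meaning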
XGBoost + W2V
● ROC AUC Score = 0.981312768055
● Precision Score = 0.939947780679
● Recall Score = 0.93023255814
● F1 Score = 0.935064935065
Performance
● Fast to train
● Good results
Does word2vec capture everything?
Do we have all we need only from titles?
What if the content of the website isn’t clickbait-y?
The data
● Crawling BuzzFeed, ClickHole, NYT, CNN, etc. directly:
○ Too much work
○ Simple models
○ Doubts about results
● Crawl public Facebook pages instead:
○ BuzzFeed
○ CNN
○ The New York Times
○ ClickHole
○ StopClickBaitOfficial
○ Upworthy
○ Wikinews
A Facebook page scraper is available here:
https://github.com/minimaxir/facebook-page-post-scraper
The data
● link_name (the title of the URL shared)
● status_type (whether it’s a link, photo, or video)
● status_link (the actual URL)
Data Processing
● Get the HTML content too
● Clean the mess up!
Feature Generation
● Size of the HTML (in bytes)
● Length of HTML
● Total number of links
● Total number of buttons
● Total number of inputs
● Total number of unordered lists
● Total number of ordered lists
● Total number of lists (ordered + unordered)
● Total number of H1 tags
● Total number of H2 tags
● Full length of all text in all H1 tags that were found
● Full length of all text in all H2 tags that were found
● Total number of images
● Total number of HTML tags
● Number of unique HTML tags
More Features
● All H1 text
● All H2 text
● Meta description
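A minimal sketch of extracting a few of these features with BeautifulSoup (assumed, not the original code):

from bs4 import BeautifulSoup

def html_features(html):
    soup = BeautifulSoup(html, 'html.parser')
    return {
        'html_bytes': len(html.encode('utf-8')),
        'n_links': len(soup.find_all('a')),
        'n_buttons': len(soup.find_all('button')),
        'n_inputs': len(soup.find_all('input')),
        'n_lists': len(soup.find_all('ul')) + len(soup.find_all('ol')),
        'n_h1': len(soup.find_all('h1')),
        'h1_text_len': sum(len(h.get_text()) for h in soup.find_all('h1')),
        'n_images': len(soup.find_all('img')),
        'n_tags': len(soup.find_all(True)),
        'n_unique_tags': len({t.name for t in soup.find_all(True)}),
    }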
Feature Generation
[Distribution plots, clickbaits vs. non-clickbaits: number of lists, number of links, number of images, number of buttons]
Customary word clouds
[Word clouds: clickbaits vs. non-clickbaits]
Final Features
Deep Learning Models
● Simple LSTM
● Two dense layers
● Dropout + Batch Normalization
● Softmax Activation
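A minimal Keras sketch of the model described above (vocabulary size, layer widths and sequence length are illustrative assumptions, not the original slide code):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Embedding(20000, 300, input_length=40))
model.add(LSTM(128))                       # simple LSTM
model.add(Dense(256, activation='relu'))   # dense layer 1
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(256, activation='relu'))   # dense layer 2
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(2, activation='softmax'))  # softmax activation
model.compile(optimizer='adam', loss='categorical_crossentropy')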
Results
Detecting Duplicates in Quora Questions
The Problem
➢ ~13 million questions (as of March 2017)
➢ Many duplicate questions
➢ Cluster and join duplicates together
➢ Remove clutter
➢ First public data release: 24 January 2017
Duplicate Questions
➢ How does Quora quickly mark questions as needing improvement?
➢ Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds…
➢ What practical applications might evolve from the discovery of the Higgs Boson?
➢ What are some practical benefits of discovery of the Higgs Boson?
➢ Why did Trump win the Presidency?
➢ How did Donald Trump win the 2016 Presidential Election?
Non-Duplicate Questions
➢ Who should I address my cover letter to if I'm applying for a big company like Mozilla?
➢ Which car is better from safety view?""swift or grand i10"".My first priority is safety?
➢ Mr. Robot (TV series): Is Mr. Robot a good representation of real-life hacking and hacking culture? Is the depiction of hacker societies realistic?
➢ What mistakes are made when depicting hacking in ""Mr. Robot"" compared to real-life cybersecurity breaches or just a regular use of technologies?
➢ How can I start an online shopping (e-commerce) website?
➢ Which web technology is best suitable for building a big E-Commerce website?
The Data
➢ 400,000+ pairs of questions
➢ Initially the data was very skewed
➢ Negative samples drawn from related questions
➢ Not the real distribution on Quora’s website
➢ Noise exists (as usual)
https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
The Data
➢ 255045 negative samples (non-duplicates)
➢ 149306 positive samples (duplicates)
➢ ~37% positive samples (149306 / 404351)
The Data
➢ Average number of characters in question1: 59.57
➢ Minimum number of characters in question1: 1
➢ Maximum number of characters in question1: 623
➢ Average number of characters in question2: 60.14
➢ Minimum number of characters in question2: 1
➢ Maximum number of characters in question2: 1169
Basic Feature Engineering
➢ Length of question1
➢ Length of question2
➢ Difference in the two lengths
➢ Number of unique characters in question1 (spaces removed)
➢ Number of unique characters in question2 (spaces removed)
➢ Number of words in question1
➢ Number of words in question2
➢ Number of common words in question1 and question2
Basic Feature Engineering
➢ Basic feature set: fs-1

data['len_q1'] = data.question1.apply(lambda x: len(str(x)))
data['len_q2'] = data.question2.apply(lambda x: len(str(x)))
data['diff_len'] = data.len_q1 - data.len_q2
# the set() means these count *unique* characters (spaces removed)
data['len_char_q1'] = data.question1.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
data['len_char_q2'] = data.question2.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
data['len_word_q1'] = data.question1.apply(lambda x: len(str(x).split()))
data['len_word_q2'] = data.question2.apply(lambda x: len(str(x).split()))
data['common_words'] = data.apply(
    lambda x: len(set(str(x['question1']).lower().split())
                  .intersection(set(str(x['question2']).lower().split()))), axis=1)
Fuzzy Features
➢ pip install fuzzywuzzy
➢ Uses Levenshtein distance
➢ QRatio
➢ WRatio
➢ Token set ratio
➢ Token sort ratio
➢ Partial token set ratio
➢ Partial token sort ratio
➢ etc. etc. etc.
https://github.com/seatgeek/fuzzywuzzy
Fuzzy Features
➢ Fuzzy feature set: fs-2

from fuzzywuzzy import fuzz

data['fuzz_qratio'] = data.apply(lambda x: fuzz.QRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_WRatio'] = data.apply(lambda x: fuzz.WRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_ratio'] = data.apply(lambda x: fuzz.partial_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_set_ratio'] = data.apply(lambda x: fuzz.partial_token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_sort_ratio'] = data.apply(lambda x: fuzz.partial_token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_set_ratio'] = data.apply(lambda x: fuzz.token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_sort_ratio'] = data.apply(lambda x: fuzz.token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
TF-IDF
➢ TF(t) = (number of times term t appears in a document) / (total number of terms in the document)
➢ IDF(t) = log(total number of documents / number of documents containing term t)
➢ TF-IDF(t) = TF(t) * IDF(t)

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(min_df=3, max_features=None,
                        strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                        ngram_range=(1, 2), use_idf=1, smooth_idf=1, sublinear_tf=1,
                        stop_words='english')
SVD
➢ Latent semantic analysis
➢ scikit-learn version of SVD
➢ 120 components

from sklearn import decomposition

svd = decomposition.TruncatedSVD(n_components=120)
xtrain_svd = svd.fit_transform(xtrain)
xtest_svd = svd.transform(xtest)
Fuzzy Features
➢ Also known as approximate string matching
➢ Number of “primitive” operations required to convert one string into an exact match of the other
➢ Primitive operations:
○ Insertion
○ Deletion
○ Substitution
➢ Typically used for:
○ Spell checking
○ Plagiarism detection
○ DNA sequence matching
○ Spam filtering
A Combination of TF-IDF & SVD
➢ TF-IDF features: fs3-1, fs3-2
➢ TF-IDF + SVD features: fs3-3, fs3-4, fs3-5
[Pipeline diagrams for each feature set]
Word2Vec Features
➢ Multi-dimensional vector for all the words in any dictionary
➢ Always great insights
➢ Very popular in natural language processing tasks
➢ Google News vectors, 300d
Word2Vec Features
➢ Representing words
➢ Representing sentences

# normalized sum of word2vec vectors (`model`) over in-vocabulary,
# non-stopword tokens
import numpy as np
from nltk import word_tokenize
from nltk.corpus import stopwords

stop_words = stopwords.words('english')

def sent2vec(s):
    words = str(s).lower().decode('utf-8')
    words = word_tokenize(words)
    words = [w for w in words if w not in stop_words]
    words = [w for w in words if w.isalpha()]
    M = []
    for w in words:
        M.append(model[w])
    M = np.array(M)
    v = M.sum(axis=0)
    return v / np.sqrt((v ** 2).sum())
W2V Features: WMD
Kusner, M., Sun, Y., Kolkin, N. & Weinberger, K.. (2015). From Word Embeddings To Document Distances.
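A minimal sketch of the WMD feature with gensim (assumed, not the original code; model is the loaded Google News KeyedVectors, and gensim's wmdistance needs the pyemd package):

data['wmd'] = data.apply(lambda x: model.wmdistance(
    str(x['question1']).lower().split(),
    str(x['question2']).lower().split()), axis=1)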
W2V Features: Skew
➢ Skew = 0 for normal distribution
➢ Skew > 0: more weight in left tail
W2V Features: Kurtosis
➢ 4th central moment over the square of variance
➢ Types:
○ Pearson
○ Fisher: subtract 3.0 from result such that result is 0 for normal distribution
W2V Features
➢ Word2Vec feature set: fs-4
➢ scipy.spatial.distance: cosine, euclidean, minkowski, cityblock (manhattan), canberra, braycurtis, jaccard
➢ scipy.stats: skew, kurtosis
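A minimal sketch of computing fs-4 from the sent2vec vectors of the two questions (assumed, not the original code):

from scipy.spatial.distance import cosine, cityblock, canberra, euclidean, braycurtis
from scipy.stats import skew, kurtosis

q1_vecs = [sent2vec(q) for q in data.question1]
q2_vecs = [sent2vec(q) for q in data.question2]

data['cosine_distance'] = [cosine(a, b) for a, b in zip(q1_vecs, q2_vecs)]
data['cityblock_distance'] = [cityblock(a, b) for a, b in zip(q1_vecs, q2_vecs)]
data['canberra_distance'] = [canberra(a, b) for a, b in zip(q1_vecs, q2_vecs)]
data['skew_q1'] = [skew(v) for v in q1_vecs]
data['kurtosis_q1'] = [kurtosis(v) for v in q1_vecs]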
Raw Word2Vec Vectors
https://www.kaggle.com/jeffd23/visualizing-word-vectors-with-t-sne
➢ Raw W2V feature set: fs-5
Feature Snapshot
Machine Learning Models
➢ Logistic regression
➢ XGBoost
➢ 5-fold cross-validation
➢ Accuracy as a comparison metric (also precision + recall)
➢ Why accuracy?
Results
Deep Learning
LSTM
➢ Long short-term memory
➢ A type of RNN
➢ Learns long-term dependencies
➢ Used two LSTM layers
1D CNN
➢ One-dimensional convolutional layer
➢ Temporal convolution
➢ Simple to implement:

# naive 1-D convolution of signal x with kernel h
for i in range(sample_length):
    y[i] = 0
    for j in range(kernel_length):
        y[i] += x[i - j] * h[j]
Embedding Layers
➢ Simple layer
➢ Converts indexes to vectors
➢ [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
Time Distributed Dense Layer
➢ TimeDistributed wrapper around a dense layer
➢ TimeDistributed applies the layer to every temporal slice of the input
➢ Followed by a Lambda layer
➢ Implements the “translation” layer used by Stephen Merity (keras snli model)

model1 = Sequential()
model1.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model1.add(TimeDistributed(Dense(300, activation='relu')))
model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))
GloVe Embeddings
➢ Count based model
➢ Dimensionality reduction on co-occurrence counts matrix
➢ word-context matrix -> word-feature matrix
➢ Common Crawl
○ 840B tokens, 2.2M vocab, 300d vectors
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation
Basis of Deep Learning Model
➢ Keras-snli model: https://github.com/Smerity/keras_snli
Before Training DeepNets
➢ Tokenize data
➢ Convert text data to sequences

from keras.preprocessing import text, sequence

tk = text.Tokenizer(nb_words=200000)
max_len = 40
tk.fit_on_texts(list(data.question1.values) + list(data.question2.values.astype(str)))
x1 = tk.texts_to_sequences(data.question1.values)
x1 = sequence.pad_sequences(x1, maxlen=max_len)
x2 = tk.texts_to_sequences(data.question2.values.astype(str))
x2 = sequence.pad_sequences(x2, maxlen=max_len)
word_index = tk.word_index
Before Training DeepNets
➢ Initialize GloVe embeddings

from tqdm import tqdm

embeddings_index = {}
f = open('data/glove.840B.300d.txt')
for line in tqdm(f):
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
Before Training DeepNets
➢ Create the embedding matrix

embedding_matrix = np.zeros((len(word_index) + 1, 300))
for word, i in tqdm(word_index.items()):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
Final Deep Learning Model
Model 1 and Model 2

model1 = Sequential()
model1.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model1.add(TimeDistributed(Dense(300, activation='relu')))
model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))

model2 = Sequential()
model2.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model2.add(TimeDistributed(Dense(300, activation='relu')))
model2.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))
Final Deep Learning Model
Model 3 and Model 4

model3 = Sequential()
model3.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model3.add(Convolution1D(nb_filter=nb_filter,
                         filter_length=filter_length,
                         border_mode='valid',
                         activation='relu',
                         subsample_length=1))
model3.add(Dropout(0.2))
...
model3.add(Dense(300))
model3.add(Dropout(0.2))
model3.add(BatchNormalization())
Final Deep Learning Model
Model 5 and Model 6

model5 = Sequential()
model5.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
model5.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))

model6 = Sequential()
model6.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
model6.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))
Final Deep Learning Model
Merged Model
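A minimal sketch of the merge, following the Keras 1 Merge API (the head's layer sizes are illustrative assumptions, not the original slide code; model1 through model6 are the towers defined above):

from keras.models import Sequential
from keras.layers import Merge, Dense, Dropout
from keras.layers.normalization import BatchNormalization

merged_model = Sequential()
merged_model.add(Merge([model1, model2, model3, model4, model5, model6],
                       mode='concat'))
merged_model.add(BatchNormalization())
merged_model.add(Dense(300, activation='relu'))
merged_model.add(Dropout(0.2))
merged_model.add(Dense(1, activation='sigmoid'))
merged_model.compile(loss='binary_crossentropy', optimizer='adam',
                     metrics=['accuracy'])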
Time to Train the DeepNet
➢ Total params: 174,913,917
➢ Trainable params: 60,172,917
➢ Non-trainable params: 114,741,000
➢ NVIDIA Titan X
Combined Results
The deep network was trained on an NVIDIA Titan X; each epoch took approximately 300 seconds, and full training took 10-15 hours. The network achieved an accuracy of 0.848 (~0.85).
Improving Further
➢ Cleaning the text data, e.g. correcting misspellings
➢ POS tagging
➢ Entity recognition
➢ Combining deepnet with traditional ML models
Conclusion & References
➢ The deepnet gives near state-of-the-art result
➢ BiMPM model accuracy: 88%
Some references:
➢ Zhiguo Wang, Wael Hamza and Radu Florian. “Bilateral Multi-Perspective Matching for Natural Language Sentences” (BiMPM)
➢ Matthew Honnibal. “Deep text-pair classification with Quora’s 2017 question dataset,” 13 February 2017. Retrieved from https://explosion.ai/blog/quora-deep-text-pair-classification
➢ Bradley Pallen’s work:
https://github.com/bradleypallen/keras-quora-question-pairs
Natural Language Processing for Chatbots
[Architecture diagram]
● Pre-trained domain knowledge
● Classification of intent
● Identifying entities (extracting information)
● API integration
● Analytics
● Delegation to customer support
● Delegation to back-end robots
● Instant processing and end-to-end automation
● Monitoring and AI training
● Channels: chat, avatar, text, (speech)
Pipeline: Enquiry → Pre-processing of enquiry → Intent classification → Pre-defined reply
Pre-processing of the enquiry: stemming, cross-language handling, misspellings algorithm
Ranked intent matches:
1. Insurance
2. Vehicle
3. Car
4. Rules for practice driving
Conversation without API
User: Hey you, do you knoww if my car insruacne covers practice driving??
(The misspellings in the user message are part of the demo; the misspellings algorithm handles them.)
Bot: You don’t need to adjust your car insurance when practise driving with a learner’s permit. In case of damage it’s the supervisor with a full driver’s license that shall write and sign the insurance claim.
Conversation with API
User: Hi James, what’s the weather in Berlin on Thursday?
→ Redirect to API: Weather (required value: location; optional value: date)
Bot: Thursday’s forecast for Berlin is partly sunny and mostly clouds.
User: Thank you!
Questions / Comments?
All The Code:
❖ github.com/abhishekkrthakur
Get in touch:
➢ E-mail: abhishek4@gmail.com
➢ LinkedIn: bit.ly/thakurabhishek
➢ Kaggle: kaggle.com/abhishek
➢ Twitter: @abhi1thakur
If everything fails, use Xgboost

Mais conteúdo relacionado

Mais procurados

On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidYufeng Guo
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformShivaji Dutta
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)Julien SIMON
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceIntel Nervana
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to KerasJohn Ramey
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningAmanda Mackay (she/her)
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at ScaleIntel Nervana
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA Taiwan
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 

Mais procurados (20)

On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on Android
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
 
Tensorflow vs MxNet
Tensorflow vs MxNetTensorflow vs MxNet
Tensorflow vs MxNet
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to Keras
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine Learning
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 

Semelhante a Deep Learning Applications (dadada2017)

OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveOSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveNETWAYS
 
Pentester++
Pentester++Pentester++
Pentester++CTruncer
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabCloudxLab
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshopssuser540861
 
Python in Industry
Python in IndustryPython in Industry
Python in IndustryDharmit Shah
 
TonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on HadoopTonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on HadoopAnthony Hsu
 
Your first 5 PHP design patterns - ThatConference 2012
Your first 5 PHP design patterns - ThatConference 2012Your first 5 PHP design patterns - ThatConference 2012
Your first 5 PHP design patterns - ThatConference 2012Aaron Saray
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or realityAwantik Das
 
TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT   TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT Mia Chang
 
Services, tools & practices for a software house
Services, tools & practices for a software houseServices, tools & practices for a software house
Services, tools & practices for a software houseParis Apostolopoulos
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas BhagatOSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas BhagatNETWAYS
 
How to Choose a Deep Learning Framework
How to Choose a Deep Learning FrameworkHow to Choose a Deep Learning Framework
How to Choose a Deep Learning FrameworkNavid Kalaei
 
TDD in Python With Pytest
TDD in Python With PytestTDD in Python With Pytest
TDD in Python With PytestEddy Reyes
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Holden Karau
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Austin Ogilvie
 

Semelhante a Deep Learning Applications (dadada2017) (20)

OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveOSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
 
Pentester++
Pentester++Pentester++
Pentester++
 
Introduction to competitive machine learning
Introduction to competitive machine learningIntroduction to competitive machine learning
Introduction to competitive machine learning
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLab
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshop
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
 
TonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on HadoopTonY: Native support of TensorFlow on Hadoop
TonY: Native support of TensorFlow on Hadoop
 
Your first 5 PHP design patterns - ThatConference 2012
Your first 5 PHP design patterns - ThatConference 2012Your first 5 PHP design patterns - ThatConference 2012
Your first 5 PHP design patterns - ThatConference 2012
 
Parallelformers
ParallelformersParallelformers
Parallelformers
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
 
TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT   TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT
 
Services, tools & practices for a software house
Services, tools & practices for a software houseServices, tools & practices for a software house
Services, tools & practices for a software house
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas BhagatOSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
 
How to Choose a Deep Learning Framework
How to Choose a Deep Learning FrameworkHow to Choose a Deep Learning Framework
How to Choose a Deep Learning Framework
 
Cloud Needs Devops
Cloud Needs DevopsCloud Needs Devops
Cloud Needs Devops
 
TDD in Python With Pytest
TDD in Python With PytestTDD in Python With Pytest
TDD in Python With Pytest
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
 

Último

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Último (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Deep Learning Applications (dadada2017)

  • 1. Deep Learning Applications (in industries and elsewhere) Abhishek Thakur @abhi1thakur
  • 2. About me ● Chief Data Scientist @ Boost AI ● Machine learning enthusiast ● Kaggle junkie (highest world rank #3) ● Interested in: ○ Automatic machine learning ○ Large scale classification of text data ○ Chatbots I like big data and I cannot lie
  • 3. Agenda ● Brief introduction to deep learning ● Implementation of deepnets ● Fine-tuning of pre-trained networks ● 4 different industrial use cases ● No maths!!!!
  • 4. What is deep learning?
  • 5. What is deep learning? ● A buzzword
  • 6. What is deep learning? ● A buzzword ● Neural networks
  • 7. What is deep learning? ● A buzzword ● Neural networks ● Removes manual feature extraction steps
  • 8. What is deep learning? ● A buzzword ● Neural networks ● Removes manual feature extraction steps ● Not a black box
  • 9. How have convnets evolved? 1989
  • 10. How have convnets evolved? 2012
  • 11. How have convnets evolved?
  • 12. How have convnets evolved? 2014
  • 13.
  • 14. What can deep learning do?
  • 15. What can deep learning do?
  • 16. What can deep learning do?
  • 17. What can deep learning do?
  • 18. What can deep learning do?
  • 19. What can deep learning do? ● Natural language processing
  • 20. What can deep learning do? ● Natural language processing ● Speech processing
  • 21. What can deep learning do? ● Natural language processing ● Speech processing ● Computer vision ● And more and more
  • 22. How can I implement my own DeepNets?
  • 23. How can I implement my own DeepNets? ● Implement them on your own
  • 24. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts
  • 25. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers
  • 26. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training
  • 27. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune
  • 28. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune ○ Convert data
  • 29. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune ○ Convert data ○ Define net
  • 30. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune ○ Convert data ○ Define net ○ Define solver
  • 31. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune ○ Convert data ○ Define net ○ Define solver ○ Train
  • 32. How can I implement my own DeepNets? ● Implement them on your own ○ Decompose into smaller parts ○ Implement layers ○ Start training ● Save yourself some time and finetune ○ Convert data ○ Define net ○ Define solver ○ Train ● Caffe (caffe.berkeleyvision.org) ● Keras (www.keras.io)
  • 33. Caffe
  • 37. Caffe ● Speed ● Openness ● Modularity ● Expression - No coding knowledge? No problem!
  • 38. Caffe ● Speed ● Openness ● Modularity ● Expression - No coding knowledge? No problem! ● Community
  • 39. What do you need for Caffe?
  • 40. What do you need for Caffe? ● Convert data
  • 41. What do you need for Caffe? ● Convert data ● Define a network (prototxt)
  • 42. What do you need for Caffe? ● Convert data ● Define a network (prototxt) ● Define a solver (prototxt)
  • 43. What do you need for Caffe? ● Convert data ● Define a network (prototxt) ● Define a solver (prototxt) ● Train the network (with or without pre-trained weights)
  • 47. Training a net using Caffe
  • 48. Training a net using Caffe /PATH_TO_CAFFE/caffe train --solver=solver.prototxt
  • 49. Fine Tuning! ● Fine tuning using GoogleNet ● Why? ○ It has Google in its name ○ It won ILSVRC 2014 ○ It’s complicated and I wanted to play with it ● Caffe model zoo offers a lot of pretrained nets, including GoogleNet ● Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
  • 50. Honey Bee vs. Bumble Bee Tougher Than
  • 51. Honey Bee vs. Bumble Bee The Metis Challenge: Naive Bees Classifier @ Drivendata.Org
  • 55. Steps to finetune ● Create training and test files
  • 56. Steps to finetune ● Create training and test files ● Get the prototxt files from model zoo
  • 57. Steps to finetune ● Create training and test files ● Get the prototxt files from model zoo ● Modify them
  • 58. Steps to finetune ● Create training and test files ● Get the prototxt files from model zoo ● Modify them ● Run the caffe solver
  • 59. Generating training and validation sets
  • 60. Generating training and validation sets
  • 68. Finetune your network /PATH_TO_CAFFE/caffe train -solver ./solver.prototxt -weights ./models/bvlc_googlenet.caffemodel
  • 69. Did the net learn something new?
  • 70. Did the net learn something new?
  • 71. Did the net learn something new?
  • 72. Did the net learn something new?
  • 73. Breaking down the various layers of GoogLeNet Random Pretrained Finetuned ● inception_3a ● inception_3b ● inception_4a ● inception_4b ● inception_4c ● inception_4d ● inception_4e ● inception_5a ● inception_5b
  • 75. Why finetune? ● It is faster
  • 76. Why finetune? ● It is faster ● It is better (most of the times)
  • 77. Why finetune? ● It is faster ● It is better (most of the times) ● Why reinvent the wheel?
  • 85. Tell me how to train a deepnet in Python! ● Caffe has a Python interface ● TensorFlow ● Theano ● Lasagne ● Keras ● Neon ● And lots more…
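  For the Caffe route, the Python interface covers the same train/finetune workflow as the earlier command lines. A minimal sketch using standard pycaffe calls (the solver and weight file names are placeholders matching the shell commands above):

  import caffe

  caffe.set_mode_gpu()                          # or caffe.set_mode_cpu()
  solver = caffe.SGDSolver('solver.prototxt')   # same solver file as the CLI run
  # finetuning: start from pretrained weights instead of a random init
  solver.net.copy_from('models/bvlc_googlenet.caffemodel')
  solver.solve()                                # run the full solver schedule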
  • 88. Why classify search queries? ● For businesses ○ Find out user intent ○ Track keywords according to the transactional buying cycle of the user ○ Optimize website content and focus on a smaller keyword set ● For data scientists ○ 100s of millions of unlabeled keywords to play with ○ Why not!
  • 98. Feeding Data to LSTMs: the query “the white house” becomes a sequence of related terms for the LSTM ❖ United States ❖ President ❖ Politician ❖ Washington ❖ Lawyer ❖ Secretary
  • 101. Representing Queries as Images: Word2Vec representations of the top search result titles (example queries: “David Villa”, “Apple juice”, “Irish”)
  • 102. I don’t see much difference! “Guild Wars” or “Apple juice”?
  • 104. Machine Learning Models ● Boosted trees ○ Word2vec embeddings ○ Titles from top results ○ Additional features from the SERP (search engine results page) ○ TF-IDF ○ XGBoost!!!! (https://github.com/dmlc/xgboost)
  • 107. Machine Learning Models ● Convolutional Neural Networks: ○ Using the images directly ○ Using random crops from the image
  • 108. Neural Networks with Keras (https://github.com/fchollet/keras)
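  The network itself appeared as a diagram. A hedged sketch of what a small Keras 1.x convnet over a query “image” (stacked word2vec rows from the result titles) could look like; the input shape (single channel, 40 words x 300 dimensions, channels-first/Theano ordering) and class count are assumptions:

  from keras.models import Sequential
  from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

  model = Sequential()
  # input: one channel, 40 word rows x 300 word2vec dimensions (assumed shape)
  model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 40, 300)))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dropout(0.5))
  model.add(Dense(10, activation='softmax'))   # assumed 10 query categories
  model.compile(loss='categorical_crossentropy', optimizer='adam')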
  • 111. Approaching “any” ML problem AutoCompete: A Framework for Machine Learning Competitions, A. Thakur and A. Krohn-Grimberghe, ICML AutoML Workshop, 2015
  • 113. Optimizing neural networks AutoML Challenge: Rules for tuning Neural Networks, A. Thakur, ICML AutoML Workshop, System Desc Track, 2016
  • 125. Selecting NNet Architecture ● Always use SGD or Adam (for fast convergence) ● Start low: ○ Single layer with 120-500 neurons ○ Batch normalization + ReLU ○ Dropout: 10-20% ● Add new layer: ○ 1200-1500 neurons ○ High dropout: 40-50% ● Very big network: ○ 8000-10000 neurons in each layer ○ 60-80% dropout (a minimal sketch of the first step follows)
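  A minimal Keras sketch of the “start low” step of this recipe, with illustrative sizes (256 units sits inside the suggested 120-500 range; the input and output sizes are placeholders):

  from keras.models import Sequential
  from keras.layers import Dense, Dropout, Activation
  from keras.layers.normalization import BatchNormalization

  num_features, num_classes = 100, 10            # illustrative sizes
  model = Sequential()
  model.add(Dense(256, input_dim=num_features))  # single layer, 120-500 range
  model.add(BatchNormalization())                # batch norm + ReLU
  model.add(Activation('relu'))
  model.add(Dropout(0.2))                        # 10-20% dropout
  model.add(Dense(num_classes, activation='softmax'))
  model.compile(loss='categorical_crossentropy', optimizer='adam')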
  • 129. Some Results AutoML GPU Track Results
  • 130. @abhi1thakur 10 Things You Didn’t Know About Clickbaits!
  • 131. What are clickbaits? ● 10 things Apple didn’t tell you about the new iPhone ● What happened next will surprise you ● This is what the actor/actress from 90s looks like now ● What did Donald Trump just say about Obama and Clinton ● 9 things you must have to be a good data scientist @abhi1thakur
  • 133. What are clickbaits? ● Interesting titles ● Frustrating titles ● Seldom good-enough content ● Google penalizes clickbait content ● Facebook does the same @abhi1thakur
  • 134. The data ● Crawl BuzzFeed, ClickHole ● Crawl The New York Times, CNN ● ~10000 titles ○ Clickbaits: BuzzFeed, ClickHole ○ Non-clickbaits: The New York Times, CNN ○ ~5000 from each category @abhi1thakur
  • 135. Good old TF-IDF ● Very powerful ● Used both character and word analyzers @abhi1thakur
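  A minimal scikit-learn sketch of combining word-level and character-level TF-IDF, as the slide describes; the n-gram ranges and toy titles are illustrative assumptions:

  from scipy.sparse import hstack
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression

  titles = ["10 things you didn't know about the new iPhone",
            "Senate passes budget bill"]
  labels = [1, 0]   # 1 = clickbait, 0 = not (toy data, not the real set)

  # word-level and character-level TF-IDF, stacked side by side
  word_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2))
  char_tfidf = TfidfVectorizer(analyzer='char', ngram_range=(2, 5))
  features = hstack([word_tfidf.fit_transform(titles),
                     char_tfidf.fit_transform(titles)])

  clf = LogisticRegression()
  clf.fit(features, labels)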
  • 138. Let’s build some models @abhi1thakur
  • 139. Logistic Regression @abhi1thakur ● ROC AUC Score = 0.987319021551 ● Precision Score = 0.950326797386 ● Recall Score = 0.939276485788 ● F1 Score = 0.944769330734
  • 140. XGBoost @abhi1thakur ● ROC AUC Score = 0.969700677962 ● Precision Score = 0.95756718529 ● Recall Score = 0.874677002584 ● F1 Score = 0.914247130317
  • 141. Is that it? ● No! ● Model predictions: ○ “Donald Trump”: 15% clickbait ○ “Barack Obama”: 80% clickbait ● Something was very wrong! ● TF-IDF didn’t capture the meaning @abhi1thakur
  • 142. Word2Vec ● Shallow neural networks ● Generates a high-dimensional vector for every word ● Every word gets a position in space ● Similar words cluster together @abhi1thakur
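  A quick gensim sketch of playing with pretrained vectors (the file path is an assumption; older gensim versions expose load_word2vec_format on Word2Vec rather than KeyedVectors):

  from gensim.models import KeyedVectors

  # load Google's pretrained 300-d news vectors (path is an assumption)
  model = KeyedVectors.load_word2vec_format(
      'GoogleNews-vectors-negative300.bin', binary=True)

  print(model.most_similar('obama', topn=5))   # similar words cluster together
  print(model.similarity('obama', 'trump'))    # cosine similarity of two words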
  • 144. XGBoost + W2V @abhi1thakur ● ROC AUC Score = 0.981312768055 ● Precision Score = 0.939947780679 ● Recall Score = 0.93023255814 ● F1 Score = 0.935064935065
  • 145. Performance ● Fast to train ● Good results @abhi1thakur
  • 147. Does word2vec capture everything? Do we have all we need from titles alone? What if the content of the website isn’t clickbait-y? @abhi1thakur
  • 148. The data ● Crawl BuzzFeed, NYT, CNN, ClickHole, etc. ● Too much work ● Simple models ● Doubts about results ● Crawl public Facebook pages instead: ○ Buzzfeed ○ CNN ○ The New York Times ○ Clickhole ○ StopClickBaitOfficial ○ Upworthy ○ Wikinews The Facebook page scraper is available here: https://github.com/minimaxir/facebook-page-post-scraper @abhi1thakur
  • 149. The data ● link_name (the title of the URL shared) ● status_type (whether it’s a link, photo or a video) ● status_link (the actual URL) @abhi1thakur
  • 151. Data Processing ● Get the HTML content too ● Clean the mess up! @abhi1thakur
  • 152. Feature Generation ● Size of the HTML (in bytes) ● Length of HTML ● Total number of links ● Total number of buttons ● Total number of inputs ● Total number of unordered lists ● Total number of ordered lists ● Total number of lists (ordered + unordered) ● Total number of H1 tags ● Total number of H2 tags ● Full length of all text in all H1 tags that were found ● Full length of all text in all H2 tags that were found ● Total number of images ● Total number of HTML tags ● Number of unique HTML tags @abhi1thakur
  • 153. More Features ● All H1 text ● All H2 text ● Meta description @abhi1thakur
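  A hedged BeautifulSoup sketch computing a handful of the features listed above from raw HTML; the feature names and helper are illustrative, not the talk's original code:

  from bs4 import BeautifulSoup

  def html_features(html):
      # a few of the listed features, as a sketch; html is the raw page source
      soup = BeautifulSoup(html, 'html.parser')
      tags = [t.name for t in soup.find_all(True)]
      return {
          'html_bytes': len(html.encode('utf-8')),
          'n_links': len(soup.find_all('a')),
          'n_buttons': len(soup.find_all('button')),
          'n_inputs': len(soup.find_all('input')),
          'n_lists': len(soup.find_all(['ul', 'ol'])),
          'n_h1': len(soup.find_all('h1')),
          'len_h1_text': sum(len(h.get_text()) for h in soup.find_all('h1')),
          'n_images': len(soup.find_all('img')),
          'n_tags': len(tags),
          'n_unique_tags': len(set(tags)),
          'h1_text': ' '.join(h.get_text() for h in soup.find_all('h1')),
      }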
  • 161. Deep Learning Models ● Simple LSTM ● Two dense layers ● Dropout + Batch Normalization ● Softmax Activation @abhi1thakur
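  A minimal sketch of that architecture in Keras, assuming a 20000-word vocabulary, length-40 sequences, and binary clickbait/not-clickbait output; the layer sizes are assumptions, not necessarily the exact ones used in the talk:

  from keras.models import Sequential
  from keras.layers import Embedding, LSTM, Dense, Dropout, Activation
  from keras.layers.normalization import BatchNormalization

  model = Sequential()
  model.add(Embedding(20000, 128, input_length=40))  # vocab/length assumed
  model.add(LSTM(128))                               # simple LSTM
  model.add(Dense(256))                              # dense layer 1
  model.add(BatchNormalization())
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  model.add(Dense(256))                              # dense layer 2
  model.add(BatchNormalization())
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  model.add(Dense(2, activation='softmax'))          # clickbait vs. not
  model.compile(loss='categorical_crossentropy', optimizer='adam')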
  • 166. Detecting Duplicates in Quora Questions
  • 167. The Problem ➢ ~ 13 million questions (as of March, 2017) ➢ Many duplicate questions ➢ Cluster and join duplicates together ➢ Remove clutter ➢ First public data release: 24th January, 2017
  • 168. Duplicate Questions ➢ How does Quora quickly mark questions as needing improvement? ➢ Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds… ➢ What practical applications might evolve from the discovery of the Higgs Boson? ➢ What are some practical benefits of discovery of the Higgs Boson? ➢ Why did Trump win the Presidency? ➢ How did Donald Trump win the 2016 Presidential Election?
  • 169. Non-Duplicate Questions ➢ Who should I address my cover letter to if I'm applying for a big company like Mozilla? ➢ Which car is better from safety view? "swift or grand i10". My first priority is safety? ➢ Mr. Robot (TV series): Is Mr. Robot a good representation of real-life hacking and hacking culture? Is the depiction of hacker societies realistic? ➢ What mistakes are made when depicting hacking in "Mr. Robot" compared to real-life cybersecurity breaches or just a regular use of technologies? ➢ How can I start an online shopping (e-commerce) website? ➢ Which web technology is best suitable for building a big E-Commerce website?
  • 170. The Data ➢ 400,000+ pairs of questions ➢ Initially data was very skewed ➢ Negative samples from related questions ➢ Not real distribution on Quora’s website ➢ Noise exists (as usual) https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
  • 171. The Data ➢ 255045 negative samples (non-duplicates) ➢ 149306 positive samples (duplicates) ➢ ~37% positive samples (149306 / 404351)
  • 172. The Data ➢ Average number of characters in question1: 59.57 ➢ Minimum number of characters in question1: 1 ➢ Maximum number of characters in question1: 623 ➢ Average number of characters in question2: 60.14 ➢ Minimum number of characters in question2: 1 ➢ Maximum number of characters in question2: 1169
  • 173. Basic Feature Engineering ➢ Length of question1 ➢ Length of question2 ➢ Difference in the two lengths ➢ Character length of question1 without spaces ➢ Character length of question2 without spaces ➢ Number of words in question1 ➢ Number of words in question2 ➢ Number of common words in question1 and question2
  • 174. Basic Feature Engineering ➢ Basic feature set: fs-1
  data['len_q1'] = data.question1.apply(lambda x: len(str(x)))
  data['len_q2'] = data.question2.apply(lambda x: len(str(x)))
  data['diff_len'] = data.len_q1 - data.len_q2
  # note: set() keeps only distinct characters, so the two features below count
  # unique non-space characters rather than the raw character length
  data['len_char_q1'] = data.question1.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
  data['len_char_q2'] = data.question2.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
  data['len_word_q1'] = data.question1.apply(lambda x: len(str(x).split()))
  data['len_word_q2'] = data.question2.apply(lambda x: len(str(x).split()))
  data['common_words'] = data.apply(lambda x: len(set(str(x['question1']).lower().split()).intersection(set(str(x['question2']).lower().split()))), axis=1)
  • 175. Fuzzy Features ➢ pip install fuzzywuzzy ➢ Uses Levenshtein distance ➢ QRatio ➢ WRatio ➢ Token set ratio ➢ Token sort ratio ➢ Partial token set ratio ➢ Partial token sort ratio ➢ etc. etc. etc. https://github.com/seatgeek/fuzzywuzzy
  • 176. Fuzzy Features ➢ Fuzzy feature set: fs-2
  data['fuzz_qratio'] = data.apply(lambda x: fuzz.QRatio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_WRatio'] = data.apply(lambda x: fuzz.WRatio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_partial_ratio'] = data.apply(lambda x: fuzz.partial_ratio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_partial_token_set_ratio'] = data.apply(lambda x: fuzz.partial_token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_partial_token_sort_ratio'] = data.apply(lambda x: fuzz.partial_token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_token_set_ratio'] = data.apply(lambda x: fuzz.token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
  data['fuzz_token_sort_ratio'] = data.apply(lambda x: fuzz.token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
  • 177. TF-IDF ➢ TF(t) = Number of times a term t appears in a document / Total number of terms in the document ➢ IDF(t) = log(Total number of documents / Number of documents with term t in it) ➢ TF-IDF(t) = TF(t) * IDF(t)
  tfidf = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode',
                          analyzer='word', token_pattern=r'\w{1,}',
                          ngram_range=(1, 2), use_idf=1, smooth_idf=1,
                          sublinear_tf=1, stop_words='english')
  • 178. SVD ➢ Latent semantic analysis ➢ scikit-learn version of SVD ➢ 120 components
  svd = decomposition.TruncatedSVD(n_components=120)
  xtrain_svd = svd.fit_transform(xtrain)
  xtest_svd = svd.transform(xtest)
  • 179. Fuzzy Features ➢ Also known as approximate string matching ➢ The number of “primitive” operations required to convert one string into the other ➢ Primitive operations: ○ Insertion ○ Deletion ○ Substitution ➢ Typically used for: ○ Spell checking ○ Plagiarism detection ○ DNA sequence matching ○ Spam filtering (a minimal implementation follows)
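  For intuition, here is the classic dynamic-programming implementation of the Levenshtein distance that these fuzzy features build on:

  def levenshtein(a, b):
      # prev[j] = cost of turning the current prefix of a into b[:j]
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          curr = [i]
          for j, cb in enumerate(b, 1):
              curr.append(min(prev[j] + 1,                 # deletion
                              curr[j - 1] + 1,             # insertion
                              prev[j - 1] + (ca != cb)))   # substitution
          prev = curr
      return prev[-1]

  print(levenshtein('kitten', 'sitting'))  # 3: two substitutions + one insertion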
  • 180-184. A Combination of TF-IDF & SVD ➢ TF-IDF features: fs3-1, fs3-2 ➢ TF-IDF + SVD features: fs3-3, fs3-4, fs3-5
  • 185. Word2Vec Features ➢ A multi-dimensional vector for every word in the vocabulary ➢ Often gives great insights ➢ Very popular in natural language processing tasks ➢ Google News vectors, 300d
  • 186. Word2Vec Features ➢ Representing words ➢ Representing sentences
  def sent2vec(s):
      # Python 2 code: in Python 3, drop the .decode('utf-8')
      words = str(s).lower().decode('utf-8')
      words = word_tokenize(words)
      words = [w for w in words if w not in stop_words]
      words = [w for w in words if w.isalpha()]
      M = []
      for w in words:
          M.append(model[w])  # words missing from the model raise KeyError
      M = np.array(M)
      v = M.sum(axis=0)
      return v / np.sqrt((v ** 2).sum())
  • 187. W2V Features: WMD Kusner, M., Sun, Y., Kolkin, N. & Weinberger, K.. (2015). From Word Embeddings To Document Distances.
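  gensim exposes Word Mover's Distance directly on a loaded word2vec model (older versions also require the pyemd package). A sketch, assuming model is the Google News model loaded earlier and the two questions are illustrative:

  # tokenized question pair (illustrative examples, not from the dataset)
  q1 = 'how can i be a good geologist'.split()
  q2 = 'what should i do to be a great geologist'.split()
  wmd = model.wmdistance(q1, q2)   # smaller distance = more similar questions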
  • 188. W2V Features: Skew ➢ Skew = 0 for a normal distribution ➢ Skew > 0: longer tail on the right (positive skew)
  • 189. W2V Features: Kurtosis ➢ 4th central moment over the square of variance ➢ Types: ○ Pearson ○ Fisher: subtract 3.0 from result such that result is 0 for normal distribution
  • 190. W2V Features ➢ Word2Vec feature set: fs-4 ➢ Distances from scipy.spatial.distance: cosine, euclidean, manhattan (cityblock), canberra, minkowski, braycurtis, jaccard ➢ Moments from scipy.stats: skew, kurtosis
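  A sketch of turning sent2vec outputs into these fs-4 features (jaccard, which expects boolean inputs, is omitted here; q1 and q2 are assumed question strings):

  from scipy.spatial import distance
  from scipy.stats import skew, kurtosis

  v1, v2 = sent2vec(q1), sent2vec(q2)   # sent2vec as defined a few slides back

  feats = {
      'cosine': distance.cosine(v1, v2),
      'cityblock': distance.cityblock(v1, v2),   # manhattan
      'canberra': distance.canberra(v1, v2),
      'euclidean': distance.euclidean(v1, v2),
      'minkowski': distance.minkowski(v1, v2, 3),
      'braycurtis': distance.braycurtis(v1, v2),
      'skew_q1': skew(v1), 'skew_q2': skew(v2),
      'kurtosis_q1': kurtosis(v1), 'kurtosis_q2': kurtosis(v2),
  }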
  • 195. Machine Learning Models ➢ Logistic regression ➢ XGBoost ➢ 5-fold cross-validation ➢ Accuracy as a comparison metric (also precision + recall) ➢ Why accuracy?
  • 198. LSTM ➢ Long short-term memory ➢ A type of RNN ➢ Learns long-term dependencies ➢ Used two LSTM layers
  • 199. 1D CNN ➢ One-dimensional convolutional layer ➢ Temporal convolution ➢ Simple to implement:
  for i in range(sample_length):
      y[i] = 0
      for j in range(kernel_length):
          y[i] += x[i - j] * h[j]
  • 200. Embedding Layers ➢ Simple layer ➢ Converts indexes to vectors ➢ [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
  • 201. Time Distributed Dense Layer ➢ TimeDistributed wrapper around a dense layer ➢ TimeDistributed applies the layer to every temporal slice of the input ➢ Followed by a Lambda layer ➢ Implements the “translation” layer used by Stephen Merity (keras snli model)
  model1 = Sequential()
  model1.add(Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], input_length=40, trainable=False))
  model1.add(TimeDistributed(Dense(300, activation='relu')))
  model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))
  • 202. GloVe Embeddings ➢ Count-based model ➢ Dimensionality reduction on the co-occurrence counts matrix ➢ word-context matrix -> word-feature matrix ➢ Common Crawl ○ 840B tokens, 2.2M vocab, 300d vectors Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation
  • 203. Basis of Deep Learning Model ➢ Keras-snli model: https://github.com/Smerity/keras_snli
  • 204. Before Training DeepNets ➢ Tokenize data ➢ Convert text data to sequences
  tk = text.Tokenizer(nb_words=200000)
  max_len = 40
  tk.fit_on_texts(list(data.question1.values) + list(data.question2.values.astype(str)))
  x1 = tk.texts_to_sequences(data.question1.values)
  x1 = sequence.pad_sequences(x1, maxlen=max_len)
  x2 = tk.texts_to_sequences(data.question2.values.astype(str))
  x2 = sequence.pad_sequences(x2, maxlen=max_len)
  word_index = tk.word_index
  • 205. Before Training DeepNets ➢ Initialize GloVe embeddings
  embeddings_index = {}
  f = open('data/glove.840B.300d.txt')
  for line in tqdm(f):
      values = line.split()
      word = values[0]
      coefs = np.asarray(values[1:], dtype='float32')
      embeddings_index[word] = coefs
  f.close()
  • 206. Before Training DeepNets ➢ Create the embedding matrix
  embedding_matrix = np.zeros((len(word_index) + 1, 300))
  for word, i in tqdm(word_index.items()):
      embedding_vector = embeddings_index.get(word)
      if embedding_vector is not None:
          embedding_matrix[i] = embedding_vector
  • 210. Model 1 and Model 2
  model1 = Sequential()
  model1.add(Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], input_length=40, trainable=False))
  model1.add(TimeDistributed(Dense(300, activation='relu')))
  model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))

  model2 = Sequential()
  model2.add(Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], input_length=40, trainable=False))
  model2.add(TimeDistributed(Dense(300, activation='relu')))
  model2.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))
  • 213. Model 3 and Model 4
  model3 = Sequential()
  model3.add(Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], input_length=40, trainable=False))
  model3.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode='valid', activation='relu', subsample_length=1))
  model3.add(Dropout(0.2))
  . . .
  model3.add(Dense(300))
  model3.add(Dropout(0.2))
  model3.add(BatchNormalization())
  • 215. Model 5 and Model 6
  model5 = Sequential()
  model5.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
  model5.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))

  model6 = Sequential()
  model6.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
  model6.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))
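  The combined architecture appeared only as a diagram in the talk. A hedged Keras 1.x-style reconstruction of how the six towers might be merged; the dense head and its hyperparameters are assumptions, not the talk's exact settings:

  from keras.models import Sequential
  from keras.layers import Dense, Dropout, Merge
  from keras.layers.normalization import BatchNormalization

  # model1..model6 are the six towers defined on the previous slides
  merged_model = Sequential()
  merged_model.add(Merge([model1, model2, model3, model4, model5, model6],
                         mode='concat'))
  merged_model.add(BatchNormalization())
  merged_model.add(Dense(300, activation='relu'))
  merged_model.add(Dropout(0.2))
  merged_model.add(Dense(1, activation='sigmoid'))   # duplicate vs. not
  merged_model.compile(loss='binary_crossentropy', optimizer='adam',
                       metrics=['accuracy'])
  # fit with a list of inputs ([x1, x2, ...]) matching each tower's input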
  • 218. Time to Train the DeepNet ➢ Total params: 174,913,917 ➢ Trainable params: 60,172,917 ➢ Non-trainable params: 114,741,000 ➢ NVIDIA Titan X
  • 220. Combined Results The deep network was trained on an NVIDIA Titan X; each epoch took approximately 300 seconds, and full training took 10-15 hours. This network achieved an accuracy of 0.848 (~0.85).
  • 221. Improving Further ➢ Cleaning the text data, e.g. correcting misspellings ➢ POS tagging ➢ Entity recognition ➢ Combining the deepnet with traditional ML models
  • 222. Conclusion & References ➢ The deepnet gives a near state-of-the-art result ➢ BiMPM model accuracy: 88% Some references: ➢ Zhiguo Wang, Wael Hamza and Radu Florian. "Bilateral Multi-Perspective Matching for Natural Language Sentences" (BiMPM) ➢ Matthew Honnibal. "Deep text-pair classification with Quora's 2017 question dataset," 13 February 2017. Retrieved from https://explosion.ai/blog/quora-deep-text-pair-classification ➢ Bradley Pallen's work: https://github.com/bradleypallen/keras-quora-question-pairs
  • 224. [Chatbot pipeline diagram] Chat / Avatar / Text (Speech) → Natural Language Processing (pre-trained domain knowledge; classification of intent; identifying entities, i.e. extracting information) → API, Analytics, delegation to customer support, delegation to back-end robots. Instant processing and end-to-end automation, with monitoring and AI training throughout.
  • 225. Conversation without API (the user's misspellings are deliberate; the pre-processing step handles them): Enquiry: “Hey you, do you knoww if my car insruacne covers practice driving??” → Pre-processing of enquiry (stemming, cross-language, misspellings algorithm) → Intent classification (1. Insurance 2. Vehicle 3. Car 4. Rules for practice driving) → Pre-defined reply: “You don’t need to adjust your car insurance when practice driving with a learner’s permit. In case of damage, it’s the supervisor with a full driver’s license that shall write and sign the insurance claim.”
  • 226. Conversation with API (redirect to API: Weather; required value: location; optional value: date): User: “Hi James, what’s the weather in Berlin on Thursday?” → Reply: “Thursday’s forecast for Berlin is partly sunny and mostly clouds.”
  • 228. Thank you! Questions / Comments? All The Code: ❖ github.com/abhishekkrthakur Get in touch: ➢ E-mail: abhishek4@gmail.com ➢ LinkedIn: bit.ly/thakurabhishek ➢ Kaggle: kaggle.com/abhishek ➢ Twitter: @abhi1thakur If everything fails, use Xgboost