Quora Question Pairs
By: Shradha Sunil and Peddamail Jayavardhan Reddy
Dataset:
The datasets for this project were obtained from Kaggle. The files are:
1. Train set:
a. Size: 21.17 MB
b. Number of question pairs: 404,290
2. Test set:
a. Size: 112.47 MB
b. Number of question pairs: 2,345,796
Distribution of the Training Set (figure)
Word Embeddings:
1. Word2Vec:
○ Dimension: 300d
○ Vocabulary: 3 million words
○ Trained on: 100 billion words from the Google News dataset
2. GloVe vector representation:
○ Dimensions available: 50d, 100d, 200d, 300d
○ Vocabulary: 400k words
○ Trained on: 6 billion tokens from Wikipedia 2014 + Gigaword 5
○ We used the 100d GloVe representation (a loading sketch follows the API list below).
APIs used:
1. Keras
2. TensorFlow
3. Gensim
4. scikit-learn (sklearn)
5. Plotly, Seaborn (visualizations)
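As a minimal sketch of how the 100d GloVe vectors can be loaded and wired into a Keras Embedding layer. The file name glove.6B.100d.txt, the helper names, and the tokenizer variable are assumptions, not from the deck:

```python
import numpy as np

GLOVE_PATH = "glove.6B.100d.txt"  # assumed local copy of the 100d GloVe file
EMBEDDING_DIM = 100

def load_glove(path):
    """Parse GloVe's plain-text format: one word per line followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def embedding_matrix(tokenizer, vectors, dim=EMBEDDING_DIM):
    """Build the weight matrix for a Keras Embedding layer.
    `tokenizer` is assumed to be a keras Tokenizer fit on both question columns;
    out-of-vocabulary words keep zero vectors."""
    matrix = np.zeros((len(tokenizer.word_index) + 1, dim))
    for word, idx in tokenizer.word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix
```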
Implementation of simpler models
1. TF-IDF was used to turn the question text into features.
2. The following models were implemented using the sklearn library (a baseline sketch follows this list):
1. Random forest
2. Logistic regression
3. Naive Bayes
4. Decision tree
5. SVM
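The deck does not show the baseline code; the following is a hedged sketch of the TF-IDF + logistic regression variant, where the file name train.csv follows the Kaggle dataset and the 50,000-feature cap is an assumption:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

# Kaggle train.csv columns: question1, question2, is_duplicate.
df = pd.read_csv("train.csv").fillna("")

# Fit one TF-IDF vocabulary over all questions, then encode each side.
vec = TfidfVectorizer(max_features=50000)  # feature cap is an assumption
vec.fit(pd.concat([df["question1"], df["question2"]]))
X = hstack([vec.transform(df["question1"]), vec.transform(df["question2"])])
y = df["is_duplicate"]

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.1, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_va)[:, 1]
print("validation accuracy:", accuracy_score(y_va, proba > 0.5))
print("validation log-loss:", log_loss(y_va, proba))
```

The other sklearn models drop in by swapping the classifier (e.g. RandomForestClassifier, MultinomialNB, DecisionTreeClassifier, LinearSVC) on the same features.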
Results

Model | Training accuracy | Validation accuracy | Training loss | Validation loss
Random forest | 0.775 | 0.721 | 0.685315 | 0.697242
Logistic regression | 0.685 | 0.669 | 0.72543 | 0.741672
Decision tree | 0.672 | 0.695 | 0.75232 | 0.74327
Naive Bayes | 0.574 | 0.633 | 0.8472 | 0.8253
SVM | 0.429 | 0.553 | 0.8723 | 0.8244
Neural Network Architectures
1. Convolutional Neural Network (CNN)
2. Long Short-Term Memory network (LSTM)
3. Bidirectional LSTM
Simple Siamese Architecture:
(diagram) Two identical neural networks with shared weights encode Input 1 and Input 2 into Output 1 and Output 2; a similarity measure over the two outputs produces the final output.
Siamese Architecture using CNN (diagram):
Siamese Architecture with CNN and GloVe representation (figure):
Embedding layer (first layer): The first layer in deep NLP architectures. It maps each input token to its corresponding word embedding. The embedding layer's output is fed into the convolution layer.
Conv1D layer: Performs 1D convolution over the word embeddings produced by the embedding layer. Its parameters are the number of filters, the kernel size, the stride length, and the input shape.
Max-pooling layer: Max pooling is a sample-based discretization process. It down-samples the input representation, reducing its dimensionality and allowing assumptions to be made about features contained in the sub-regions.
Dropout: Applied to prevent overfitting.
Flatten: The flatten layer flattens the output of the CNN stack into a vector.
Merge layer: Combines the vector outputs of the two CNNs. Each CNN produces a sentence vector; the merge layer combines them into a single vector.
Dense layer: A regular fully connected layer. It converts the merged vector into a number between 0 and 1, the similarity predicted by the network.
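A minimal Keras sketch of the stack just described. The filter count, kernel size, and vocabulary size are assumptions (the deck does not give exact values):

```python
from tensorflow.keras import Input, Model, layers

MAX_LEN, VOCAB, DIM = 30, 100000, 100  # question length and vocab assumed; 100d GloVe

def cnn_encoder():
    """Embedding -> Conv1D -> max-pooling -> dropout -> flatten, as described above."""
    inp = Input(shape=(MAX_LEN,))
    x = layers.Embedding(VOCAB, DIM, trainable=False)(inp)  # weights= GloVe matrix in practice
    x = layers.Conv1D(filters=64, kernel_size=3, strides=1, activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Flatten()(x)
    return Model(inp, x)

encoder = cnn_encoder()  # one instance, so both questions share weights
q1, q2 = Input(shape=(MAX_LEN,)), Input(shape=(MAX_LEN,))
merged = layers.concatenate([encoder(q1), encoder(q2)])     # the merge layer
similarity = layers.Dense(1, activation="sigmoid")(merged)  # a number between 0 and 1

model = Model([q1, q2], similarity)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```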
Parameters within the CNN:
Model | Number of layers | Dropout | Validation loss | Training loss
CNN | 2 | 0.5 | 0.54606734 | 0.544829
CNN | 3 | 0.4 | 0.6147397 | 0.61473978
Observations from several CNN models with varying dropout and number of layers, after training for 20 epochs:
Unlike images, where more layers lead to more abstraction, for text two layers gave the best results.
This is because most of the features are learnt in the first layer itself, so its activations simply pass through and nothing new is learnt by adding deeper layers.
Siamese Architecture using LSTM:
(diagram) Question 1 and Question 2 each pass through an embedding layer and an LSTM with shared weights; the two sentence vectors are merged and fed to a dense layer.
Siamese Architecture with LSTM and Word2Vec 300d representation (figure):
Siamese Architecture with LSTM and GloVe 100d representation (figure):
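A sketch of the corresponding Siamese LSTM, using the model parameters given later in the deck (question length 30, LSTM size 100, dropout 0.2); the vocabulary size is again an assumption:

```python
from tensorflow.keras import Input, Model, layers

MAX_LEN, VOCAB, DIM, UNITS = 30, 100000, 100, 100

q1, q2 = Input(shape=(MAX_LEN,)), Input(shape=(MAX_LEN,))
embed = layers.Embedding(VOCAB, DIM, trainable=False)          # GloVe 100d weights in practice
lstm = layers.LSTM(UNITS, dropout=0.2, recurrent_dropout=0.2)  # shared by both questions

merged = layers.concatenate([lstm(embed(q1)), lstm(embed(q2))])
out = layers.Dense(1, activation="sigmoid")(merged)

model = Model([q1, q2], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```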
Comparison of Word2Vec vs GloVe:

Word representation | Model | Validation log-loss (20 epochs) | Time taken
Word2Vec | Siamese LSTM | 0.423 | 80 min per epoch
GloVe | Siamese LSTM | 0.434 | 25 min per epoch
Observations:
1. Comparable log-loss.
2. GloVe is considerably faster than Word2Vec.
● We used 100-dimensional GloVe vectors for the rest of the analysis (fewer dimensions, lower complexity, faster training).
Bidirectional LSTM
(diagram) In the bidirectional model, each word embedding (Word 1, Word 2, Word 3) is fed to both a forward and a backward LSTM; the two hidden states per word are concatenated and then aggregated. A plain LSTM, by contrast, reads the embeddings in one direction only and aggregates its outputs. A one-line Keras sketch follows.
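In Keras, swapping the shared LSTM above for a bidirectional one is essentially a one-line change (a sketch, matching the dropout used in the comparison below):

```python
from tensorflow.keras import layers

# Wraps the LSTM so each question is read forwards and backwards;
# the two hidden states are concatenated, as in the diagram above.
bi_lstm = layers.Bidirectional(
    layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2)
)
```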
Comparison between Bi-LSTM and LSTM:

Model | Dropout | Validation loss | Time taken
Bidirectional LSTM | recurrent_dropout=0.2, dropout=0.2 | 0.44385 | 90 min
LSTM | recurrent_dropout=0.2, dropout=0.2 | 0.43454 | 25 min
Observations:
1. Bi-LSTM performs better than LSTM for most tasks.
2. However, it overfits here: Bi-LSTM training stopped after 7 epochs, as validation loss started to increase after epoch 5.
3. Early stopping was implemented using Keras ModelCheckpoint and EarlyStopping (see the sketch below).
4. The LSTM model also overfits (training stopped after 12 epochs).
Solution: tuning recurrent_dropout and dropout for Bi-LSTM (not implemented because of time constraints).
NOTE: We settled on LSTM + GloVe vectors for further analysis, due to comparable performance and faster training.
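A sketch of the checkpointing and early-stopping setup mentioned above; the patience value, batch size, and file name are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once validation loss stops improving, keeping the best weights seen.
    EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True),
    # Also persist the best model to disk.
    ModelCheckpoint("siamese_lstm_best.h5", monitor="val_loss", save_best_only=True),
]

# q1_train/q2_train are padded word-index sequences from the tokenizer.
model.fit([q1_train, q2_train], y_train,
          validation_split=0.1, epochs=20, batch_size=256,
          callbacks=callbacks)
```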
LSTM with varied dropout:

Model | Dropout | Validation loss | Training loss
LSTM | recurrent_dropout=0.2, dropout=0.2 | 0.4345 | 0.4301
LSTM | recurrent_dropout=0.5, dropout=0.5 | 0.4532 | 0.4476
LSTM | recurrent_dropout=0.0, dropout=0.0 | 0.4632 | 0.4375
Observations:
1. LSTM with dropout of 0.5/0.5 performs worse on both the training and validation sets.
2. LSTM without dropout was also implemented; as expected, it overfits.
3. We could also have tried randomized dropout (not implemented due to time constraints).
Further steps to improve log-loss:
1) Preprocessing text
a) Replace numbers by 'n'
b) Handle special words and abbreviations (e.g. I'd)
c) Stop-word removal
2) Handling unbalanced classes
3) Question symmetry
Preprocessing Text:
● Preprocessing input text is an important step for any NLP task. Why?
● Different preprocessing steps analyzed:

Preprocessing step | Original text | Modified text
Number replacement | I have 100 apples. | I have n apples.
Special words | I'd run a marathon. | I would run a marathon.
Stop-word removal | I have 100 apples. | I 100 apples
Importance of Text Preprocessing:
Number replacement:
● Word2Vec and GloVe vectors do not recognize numbers.
● Numbers therefore have to be replaced by the token 'n'.
Special words:
● Words like I'd and I'm cause problems when tokenized:
○ I'd → I + d
○ I'm → I + m
● The meaning is lost.
● We need to handle these words separately:
○ I'd → I would
○ I'm → I am
Stop-word removal:
● Stop words increase the complexity of the system and may or may not provide additional information.
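A sketch implementing the three steps above. The contraction map is illustrative (the deck names only I'd and I'm), and stop-word removal assumes NLTK's English list is available:

```python
import re

CONTRACTIONS = {"i'd": "i would", "i'm": "i am"}  # extend as needed

def preprocess(text, remove_stopwords=False):
    """Special-word expansion, number replacement, optional stop-word removal."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"\d+", "n", text)       # replace every number by the token 'n'
    if remove_stopwords:
        from nltk.corpus import stopwords  # assumes the NLTK corpus is downloaded
        stops = set(stopwords.words("english"))
        text = " ".join(w for w in text.split() if w not in stops)
    return text

print(preprocess("I'd buy 100 apples."))   # -> "i would buy n apples."
```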
Handling Imbalanced Classes:
● The training and test data might have different distributions of positive and negative examples.
● If the training data has considerably more positive examples, the model is biased towards positive predictions, and vice versa.
● In our data:
○ Train set: 36.92% positive examples
○ Test set: 17.46% positive examples
● We need to map the share of positive examples in the train set to match the test set.
● One positive example in the train set counts for 0.1746 / 0.3692 = 0.472 positive examples in the test set.
● Similarly, the weight of a negative example in the train set is (1 − 0.1746) / (1 − 0.3692) = 1.309.
● Weighted log-loss = −(0.472 · t · log y + 1.309 · (1 − t) · log(1 − y)), where t is the target value and y the predicted value.
● Class weighting can be passed as a parameter in Keras (see the sketch after the note below).
Note:
● This is a workaround to achieve better performance on the test set, since in this case the distribution is heavily skewed in the test set.
● Ideally, the train, validation, and test sets should be split while maintaining class balance.
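In Keras this is a single argument to fit; a sketch with the weights derived above:

```python
# Positive examples down-weighted, negatives up-weighted, per the derivation above.
class_weight = {
    1: 0.1746 / 0.3692,              # ~0.472
    0: (1 - 0.1746) / (1 - 0.3692),  # ~1.309
}

model.fit([q1_train, q2_train], y_train,
          validation_split=0.1, epochs=20, batch_size=256,
          class_weight=class_weight)
```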
Question Pair Symmetry:
● We interchanged Q1 and Q2 to see whether it affects the model.
● A model might otherwise learn features tied to the question order.
● We interchanged Q1 and Q2 for half of the pairs and retrained the model (a sketch of the swap follows).
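A sketch of the swap, assuming q1_train and q2_train are NumPy arrays of padded sequences (the duplicate label is symmetric, so it is unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random(len(q1_train)) < 0.5  # pick a random half of the pairs

# Boolean fancy indexing returns copies, so this swaps the selected rows safely.
q1_train[mask], q2_train[mask] = q2_train[mask], q1_train[mask]
```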
Analysis: different preprocessing steps and their effect on log-loss
Model parameters:
● Length of each question: 30
● Recurrent dropout: 0.2
● Dropout: 0.2
● LSTM layer size: 100
● Train : validation split: 9 : 1
Model Comparison:

Steps | Train loss | Validation loss | Public test loss | Private test loss
Symmetry: No; simple text preprocessing: Yes; class weighting: No | 0.4034 | 0.4056 | 0.4041 | 0.4065
Symmetry: No; simple text preprocessing: Yes; class weighting: Yes | 0.3045 | 0.2931 | 0.3104 | 0.3144
Symmetry: Yes; simple text preprocessing: Yes; class weighting: Yes | 0.3033 | 0.2925 | 0.3079 | 0.3109
Symmetry: Yes; simple text preprocessing: Yes; class weighting: Yes; stop-word removal: Yes | 0.3512 | 0.3567 | 0.3626 | 0.3617
Observations:
● Some models underfit: training was restricted to 20 epochs due to time constraints (the log-loss was still decreasing).
● Stop-word removal decreased performance; the stop words carried contextual information that was important.
Note: "simple text preprocessing" = number replacement and special-word handling.
Further steps to explore:
● Parameter tuning, especially of dropout, to attain better log-loss.
● Analysis with an increased number of epochs.
● Ensembles of models:
○ Bagging: avoids overfitting
○ Boosting: helps improve weak (underfitting) models
● We observed many nouns in the data, suggesting part-of-speech features:
○ Replacing proper nouns by a generic noun tag
● More text processing: lemmatization, stemming.
● Character-level LSTMs.
● Word-attention models.
References:
1. Contextual Bidirectional Long Short-Term Memory Recurrent Neural Network Language Models: A
Generative Approach to Sentiment Analysis - http://www.aclweb.org/anthology/E17-1096
2. A Hierarchical Neural Autoencoder for Paragraphs and Documents - https://arxiv.org/pdf/1506.01057v2.pdf
3. Siamese Recurrent Architectures for Learning Sentence Similarity -
http://www.mit.edu/~jonasm/info/MuellerThyagarajan_AAAI16.pdf
4. Efficient Estimation of Word Representations in Vector Space - https://arxiv.org/pdf/1301.3781.pdf
5. GloVe: Global Vectors for Word Representation - https://nlp.stanford.edu/pubs/glove.pdf
6. HDLTex: Hierarchical Deep Learning for Text Classification - https://arxiv.org/pdf/1709.08267.pdf
7. Signature Verification using a "Siamese" Time Delay Neural Network -
https://papers.nips.cc/paper/769-signature-verification-using-a-siamese-time-delay-neural-network.pdf
8. Dropout: A Simple Way to Prevent Neural Networks from Overfitting -
http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
9. Understanding CNNs for NLP -
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
10. Comparison study of CNNs and RNNs for NLP - https://arxiv.org/pdf/1702.01923.pdf
11. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
12. Term Frequency Inverse Document Frequency - http://www.tfidf.com/
Thank you!
Any questions?