Quora Question Pairs
Competition
by Andriy Gryshchuk
Popularity
3,300 teams (>4,000 participants)
NLP
Feature engineering
Deep Learning
Interesting and big enough dataset
Different from other recent competitions
Goal - Find duplicate questions
Classification formulation:
For each pair of questions, predict the probability that the two questions have the same meaning
Data
Train set: 400,000 pairs of questions
(very large compared with the previously available sets for paraphrase detection)
(question1, question2, is_duplicate)
Test set: 2,345,796 pairs
(some of them are artificially generated as an anti-cheating measure)
Manually labeled (noisy)
Examples - positive
'Why have human beings evolved more than any other beings on Earth?'
'What technicality results in humans being more intelligent than other animals?'
'How Do You Protect Yourself from Dogs?'
'What is the best way to save yourself from an attacking angry dog?'
'Why are Quorians more liberal than conservative?'
'Why does Quora tend to attract more leftists than conservatives?'
Examples - negatives
How to convert fractions to whole numbers?
How do you convert whole numbers into fractions?
What tips do you have for starting a school newspaper?
What are some tips on starting a school newspaper?
What Do I Do About My Boyfriend Ignoring Me?
What should I do when my boyfriend is ignoring me?
How dangerous is Mexico City?
Why is Mexico City dangerous?
What are some words that exist in English but do not exist in Japanese?
What are some words that exist in Japanese but do not exist in English?
Negatives are not random
There are positive pairs with no common words
There are negative pairs with all the words common
A lot of ambiguous cases
Noise
Metric
Logloss - questionable; ROC could be a much better choice
Very different distributions of the train and test sets
36% positives in the train set
17% positives in the test set (public part)
Upsampling (or a correction formula)
When distributions are different, choose a metric less sensitive to distribution changes
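One common way to handle such a prior shift (a generic sketch, not necessarily the exact formula used in the competition) is to recalibrate predicted probabilities for the new positive-class rate:

def correct_prior(p, train_pos=0.36, test_pos=0.17):
    # Recalibrate a predicted probability p for a different positive-class prior.
    # The 36% / 17% rates are the train / public-test rates from the slides.
    a = test_pos / train_pos              # positive-class reweighting
    b = (1 - test_pos) / (1 - train_pos)  # negative-class reweighting
    return a * p / (a * p + b * (1 - p))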
Approaches
Classical ML vs Deep Learning
Approaches
Classical ML
90% of the effort on creating features, 10% on modelling
Deep Learning
5% of the effort on creating features, 95% on modelling
Kaggle way - Ensemble them all
Classical ML
90% of the effort on creating features
10% on modelling
My team had about 300 features
One of the top teams claimed 4,000 features
Sentence as Vector
Sentence vector - just the mean of the word vectors
Or a weighted mean - how to find the right weights?
Unsupervised methods
Similarities:
Cosine similarity
Cityblock distance
Euclidean distance
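A minimal sketch of these features in Python, assuming word_vectors is a dict-like mapping from token to a pretrained vector (GloVe or word2vec); the function names are illustrative, not the competition code:

import numpy as np
from scipy.spatial.distance import cityblock, cosine, euclidean

def sentence_vector(tokens, word_vectors, dim=300):
    # Mean of the word vectors of the tokens that have an embedding
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def similarity_features(q1_tokens, q2_tokens, word_vectors):
    v1 = sentence_vector(q1_tokens, word_vectors)
    v2 = sentence_vector(q2_tokens, word_vectors)
    return {'cosine': cosine(v1, v2),        # cosine distance = 1 - cosine similarity
            'cityblock': cityblock(v1, v2),  # L1 distance
            'euclidean': euclidean(v1, v2)}  # L2 distance

The components of v1 and v2 themselves can also be fed to a classifier directly - the "raw embeddings" features of the next slide.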
Raw Embeddings
Raw embeddings are surprisingly powerful features
Convert each sentence to a vector and use the vector components directly as features
Which Word Vectors?
GloVe, word2vec?
50D, 100D, 200D, 300D?
All of them?
Ensembles improve when models are run on different embeddings.
Deep Learning
Modelling: 95% of the effort
Features: 5% of the effort
Pretrained word embeddings are the features
Pad and cut sentences to the same length
Start modelling
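A typical way to do this with Keras (a sketch with an illustrative cut-off length, not necessarily the preprocessing used here); q1_texts and q2_texts are assumed lists of raw question strings:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_LEN = 30  # illustrative maximum length

tokenizer = Tokenizer()
tokenizer.fit_on_texts(list(q1_texts) + list(q2_texts))

# Map words to integer ids, then pad short questions and cut long ones to MAX_LEN
q1_seq = pad_sequences(tokenizer.texts_to_sequences(q1_texts), maxlen=MAX_LEN)
q2_seq = pad_sequences(tokenizer.texts_to_sequences(q2_texts), maxlen=MAX_LEN)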
Ideas for NNs
Sentence embeddings computed just as the mean of the word vectors are powerful
Weighted mean?
Non-linearity?
This is a NN
Still just a bag of words
N-grams?
This is a convolutional NN
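The same progression as a minimal Keras sketch (all sizes are illustrative assumptions):

from keras.layers import Input, Embedding, GlobalAveragePooling1D, Conv1D, Dense

vocab_size, emb_dim, max_len = 50000, 300, 30   # illustrative sizes
q = Input(shape=(max_len,))                     # one padded question
embed = Embedding(vocab_size, emb_dim)

# Mean of word vectors = embedding + global average pooling (a bag of words)
bow = GlobalAveragePooling1D()(embed(q))
# A dense layer on top gives a weighted, non-linear combination - already a small NN
hidden = Dense(emb_dim, activation='relu')(bow)
# Convolving over word windows adds n-gram information - a convolutional NN
ngrams = GlobalAveragePooling1D()(Conv1D(emb_dim, 3, activation='relu')(embed(q)))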
Symmetry asks for
[Diagram: Question 1 and Question 2 go through the same Neural Network with all weights shared, producing one Output]
[Diagram: Question 1 and Question 2 pass through a common embedding layer, then Conv Block 1 ... Conv Block N, then a Fully Connected Layer]
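A minimal sketch of such a shared-weight network in Keras, assuming a pretrained embedding matrix emb_mx and padded inputs; layer sizes are illustrative, not the actual model:

from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense, concatenate
from keras.models import Model

def build_shared(emb_mx, max_len=30):
    # The same layers (and therefore the same weights) are applied to both questions
    embed = Embedding(emb_mx.shape[0], emb_mx.shape[1], weights=[emb_mx],
                      input_length=max_len, trainable=False)
    conv = Conv1D(300, 3, activation='relu')
    pool = GlobalMaxPooling1D()
    encode = lambda x: pool(conv(embed(x)))

    q1, q2 = Input(shape=(max_len,)), Input(shape=(max_len,))
    merged = concatenate([encode(q1), encode(q2)])
    out = Dense(1, activation='sigmoid')(Dense(300, activation='relu')(merged))
    return Model(inputs=[q1, q2], outputs=out)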
Paraphrase detection state of the art
Microsoft Research Paraphrase Corpus (~5,000 sentence pairs)
[Results table]
Methods:
Unsupervised - phrase vector as weighted average
Autoencoder - better phrase vector
Supervised - CNN + structured features
Previous works
Socher, R., Huang, E. H., Pennington, J., Ng, A. Y., and Manning, C. D. (2011). Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
Milajevs, D., Kartsaklis, D., Sadrzadeh, M., and Purver, M. Evaluating Neural Word Representations in Tensor-Based Compositional Settings
He, H., Gimpel, K., and Lin, J. (2015). Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
From He, H., Gimpel, K., and Lin, J. (2015). Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
Convolutional Block as the main component
[Diagram: Input 1 and Input 2 each go through a number of convolutional transformations and Global Pooling; the block outputs one number]
Parameters of the convolutional block
● Filter size
● Number of filters
● Global Pooling
● Depth
● Kernel regularizer, activity regularizer
● Combine transformation (cosine, euclidean, cityblock)
Shallow Convolutional Block
def conv_lst4(layer_class, size, out_dim=300, activation='relu',
              kernel_regularizer=None, activity_regularizer=None):
    # One shared convolutional layer, followed by two pooling branches:
    # global max pooling and global average pooling.
    res = []
    res.append(layer_class(out_dim, size, activation=activation,
                           kernel_regularizer=kernel_regularizer,
                           activity_regularizer=activity_regularizer))
    res_max = res.copy()
    res_max.append(GlobalMaxPooling1D())
    res_avg = res.copy()
    res_avg.append(GlobalAveragePooling1D())
    # Project each pooled output back to out_dim with a linear dense layer
    for res in [res_max, res_avg]:
        res.append(Dense(out_dim, activation='linear'))
    return [res_max, res_avg]
…
# conv_deep_lst and apply_layers are helpers defined elsewhere (elided here);
# the same layer lists are applied to both questions a and b, so weights are shared.
deep_lst = [conv_deep_lst(Conv1D, size, emb_mx.shape[1],
                          kernel_regularizer=kernel_regularizer,
                          activity_regularizer=activity_regularizer) for size in [3, 4]]
a_deep = [apply_layers(f, a) for f in deep_lst]
b_deep = [apply_layers(f, b) for f in deep_lst]
# Combine the two question representations with a normalized dot product (cosine)
dot_deep = [keras.layers.dot([a, b], normalize=True, axes=-1)
            for a, b in zip(a_deep, b_deep)]
…
Embeddings
Use pretrained?
Train your own?
Depends on how much data you have
Trainable embeddings
Super powerful
Super easy to overfit
Regularize
L2 penalty for embedding weights
Average several runs
Two copies of the embeddings
The same initial state (pretrained)
One trainable, one frozen
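A sketch of this idea in Keras, assuming the same pretrained embedding matrix emb_mx as in the earlier snippets; only one copy is updated during training, and the trainable one carries the L2 penalty used in the config below:

import keras
from keras.layers import Input, Embedding, concatenate

def two_view_embedding(emb_mx, max_len=30):
    q = Input(shape=(max_len,))
    frozen = Embedding(emb_mx.shape[0], emb_mx.shape[1], weights=[emb_mx],
                       trainable=False)(q)
    trainable = Embedding(emb_mx.shape[0], emb_mx.shape[1], weights=[emb_mx],
                          trainable=True,
                          embeddings_regularizer=keras.regularizers.l2(1e-5))(q)
    # Concatenate the frozen and trainable views along the embedding dimension
    return q, concatenate([frozen, trainable])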
{ 'name': 'nn_m8',
  'fit_fun': fit_nn,
  'fit_par': {
      'n_iter': 6,
      'build_fun': partial(build_m8, train_emb=True, max_pool=True,
                           embeddings_regularizer=keras.regularizers.l2(1e-5),
                           n_more=X_train_stored.shape[1]),
      'schedule': [(1e-3, 5), (1e-5, 2)],          # (learning rate, epochs) pairs
      'jit_sch': partial(jit_schedule, vol=0.1)}   # jitter the schedule (see below)
}
def jit_schedule(schedule, vol=0.1):
    # Randomly perturb each learning rate by up to ±vol so that averaged runs differ
    for lr, ep in schedule:
        lr = np.random.uniform(lr - vol*lr, lr + vol*lr)
        yield lr, ep
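For example, jit_schedule([(1e-3, 5), (1e-5, 2)], vol=0.1) might yield something like (9.6e-4, 5) and (1.03e-5, 2), so each run in the average trains with slightly different learning rates, adding diversity to the ensemble.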
RNNs vs CNNs
Similar accuracy
CNNs are two orders of magnitude faster
Fast CNNs allow averaging many runs
More Features for the NN
Features created for the classifiers were added to the NN
The end-to-end promise is great, but if you already have features, use them
Final model
[Diagram: Question 1 and Question 2 go through the shared-weight Neural Network; its output is concatenated with the “Classical” features and passed through a Fully Connected Layer to produce the Output]
Other NNs
RNNs - several orders of magnitude slower
Character-level RNNs - very slow
RNNs with attention
NNs on the same features as the tree-based classifiers
The top team reports that NNs on word vectors + classical features work the best
Xgboost and the like exploited the leak well
Analysis
Shallow convolutions
Just bag of words or bag of n-grams
No internal representation of “meaning” or “topic”
How to improve?
Deeper networks - would require dedicated embeddings
Positional embeddings
Transfer learning - apply a pre-trained Neural Translation
model and take the hidden state of the decoder as input
Ensemble
5 folds on the first level
The first level itself was an average of several runs
Xgboost on the second level
CV was unstable
“upsample-bagging” on the second level
Real bagging on the second level (800 rounds)
“Third level” - team ensemble (just a weighted average)
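A generic sketch of this kind of two-level stacking (out-of-fold first-level predictions, then Xgboost on top); the helper names and first_level_factories are illustrative, not the actual pipeline:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import KFold

def oof_predictions(model_factory, X, y, n_splits=5):
    # Out-of-fold predictions of one first-level model (5 folds on the first level)
    oof = np.zeros(len(y))
    for tr_idx, va_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = model_factory()
        model.fit(X[tr_idx], y[tr_idx])
        oof[va_idx] = model.predict_proba(X[va_idx])[:, 1]
    return oof

# Second level: stack the out-of-fold columns of every first-level model
# and fit an Xgboost meta-model on them.
# X, y and first_level_factories are assumed to exist.
meta_X = np.column_stack([oof_predictions(f, X, y) for f in first_level_factories])
meta_model = xgb.XGBClassifier()
meta_model.fit(meta_X, y)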
Ensemble
('/meta_84_glove_6b_50d/nn_m8/', 0.17182514423484868),
('/meta_84_glove_6b_300d/nn_m8/', 0.17308944420181949),
('/meta_84_glove_6b_100d/nn_m8/', 0.17327907625486416),
('/meta_84_glove_6b_100d/gbm_tuned_00025/', 0.17386390869911419),
('/meta_84_glove_6b_200d/nn_m8/', 0.17478704276847895),
('/meta_84_glove_6b_100d/gbm_tuned_001/', 0.17486090598394549),
('/meta_83_glove_6b_50d/gbm_tuned_00025/', 0.17514204487042342),
('/meta_84_glove_6b_50d/gbm_tuned_001/', 0.17626284406063045),
('/meta_84_glove_6b_100d/gbm_dart_01/', 0.17639511061431704),
('/meta_84_glove_6b_100d/xgb_02_d10/', 0.17660031146404326),
('/meta_83_glove_6b_50d/gbm_tuned_0025/', 0.17688759229979395),
('/meta_84_glove_6b_50d/gbm_dart_005/', 0.17713067893022988),
('/meta_84_glove_6b_50d/gbm_dart_01/', 0.17761469925842949),
('/meta_84_glove_6b_100d/xgb_05_d10/', 0.17832099461464535),
('/meta_82_glove_6b_50d/gbm_tuned_00025/', 0.17841488421938717),
('/meta_83_glove_6b_100d/nn_m61/', 0.18009823071205816),
('/meta_82_glove_6b_50d/gbm_tuned_0025/', 0.18026383839031426),
('/meta_84_glove_6b_50d/xgb_05_d10/', 0.18079926772563515),
('/meta_83_glove_6b_50d/xgb_05/', 0.18513503621897476),
('/meta_83_glove_6b_100d/nn_m51_cn3/', 0.18574331177990389),
('/meta_83_glove_6b_200d/nn_m62/', 0.18607323372840762),
('/meta_83_glove_6b_50d/nn_m6/', 0.18646785119161874),
('/meta_82_glove_6b_50d/xgb_05/', 0.1875326701626234),
Final Ensemble
20 rounds of “upsample-bagging” of Xgboost over 44 first-level models
The team ensemble: 0.8 * andriy’s model + 0.2 * komaki’s
Unfortunate Event
Leak
50% of Kaggle competitions have leaks; 20% have “killer” leaks
What about real life?
Be ready
The top team exploited the leak a lot
Difficult to compare genuine results
The leak could poison genuine features as well
Trainable embeddings might pick up information from the leak
The sampling process is a common cause of Kaggle’s leaks; I would suppose this is true in real life as well. Be careful.
Hyperparameter tuning
Ensembles give more than extensive tuning
A simple average of two reasonable but different models is better than one overtuned model
K-fold ensembles of different models beat everything
A K-fold ensemble helps even for a single model with one set of hyperparameters
Overtuned models are fragile
If you love tuning - regularize
Questions
