Existing state-of-the-art supervised methods in Machine Learning require large amounts of annotated data to achieve good performance and generalization. However, manually constructing such a training dataset with sentiment labels is a labor-intensive and time-consuming task. With the proliferation of data acquisition in domains such as images, text and video, the rate at which we acquire data is greater than the rate at which we can label it. Techniques that reduce the amount of labelled data needed to achieve competitive accuracies are of paramount importance for deploying scalable, data-driven, real-world solutions.
At Envestnet | Yodlee, we have deployed several advanced state-of-the-art Machine Learning solutions that process millions of data points on a daily basis under very stringent service-level commitments. A key aspect of our Natural Language Processing solutions is semi-supervised learning (SSL): a family of methods that also make use of unlabelled data for training – typically a small amount of labelled data combined with a large amount of unlabelled data. Purely supervised solutions fail to exploit the rich syntactic structure of the unlabelled data to improve decision boundaries. There is an abundance of published work in the field – but few papers have succeeded in showing significantly better results than state-of-the-art supervised learning, and methods often rest on simplifying assumptions that fail to transfer to real-world scenarios. There is a lack of practical guidelines for deploying effective SSL solutions. We attempt to bridge that gap by sharing our learnings from successful SSL models deployed in production.
Productionizing Semi-Supervised Deep Learning Systems
Learnings from productionizing a semi-supervised deep learning system at the petabyte scale.
Presented by: Samiran Roy
LinkedIn: samiranroy
Slides: http://bit.ly/tech_triveni_samiran
Data analytics at scale
❑ 100 bn+ text datapoints
❑ 1 PB+ size
❑ 20 mn+ unique words
❑ 10+ classification challenges
❑ 3 mn+ classes*
❑ >90% precision/recall*
Production System - Aspects
❑ Acts on real-world data (complete pipeline)
❑ Impacts business decisions
❑ Online inference (prediction on demand)
❑ Shifting data distribution
This presentation
❑ Simple (but weird) ideas
❑ Common sense (in hindsight)
Data Swimming
The first step to training a neural net is to not touch any neural net code at all and instead begin by
thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in
units of hours) scanning through thousands of examples, understanding their distribution and looking for
patterns. Luckily, your brain is pretty good at this. One time I discovered that the data contained duplicate
examples. Another time I found corrupted images / labels. I look for data imbalances and biases. I will
typically also pay attention to my own process for classifying the data, which hints at the kinds of
architectures we’ll eventually explore. As an example - are very local features enough or do we need
global context? How much variation is there and what form does it take? What variation is spurious and
could be preprocessed out? Does spatial position matter or do we want to average pool it out? How much
does detail matter and how far could we afford to downsample the images? How noisy are the labels?
~ Andrej Karpathy, (Senior Director of AI at Tesla)
(http://karpathy.github.io/2019/04/25/recipe/)
Data Swimming Insights
❑ What kind of machine learning model to use
❑ Duplicates in data / data reduction
❑ Unused variables
❑ Corrupted/noisy labels
❑ Imbalances
❑ Simple rules to classify the data
❑ Hints on cleaning the data
❑ Distribution of variables
❑ And much, much more!
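A minimal sketch of this kind of inspection, using only the standard library. The records below are hypothetical stand-ins for a sample of the corpus; the real pipeline would stream a sample of the actual data:

```python
from collections import Counter

# Toy (text, label) records standing in for a sample of the corpus.
data = [
    ("payment to acme corp", "shopping"),
    ("payment to acme corp", "shopping"),   # exact duplicate
    ("salary credit", "income"),
    ("uber trip 4512", "travel"),
    ("uber trip 9981", "travel"),
]

# Duplicates: the same text appearing more than once.
text_counts = Counter(text for text, _ in data)
duplicates = {t: c for t, c in text_counts.items() if c > 1}

# Class imbalance: how skewed are the labels?
label_counts = Counter(label for _, label in data)

print(duplicates)     # {'payment to acme corp': 2}
print(label_counts)   # Counter({'shopping': 2, 'travel': 2, 'income': 1})
```

A few hours of looking at outputs like these (plus reading raw examples) is usually what surfaces the duplicates, corrupted labels and imbalances listed above.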
What is a ‘good’ accuracy?
❑ Stock direction prediction (60%?)
❑ Question answering (80%?)
❑ Movie recommendation engine (90%?)
❑ Fraud detection (99.9%?)
❑ Computer vision tasks in driverless cars?
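The fraud-detection number illustrates why a raw accuracy figure is meaningless without a baseline. A quick sanity check, with illustrative counts (0.1% positive class assumed):

```python
# On a fraud dataset where only 0.1% of transactions are fraudulent,
# a "model" that always predicts "not fraud" already scores 99.9%
# accuracy while catching zero fraud.
n_total = 100_000
n_fraud = 100  # 0.1% positive class

majority_baseline_accuracy = (n_total - n_fraud) / n_total
recall_on_fraud = 0 / n_fraud  # the do-nothing baseline never flags fraud

print(majority_baseline_accuracy)  # 0.999
print(recall_on_fraud)             # 0.0
```

So a 'good' accuracy is always relative to the class balance and the cost of errors for that task, not an absolute number.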
Exploring the long tail
Ref: https://gradientdescent.co/t/tesla-autonomy-day-watch-the-full-event/216/18
Machine learning in sampling
1 bn -> 1 mn (for labelling)
1. Get a good representation
2. Visualize your data
3. Cluster your data
4. Sample from each cluster!
Representation is key
1 bn -> 1 mn (for labelling)
1) Get a good representation
Word embeddings!
1 bn -> 1 mn (for labelling)
1) Get a good representation
Vector Arithmetic!
1 bn -> 1 mn (for labelling)
1) Get a good representation
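The classic example of embedding arithmetic is king - man + woman ≈ queen. A minimal sketch with hand-crafted 3-d toy vectors (real embeddings are learned, 100-300 dimensional; these values are only rigged to mimic the analogy):

```python
import numpy as np

# Hand-crafted toy vectors (dimensions loosely: royalty, gender, person-ness).
emb = {
    "king":  np.array([0.9,  0.9, 1.0]),
    "queen": np.array([0.9, -0.9, 1.0]),
    "man":   np.array([0.1,  0.9, 1.0]),
    "woman": np.array([0.1, -0.9, 1.0]),
    "apple": np.array([0.0,  0.0, 0.2]),
}

def nearest(vec, exclude):
    """Vocabulary word whose embedding is closest (cosine) to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude),
               key=lambda w: cos(emb[w], vec))

result = nearest(emb["king"] - emb["man"] + emb["woman"],
                 exclude={"king", "man", "woman"})
print(result)  # queen
```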
Word embeddings - Popular approaches
❑ Word2Vec
❑ fastText
❑ GloVe
❑ BERT/ELMo
Machine learning in sampling
1 bn -> 1 mn (for labelling)
1. Get a good representation
2. Visualize your data
3. Cluster your data
4. Sample from each cluster! (Resample if necessary)
Semi-supervised learning - The first approach
Unsupervised word embeddings (1 bn)
+
Deep learning models (1 mn)
train: Sachin Tendulkar played an amazing knock for India
test: Virender Sehwag played an amazing knock for India
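Why this works: the unsupervised embeddings, trained on the full 1 bn corpus, place the two cricketers' names close together, so a model that only ever saw the Tendulkar sentence treats the Sehwag sentence as nearly identical. A toy NumPy sketch (random vectors stand in for learned embeddings, with the two names deliberately placed as near-neighbours, which is exactly what co-occurrence training produces):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy embedding table; in the real pipeline these come from unsupervised
# training on the unlabelled corpus.
dim = 16
vocab = ["played", "an", "amazing", "knock", "for", "india"]
emb = {w: rng.normal(size=dim) for w in vocab}
emb["tendulkar"] = rng.normal(size=dim)
emb["sehwag"] = emb["tendulkar"] + rng.normal(scale=0.05, size=dim)

def sentence_vec(s):
    """Average of word vectors - the simplest sentence representation."""
    return np.mean([emb[w] for w in s.lower().split()], axis=0)

train = sentence_vec("tendulkar played an amazing knock for india")
test = sentence_vec("sehwag played an amazing knock for india")

cosine = float(train @ test / (np.linalg.norm(train) * np.linalg.norm(test)))
print(cosine)  # close to 1.0: the classifier sees "the same" sentence
```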
Avoiding Hidden Technical Debt
❑ Entanglement (CACE) - what if we change the distribution of feature x1 in [x1, x2, x3]?
❑ Correction cascades - what if model M2 uses the output of model M1?
❑ Hidden/unintended feedback loops
❑ Full list: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
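One practical defence against entanglement is to monitor each feature's live distribution against its training-time distribution. A minimal sketch using a two-sample Kolmogorov–Smirnov statistic in plain NumPy (the 0.2 threshold and the synthetic shift are illustrative, not tuned values):

```python
import numpy as np

def ks_statistic(ref, new):
    """Max gap between the two empirical CDFs - a simple drift score."""
    grid = np.sort(np.concatenate([ref, new]))
    cdf_ref = np.searchsorted(np.sort(ref), grid, side="right") / len(ref)
    cdf_new = np.searchsorted(np.sort(new), grid, side="right") / len(new)
    return float(np.max(np.abs(cdf_ref - cdf_new)))

rng = np.random.default_rng(7)
x1_train = rng.normal(0.0, 1.0, size=5_000)  # feature x1 at training time
x1_live = rng.normal(0.8, 1.0, size=5_000)   # shifted distribution in production

drift = ks_statistic(x1_train, x1_live)
print(drift > 0.2)  # True: flag x1 for investigation / retraining
```

Catching the shift in x1 before it silently degrades predictions is much cheaper than diagnosing the entangled failure downstream.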
Model Interpretation
Category: Sports
Alongside Tendulkar and Sehwag, several other retired international cricketers will be seen in action
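An occlusion-style sketch of this kind of interpretation: remove one word at a time and watch the classifier's score for the predicted class. The scorer below is a stand-in keyword-weight model with made-up weights; in production it would be the deployed network:

```python
# Hypothetical per-word weights standing in for a trained classifier.
weights = {"tendulkar": 0.6, "sehwag": 0.5, "cricketers": 0.7, "international": 0.1}

def sports_score(words):
    return sum(weights.get(w, 0.0) for w in words)

sentence = ("alongside tendulkar and sehwag several other retired "
            "international cricketers will be seen in action").split()
base = sports_score(sentence)

# Importance of each word = drop in score when that word is removed.
importance = {
    w: base - sports_score([x for i, x in enumerate(sentence) if i != pos])
    for pos, w in enumerate(sentence)
}
top = max(importance, key=importance.get)
print(top)  # 'cricketers' contributes most to the Sports prediction
```

The highlighted words in the slide are exactly the ones such a procedure surfaces: removing them moves the Sports score the most.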
Automatic Feedback Loops (Ensemble of models)
❑ Monitor the changing distribution of data
❑ New words/sentences/classes not present in training
❑ Generated embedding is far from any cluster / confidence score is low
❑ Random sampling
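The cluster-distance and confidence checks above can be combined into a single routing rule. A toy sketch with 2-D centroids and made-up thresholds (a real system would use the training-time cluster centroids and tune both cut-offs on held-out data):

```python
import numpy as np

# Cluster centroids from the training-time representation (toy values).
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
distance_threshold = 2.0  # hypothetical, tuned offline

def needs_human_review(embedding, confidence, conf_threshold=0.7):
    """Route a datapoint into the feedback loop if it looks
    out-of-distribution (far from every known cluster) or the
    model is unsure."""
    nearest = float(np.min(np.linalg.norm(centroids - embedding, axis=1)))
    return nearest > distance_threshold or confidence < conf_threshold

in_dist = needs_human_review(np.array([0.1, -0.2]), confidence=0.95)   # False
far_away = needs_human_review(np.array([9.0, -4.0]), confidence=0.95)  # True
unsure = needs_human_review(np.array([0.1, -0.2]), confidence=0.40)    # True
print(in_dist, far_away, unsure)
```

Points flagged this way, plus a small random sample, go back for manual labelling, closing the loop as the data distribution shifts.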