Robustness Metrics for ML
Models based on Deep
Learning Methods
Davide Posillipo

Alkemy - Data Science Milan Meetup - AI Guild

21/07/2022, Milan
Who am I?
Hi! My name is Davide Posillipo.

I studied Statistics.

For 8+ years I have been working on projects that require using data to answer questions.

Freelance Data Scientist and Teacher.

Currently building Alkemy's Deep Learning & Big Data Department.

Happy to connect and work together on new ideas and projects. Get in touch!
This event, tonight
Let’s get started
MNIST: a simple case
‱ MNIST: a solved problem

‱ LeNet: Deep Convolutional Neural Network, 99% accuracy
Predicted digit
What happens if we pass a chicken to LeNet?
The chicken comes from a different distribution.
What should the model do?
What happens if we pass a chicken to LeNet?
We expect this output:
?
What happens if we pass a chicken to LeNet?


but we obtain this result:
6, with probability 0.9
A less obvious example: Fashion MNIST
63.4% of examples are classified with a probability greater than 0.99

74.3% of examples are classified with a probability greater than 0.95

88.9% of examples are classified with a probability greater than 0.75

LeNet (trained on MNIST) produces “confident” predictions for Fashion MNIST
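These confidence statistics come straight from the classifier’s softmax outputs; a minimal sketch with hypothetical probabilities (the real numbers come from running LeNet on Fashion MNIST):

```python
import numpy as np

# Hypothetical top-class softmax probabilities on out-of-distribution inputs.
max_probs = np.array([0.995, 0.97, 0.80, 0.60, 0.999])

def frac_above(probs, threshold):
    """Fraction of predictions whose top-class probability exceeds the threshold."""
    return float(np.mean(probs > threshold))

print(frac_above(max_probs, 0.99))  # 0.4
```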
Can we trust deep learning predictions?
‱ There are scenarios where it’s crucial to avoid meaningless predictions (e.g. healthcare, high-precision manufacturing, autonomous driving): safety in AI

‱ No prediction is better than a random guess

‱ We need robustness metrics
Robustness metrics: what should they look like?
A robustness metric should tell us whether a prediction is reliable or just some sort of random guessing. It should provide a measure of confidence in our prediction.

‱ Computable without labels

‱ Computable in “real time”

‱ Easy to plug into a working ML pipeline

‱ Highly effective at detecting anomalous points (low false negative rate)

‱ Low rate of discarded “good predictions” (low false positive rate)
A few words about Robustness
Robustness: the quality of an entity that doesn’t break too easily when stressed in some way.

Canonical approaches in Statistics involve some modification of the predictive model/estimator (“If you get into a stressful situation, handle it better”), often with a loss of performance as a side effect.

Tonight we focus on a different approach: we don’t modify our models, but we protect them from threats (“Don’t get into stressful situations”). We look for the chickens and keep them away from our model.

Fight-or-flight(-or-freeze) response: we choose flight!
Lack of robustness is a risk
There is increasing attention to the risks associated with AI applications.

The EU has proposed an AI regulation that could become applicable to industry players starting from 2024 (https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)

The Commission is proposing the first-ever legal framework on AI, which addresses the risks of AI and positions Europe to play a leading role globally.

At some point, robustness will be a mandatory requirement for ML models in production, and its lack will be considered a risk.
Robustness metrics: two approaches
‱ GAN-based approach

‱ Decide if a prediction is worth the risk by checking the “stability” of the classifier for the new input data

‱ VAE-based approach

‱ Decide if a prediction is worth the risk by checking the true “origin” of the new input data
First approach:
WGAN + Inverter
GAN in a nutshell
‱ Generative Adversarial Networks

‱ Composed of two networks:

‱ Generator: generates inputs from random noise

‱ Discriminator: decides if the inputs from the Generator are authentic or artificial

‱ After many iterations, the model learns to create inputs really close to the authentic ones
A fancier GAN: WGAN
‱ Wasserstein Generative Adversarial Networks

‱ It uses the Wasserstein distance as the GAN loss function

‱ The Wasserstein distance is a measure of the distance between two probability distributions

‱ For more info: https://lilianweng.github.io/posts/2017-08-20-gan/
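For intuition: in one dimension, the empirical Wasserstein-1 distance between two equal-size samples reduces to the mean absolute difference of the sorted values. A small sketch (this is not the WGAN critic itself, which only approximates the distance via the Kantorovich-Rubinstein duality):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference between the sorted samples."""
    a, b = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a - b)))

print(wasserstein_1d(np.array([0.0, 1.0, 2.0]),
                     np.array([1.0, 2.0, 3.0])))  # 1.0
```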
‱ Train a Generator (from random noise to the input space)

‱ Train a Discriminator (it tries to understand whether a point is real or generated by the Generator)

‱ The Generator learns how to fool the Discriminator

‱ Train an Inverter (from the input space to the latent representation)

‱ The Inverter learns how the Generator maps from random noise to the input space
Adding the Inverter
Approach 1: overview 
 and loss functions
WGAN Loss Function
Inverter Loss Function
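The formulas on this slide were images lost in extraction. As a hedged reconstruction, the WGAN critic objective and an inverter reconstruction loss of the kind this pipeline uses can be sketched as follows (toy numbers, not real network outputs; the exact loss and the weight `lam` are assumptions):

```python
import numpy as np

# Toy critic scores; in practice these are D applied to real and generated batches.
d_real = np.array([0.8, 0.6, 0.9])
d_fake = np.array([0.1, 0.2, 0.0])

# WGAN critic objective: maximize E[D(x)] - E[D(G(z))],
# i.e. minimize the negative of that difference.
critic_loss = -(d_real.mean() - d_fake.mean())

# Hypothetical inverter loss: reconstruct x through G(I(x)),
# plus a penalty keeping I(G(z)) close to z.
x, x_rec = np.array([1.0, 2.0]), np.array([1.1, 1.9])
z, z_rec = np.array([0.0]), np.array([0.1])
lam = 0.1
inverter_loss = np.mean((x_rec - x) ** 2) + lam * np.mean((z_rec - z) ** 2)
```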
WGAN + Inverter: robustness metrics
‱ Given a new data point x, find the closest point to x in the latent space that, once translated back to the original space, is able to confound the classifier f

‱ Robustness metric: the distance ( delta_z = z* - I(x) ) between these two points, in the latent representation
‱ If delta_z is “small”, the classifier is not confident about its prediction (the classifier “changes its mind” too easily)

‱ If delta_z is “big”, the classifier is confident about its prediction (the classifier “knows what it wants”)
Rule: if delta_z < threshold then do not predict
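The rule can be wrapped around any classifier; a minimal sketch, where `delta_z_fn` is a hypothetical stand-in for the full WGAN + Inverter latent search:

```python
def guarded_predict(x, classifier, delta_z_fn, threshold):
    """Only return a prediction when the latent perturbation needed to flip
    the classifier is large enough; otherwise abstain."""
    if delta_z_fn(x) < threshold:
        return None  # abstain: the classifier "changes its mind" too easily
    return classifier(x)

# Toy usage with stand-ins for the real networks, using the talk's numbers:
pred = guarded_predict("chicken", lambda x: 6, lambda x: 1.8e-06, 5.3e-02)
print(pred)  # None: abstain, as for the chicken image
```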
WGAN + Inverter: Architecture
‱ PyTorch implementation

‱ Generator: 1 Fully Connected + 4 Transposed Conv. Layers (strides = 2), ReLU activations

‱ Discriminator (“Critic”): 4 Conv. Layers (strides = 2) + 1 Fully Connected, ReLU activations

‱ Inverter: 4 Conv. Layers (strides = 2) + 2 Fully Connected, ReLU activations

‱ Adam optimizers for the three nets
WGAN + Inverter: Training and distribution of delta_z
Distribution of delta_z computed on 500 test set input points
WGAN + Inverter: Results
Distribution of delta_z computed on 500 test set input points
delta_z for the chicken: 1.8e-06 (< 5.3e-02)
Using the 5th percentile of the test delta_z as threshold, we get a 5% false positive rate and discard the meaningless prediction for the chicken picture.
5th percentile: 5.3e-02
WGAN + Inverter: Results with Fashion MNIST
Using the 5th percentile of the test delta_z as threshold, we get a 5% false positive rate but lose 34.4% of good predictions (false negative rate)!
Distribution of delta_z computed on 500 test set input points
5th percentile: 5.3e-02
We need to reduce the false negative rate!
Second approach: Variational AutoEncoder
AutoEncoder in a nutshell
‱ VAE: Variational AutoEncoder

‱ Composed of two neural networks:

‱ Encoder: takes the inputs and “compresses” them into a deep latent representation

‱ Decoder: takes the deep representation and reconstructs the original input as faithfully as it can

‱ After many iterations, the model learns the best “latent representation” (a kind of compression) of the training data
From AE to Variational AutoEncoder
Instead of mapping the input to a fixed vector, we map it to a distribution.

In this way we learn the likelihood p(x|z), in this context called the probabilistic decoder, which we can use to generate from the latent space.

The encoding works likewise: we learn the posterior p(z|x), in this context called the probabilistic encoder.

For more info: https://lilianweng.github.io/posts/2018-08-12-vae/
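The sampling step that makes this trainable is the reparameterization trick: instead of sampling z directly from q(z|x), we sample standard normal noise and shift/scale it by the encoder’s outputs. A sketch with a hypothetical toy encoder (the real encoder is a neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Hypothetical probabilistic encoder: returns the parameters of q(z|x)."""
    mu, logvar = x.mean(keepdims=True), np.zeros(1)
    return mu, logvar

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu, logvar = encode(np.array([1.0, 3.0]))
z = sample_latent(mu, logvar, rng)
print(z.shape)  # (1,)
```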
Variational Autoencoder: robustness metrics
The loss function of a variational autoencoder is an indirect measure of the probability that an observation comes from the same distribution that generated the training set.

An “unlikely” input will struggle through the encoding-decoding process = high loss value.

Robustness metric: the loss function

‱ “Big” loss -> Not a robust prediction: the new input data doesn’t come from the distribution underlying the training set

‱ “Small” loss -> Robust prediction: the new input data comes from the distribution underlying the training set

Notice that with this approach we don’t need the classifier f to compute the robustness metric!
Rule: if VAE_loss > threshold then do not predict
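For a Gaussian encoder and a standard normal prior, the VAE loss is reconstruction error plus a closed-form KL term. A sketch of that computation (a simplified, unweighted form; the slides don’t show the exact loss used in the experiments):

```python
import numpy as np

def vae_loss(x, x_rec, mu, logvar):
    """Squared reconstruction error plus the KL divergence of
    N(mu, diag(exp(logvar))) from the standard normal prior."""
    rec = np.sum((x - x_rec) ** 2)
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
    return rec + kl

# A perfectly reconstructed input whose posterior matches the prior has loss 0.
x = np.array([0.5, 0.5])
print(vae_loss(x, x, np.zeros(2), np.zeros(2)))  # 0.0
```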
Variational Autoencoder: Architecture
‱ PyTorch implementation

‱ Encoder: 2-layer fully connected neural network

‱ Decoder: 2-layer fully connected neural network

‱ Adam optimizer
Variational Autoencoder: Training and VAE loss distribution
Distribution of VAE losses computed on 10,000 test set input points
Variational Autoencoder: Results
Distribution of VAE losses computed on 10,000 test set input points
Using the maximum test loss as threshold, we get a 0% false positive rate and discard the meaningless prediction for the chicken picture.
VAE loss for the chicken: 309.4 (> 212)
Variational Autoencoder: Results with Fashion MNIST
Distribution of VAE losses computed on 10,000 test set input points
Using the maximum test loss as threshold, we get a 0% false positive rate and lose 3.55% of good predictions (false negative rate).

Using the 95th percentile of the test losses as threshold, we get a 5% false positive rate and lose only 0.25% of good predictions (false negative rate).
Variational Autoencoder: robustness metrics into production
‱ Train your classifier

‱ Train a VAE on your training set

‱ Get the distribution of the VAE losses on your test set

‱ Define a more or less “conservative” threshold

‱ Implement a “conditional classifier”:
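The recipe can be sketched end to end; `vae_loss_fn` stands in for a real trained VAE and the `losses` array for the real test-set loss distribution (all names hypothetical):

```python
import numpy as np

def fit_threshold(test_losses, percentile=95):
    """Pick the VAE-loss threshold as a percentile of the test-set losses;
    higher percentiles are more permissive (fewer discarded good predictions)."""
    return float(np.percentile(test_losses, percentile))

def conditional_classifier(x, classifier, vae_loss_fn, threshold):
    """Predict only when the VAE loss suggests x comes from the training distribution."""
    if vae_loss_fn(x) > threshold:
        return None  # abstain
    return classifier(x)

losses = np.arange(100.0)          # stand-in for 10,000 real test losses
thr = fit_threshold(losses, 95)
print(conditional_classifier("chicken", lambda x: 3, lambda x: 309.4, thr))  # None
```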
Conclusions
Wrapping up: pros and cons of this approach
Pros
‱ No need to modify your predictive models

‱ The same “monitoring” system can be used for different ML models (for a given dataset)

‱ Applicable to any kind of data (tabular, images, 
)

‱ “Easy” to explain

‱ Easy to plug into existing pipelines

Cons
‱ Arbitrary thresholds must be set by the data scientist

‱ It introduces a further model that needs to be maintained
Thank you for your
attention!
https://www.linkedin.com/in/davide-posillipo/

Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Robustness Metrics for ML Models based on Deep Learning Methods

  • 1. Robustness Metrics for ML Models based on Deep Learning Methods Davide Posillipo Alkemy - Data Science Milan Meetup - AI Guild 21/07/2022, Milan
  • 2. Who am I? Hi! My name is Davide Posillipo. I studied Statistics. For 8+ years I’ve been working on projects that require using data to answer questions. Freelancer Data Scientist and Teacher. Currently building the Alkemy’s Deep Learning & Big Data Department. Happy to connect and work together on new ideas and projects. Get in touch!
  • 5. MNIST: a simple case • MNIST: a solved problem • LeNet: Deep Convolutional Neural Network, 99% accuracy [figure: predicted digit]
  • 6. What happens if we pass a chicken to LeNet? The chicken comes from a different distribution. What should the model do?
  • 7. What happens if we pass a chicken to LeNet? We expect this output: ?
  • 8. What happens if we pass a chicken to LeNet? … but we obtain this result: 6, with probability 0.9
  • 9. A less obvious example: Fashion MNIST 63.4% of examples are classified with a probability greater than 0.99 74.3% of examples are classified with a probability greater than 0.95 88.9% of examples are classified with a probability greater than 0.75 LeNet (trained on MNIST) produces “confident” predictions for Fashion MNIST
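The confidence figures above are fractions of test examples whose top softmax probability exceeds a threshold. A minimal numpy sketch of that computation (the function name and the toy probabilities are ours, standing in for LeNet's actual outputs on Fashion MNIST):

```python
import numpy as np

def confident_fraction(max_probs, threshold):
    """Share of examples whose top softmax probability exceeds threshold."""
    return float(np.mean(np.asarray(max_probs) > threshold))

# Toy max-softmax values standing in for the model's real outputs.
probs = np.array([0.999, 0.98, 0.80, 0.60])
print(confident_fraction(probs, 0.99))  # 0.25
print(confident_fraction(probs, 0.75))  # 0.75
```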
  • 10. Can we trust deep learning predictions? • There are scenarios where it’s crucial to avoid meaningless predictions (e.g. healthcare, high-precision manufacturing, autonomous driving…): safety in AI • Making no prediction is better than random guessing • We need some robustness metric
  • 11. Robustness metrics: what should they look like? A robustness metric should tell us whether a prediction is reliable or just some sort of random guessing. It should provide a measure of confidence in the prediction. • Computable without labels • Computable in “real time” • Easy to plug into a working ML pipeline • Low false positive rate, but also low false negative rate • High effectiveness in detecting anomalous points (low false negative rate) • Low rate of discarded “good predictions” (low false positive rate)
  • 12. A few words about Robustness Robustness: the quality of an entity that doesn’t break too easily when stressed in some way. Canonical approaches in Statistics involve some modification of the predictive model/estimator (“If you get into a stressful situation, handle it better”), often with a loss of performance as a side effect. Tonight we focus on a different approach: we don’t modify our models, we protect them from threats (“Don’t get into stressful situations”). We look for the chickens and keep them away from our model. Fight-or-flight(-or-freeze) response: we choose flight!
  • 13. Lack of robustness is a risk There is increasing attention to the risks associated with AI applications. The EU proposed an AI regulation that could become applicable to industry players starting from 2024 (https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai). The Commission is proposing the first-ever legal framework on AI, which addresses the risks of AI and positions Europe to play a leading role globally. At some point, robustness will be a mandatory requirement for ML models in production, and lack of it will be considered a risk.
  • 14. Robustness metrics: two approaches • GAN-based approach • Decide if a prediction is worth the risk by checking the “stability” of the classifier for the new input data • VAE-based approach • Decide if a prediction is worth the risk by checking the true “origin” of the new input data
  • 16. GAN in a nutshell • Generative Adversarial Networks • Composed of two networks: • Generator: generates inputs from random noise • Discriminator: decides if the inputs from the Generator are authentic or artificial • After many iterations, the model learns to create inputs really close to the authentic ones
  • 17. A fancier GAN: WGAN • Wasserstein Generative Adversarial Networks • It uses the Wasserstein distance as the GAN loss function • The Wasserstein distance is a measure of the distance between two probability distributions • For more info: https://lilianweng.github.io/posts/2017-08-20-gan/
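To build intuition for the Wasserstein distance the slide mentions: in one dimension, for two equal-size samples with uniform weights, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples. A minimal numpy sketch (the function name is ours; in the WGAN itself the distance is estimated by the critic, not computed this way):

```python
import numpy as np

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D samples.

    With uniform weights and equal sample sizes, the optimal transport
    plan matches order statistics, so W1 is the mean absolute difference
    of the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape, "this shortcut assumes equal sample sizes"
    return np.abs(a - b).mean()

# Identical samples are at distance zero; shifting one sample by c
# moves the distance to exactly |c|.
x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x))        # 0.0
print(wasserstein_1d(x, x + 3.0))  # 3.0
```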
  • 18. Adding the Inverter • Train a Generator (from random noise to the input space) • Train a Discriminator (it tries to understand if a point is real or generated by the Generator) • The Generator learns how to fool the Discriminator • Train an Inverter (from the input space to the latent representation) • The Inverter learns how the Generator maps from the random noise to the input space
  • 19. Approach 1: overview and loss functions [figures: WGAN loss function and Inverter loss function]
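The two loss functions on this slide appear only as images. As a reference, here is a hedged reconstruction in our own notation, assuming the standard WGAN objective and a reconstruction-based inverter loss with a weight $\lambda$ on the latent term (the slide does not spell these out):

```latex
% Standard WGAN objective: the critic D is constrained to be 1-Lipschitz
\min_{G}\ \max_{\|D\|_{L}\le 1}\
  \mathbb{E}_{x\sim p_{\mathrm{data}}}[D(x)]
  - \mathbb{E}_{z\sim p(z)}[D(G(z))]

% A common inverter loss: reconstruct inputs through G(I(x)) and
% latent codes through I(G(z))
\mathcal{L}_{I} =
  \mathbb{E}_{x\sim p_{\mathrm{data}}}\,\big\|G(I(x)) - x\big\|^{2}
  + \lambda\,\mathbb{E}_{z\sim p(z)}\,\big\|I(G(z)) - z\big\|^{2}
```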
  • 20. WGAN + Inverter: robustness metric • Given a new data point x, find the closest point to x in the latent space that, once translated back to the original space, is able to confound the classifier f • Robustness metric: the distance ( delta_z = z* - I(x) ) between these two points in the latent representation • If delta_z is “small”, the classifier is not confident about its prediction (the classifier “changes its mind” too easily) • If delta_z is “big”, the classifier is confident about the prediction (the classifier “knows what it wants”) Rule: if delta_z < threshold then do not predict
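The rule above can be sketched in a few lines. The search for z* itself requires the trained Generator, Inverter, and classifier, which we leave abstract here; the function names are ours, and the numbers in the usage example are the chicken's delta_z and the threshold reported later in the deck:

```python
import numpy as np

def delta_z_score(z_star, z_inverted):
    """Latent-space distance between the inverted input I(x) and the
    closest latent point z* whose decoding confounds the classifier."""
    return float(np.linalg.norm(np.asarray(z_star) - np.asarray(z_inverted)))

def should_predict(delta_z, threshold):
    """Abstain when the classifier changes its mind for a tiny latent
    perturbation, i.e. when delta_z falls below the threshold."""
    return delta_z >= threshold

# Usage with the deck's chicken example: delta_z = 1.8e-06 < 5.3e-02,
# so the prediction is withheld.
print(should_predict(1.8e-06, 5.3e-02))  # False
```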
  • 21. WGAN + Inverter: Architecture • PyTorch implementation • Generator: 1 fully connected + 4 transposed conv. layers (strides = 2), ReLU activation • Discriminator (“Critic”): 4 conv. layers (strides = 2) + 1 fully connected, ReLU activation • Inverter: 4 conv. layers (strides = 2) + 2 fully connected, ReLU activation • Adam optimizers for the three nets
  • 22. WGAN + Inverter: Training and distribution of delta_z Distribution of delta_z computed on 500 test set input points
  • 23. WGAN + Inverter: Results Distribution of delta_z computed on 500 test set input points Delta_z for the chicken: 1.8e-06 (< 5.3e-02) Using the 5th percentile of the test delta_z as threshold, we would get a 5% false positive rate and discard the meaningless prediction for the chicken picture. 5th percentile: 5.3e-02
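Choosing the 5th percentile of the in-distribution delta_z scores as threshold fixes the false positive rate at roughly 5% by construction. A numpy sketch of that calibration step (the gamma-distributed scores are simulated stand-ins for the 500 real test-set values):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 500 test-set delta_z values; the real ones come from
# the trained WGAN + Inverter. We only need positive scores here.
test_delta_z = rng.gamma(shape=2.0, scale=0.5, size=500)

# The 5th percentile as threshold: ~5% of in-distribution inputs fall
# below it and get rejected, which is the rule's false positive rate.
threshold = np.percentile(test_delta_z, 5)
fp_rate = float(np.mean(test_delta_z < threshold))
print(threshold, fp_rate)
```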
  • 24. WGAN + Inverter: Results with Fashion MNIST Using the 5th percentile of the test delta_z as threshold, we would get a 5% false positive rate but 34.4% of lost good predictions (false negative rate)! Distribution of delta_z computed on 500 test set input points 5th percentile: 5.3e-02 We need to reduce the false negative rate!
  • 26. AutoEncoder in a nutshell • VAE: Variational AutoEncoder • Composed of two neural networks: • Encoder: takes the inputs and “compresses” them into a deep latent representation • Decoder: takes the deep representation and recreates the original input as closely as it can • After many iterations, the model learns the best “latent representation” (a kind of compression) of the training data
  • 27. From AE to Variational AutoEncoder Instead of mapping the input into a fixed vector, we map it into a distribution. In this way we learn the likelihood p(x|z), in this context called the probabilistic decoder, which we can use to generate from the latent space. The encoding happens likewise: we learn the posterior p(z|x), in this context called the probabilistic encoder. For more info: https://lilianweng.github.io/posts/2018-08-12-vae/
  • 28. Variational Autoencoder: robustness metric The loss function of a variational autoencoder is an indirect measure of the probability that an observation comes from the same distribution that generated the training set. An “unlikely” input will have trouble in the encoding-decoding process = high loss value. Robustness metric: the loss function • “Big” loss -> not a robust prediction: the new input data doesn’t come from the training set’s underlying distribution • “Small” loss -> robust prediction: the new input data comes from the training set’s underlying distribution Notice that with this approach we don’t need the classifier f to compute the robustness metric! Rule: if VAE_loss > threshold then do not predict
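The VAE loss the slide uses combines a reconstruction term with the KL divergence of the diagonal Gaussian posterior from the standard normal prior. A minimal numpy sketch of that score and the abstention rule (function names are ours; the numbers in the usage line are the chicken's loss and the threshold reported later in the deck):

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    """Squared reconstruction error plus the closed-form KL divergence
    of N(mu, exp(logvar)) from the standard normal prior N(0, I)."""
    recon = np.sum((np.asarray(x) - np.asarray(x_recon)) ** 2)
    kl = -0.5 * np.sum(1.0 + logvar - np.asarray(mu) ** 2 - np.exp(logvar))
    return float(recon + kl)

def should_predict(loss, threshold):
    """Abstain when the input looks out-of-distribution to the VAE."""
    return loss <= threshold

# Usage with the deck's chicken example: loss 309.4 > 212, so abstain.
print(should_predict(309.4, 212.0))  # False
```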
  • 29. Variational Autoencoder: Architecture • PyTorch implementation • Encoder: 2-layer fully connected neural network • Decoder: 2-layer fully connected neural network • Adam optimizer
  • 30. Variational Autoencoder: Training and VAE loss distribution Distribution of VAE losses computed on 10,000 test set input points
  • 31. Variational Autoencoder: Results Distribution of VAE losses computed on 10,000 test set input points Using the maximum test loss as threshold, we would get a 0% false positive rate and discard the meaningless prediction for the chicken picture. VAE loss for the chicken: 309.4 (> 212)
  • 32. Variational Autoencoder: Results with Fashion MNIST Distribution of VAE losses computed on 10,000 test set input points Using the maximum test loss as threshold, we would get a 0% false positive rate and 3.55% of lost good predictions (false negative rate). Using the 95th percentile of the test loss as threshold, we would get a 5% false positive rate and 0.25% of lost good predictions (false negative rate).
  • 33. Variational Autoencoder: robustness metric into production • Train your classifier • Train a VAE on your training set • Get the distribution of the VAE losses on your test set • Define a more or less “conservative” threshold • Implement a “conditional classifier”:
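The production recipe above can be sketched as a small wrapper. The class and parameter names are ours, and the callables are stubs standing in for the trained classifier and VAE (the slide's own code is not shown in the transcript):

```python
class ConditionalClassifier:
    """Gate a classifier with the VAE loss: predict only when the input
    looks like it comes from the training distribution."""

    def __init__(self, classifier, vae_loss_fn, threshold):
        self.classifier = classifier    # callable: x -> prediction
        self.vae_loss_fn = vae_loss_fn  # callable: x -> VAE loss
        self.threshold = threshold      # calibrated on the test set

    def predict(self, x):
        if self.vae_loss_fn(x) > self.threshold:
            return None  # abstain: input looks out-of-distribution
        return self.classifier(x)

# Toy usage with stubs; the loss values mirror the deck's chicken example.
clf = ConditionalClassifier(
    classifier=lambda x: "6",
    vae_loss_fn=lambda x: 309.4 if x == "chicken" else 50.0,
    threshold=212.0,
)
print(clf.predict("digit"))    # "6"
print(clf.predict("chicken"))  # None (prediction withheld)
```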
  • 35. Wrapping up: pros and cons of this approach Pros • No need to modify your predictive models • The same “monitoring” system can be used for different ML models (for a given dataset) • Applicable to any kind of data (tabular, images, …) • “Easy” to explain • Easy to plug into existing pipelines Cons • Arbitrary thresholds must be set by the data scientist • It introduces a further model that needs to be maintained
  • 36. Thank you for your attention! https://www.linkedin.com/in/davide-posillipo/