Interpreting complex machine learning models can be difficult. Given an interpretation, its meaningfulness and reliability are hard to evaluate. Moreover, depending on the purpose (debugging, ...), one technique in the literature may be more appropriate than others. How do we choose the best approach in the landscape of existing techniques?
This talk is organized as a virtual "walk" through different techniques for interpreting machine learning, and particularly deep learning. Moving from the inside out, we first cover techniques (such as gradient ascent and deconvolution) for interpreting the internal state of the model, namely its neurons, channels and layer activations. We then focus on the model behavior from the outside. The model output, for instance, can be explained by attributing the final decisions to subsets of input pixels (as in saliency, occlusion and class activation maps) or to higher-level concepts, such as object size, scale and texture. Concept-based attribution, in particular, has been our research focus over the past few years, allowing us to explain deep learning in simple terms to clinicians, with digital pathology and retinopathy as our main application domains. In addition, concept-based interpretability helped us explain internal CNN mechanisms such as the encoding of scale and the memorization of input-label pairs.
5.
“Interpretability is defined as the ability to explain or to present in understandable terms to a human*.”
* not all humans are familiar with Machine Learning
[Kim et al., 2018]
6.
“The goal of interpretability is to describe the internals of a system in a way that is understandable to humans*.”
* not all humans are familiar with Machine Learning
[Gilpin et al., 2019]
8. Interpretability as a human-centric “translation” problem
Explanation in the model representation space (input pixels, activations)
→ Explanation in the human representation space (a visualization, a concept, a sentence, an important factor)
[Kim et al., 2018]
10. Why do we need interpretability?
Trained CNN
It’s a cat
11. Why do we need interpretability?
Trained CNN
It’s a cat
YAY!
12. Why do we need interpretability?
Trained CNN
It’s a cat
If you want to know more about how networks can be easily fooled:
[Szegedy et al., 2013], [Nguyen et al., 2015],
[Papernot et al., 2016], [Moosavi-Dezfooli et al., 2017]
13. Why do we need interpretability?
Trained CNN
It’s a cat
Oh…but…why?
14. Why do we need interpretability?
Trained CNN
It’s a cat
…
15. Why do we need interpretability?
Why questions [Gilpin et al., 2019]:
Why is the model working? Why is it not?
Why is the output like this? Why is it not something else?
Why should we trust the model?
Explain and defend actions, gain trust … develop better models!
17. Where do we need interpretability?
HEALTH, ROBOTICS, ASSISTED DRIVING, LAW, SOCIAL SCIENCES, FINANCE:
high-risk applications that also demand accountability, transparency, fairness and trust [FAccT conference]
18. Where is interpretability needed, and why? Where is it not, and why not?
One motivation does not cover it all … (privacy, robustness) [Kim B., Hooker S.]
Not needed: an already well-studied problem.
Needed for: safety, science, debugging, aligning objectives.
19. How do we achieve interpretability?
Interpretability is challenging and trending.
21. Our goal today is the how
Gain a clearer understanding: KNOWING WHAT TO APPLY & WHERE
22. Outline
1. Inherently interpretable models
a. (Generalized) linear regression
b. Decision trees and rules
2. Interpreting complex models, inside out
a. From inside (opening the black box)
b. From outside (black-box)
3. What else? Use interpretability to develop better models.
4. Q&A
I will not talk about dimensionality reduction.
24. Linear Regression
The output is a weighted sum of the features.
A linear increase in a feature translates into a proportional effect on the outcome.
No interactions between features.
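As a minimal sketch (assuming scikit-learn and illustrative, hypothetical feature names), the fitted coefficients already are the explanation: each weight is the change in the prediction for a one-unit increase of that feature.

```python
# Minimal sketch: coefficients of a linear model double as a global explanation.
# Assumes scikit-learn; data and feature names are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # features, e.g. size, age, mileage
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
for name, w in zip(["size", "age", "mileage"], model.coef_):
    print(f"{name}: a one-unit increase changes the prediction by {w:+.2f}")
```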
25. Generalized Linear Regression
Family: normal, link: identity (ordinary linear regression); family: binomial, link: logit (logistic regression).
We can plug in arbitrary distributions according to the data-generating process.
Interpretation comes mostly from the assumptions on the data-generating process.
Complexity ~ generalization.
Generalized Additive Models [Caruana et al., 2015]
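A minimal sketch of the binomial/logit case, assuming statsmodels and synthetic data; the fitted coefficients are interpretable as log-odds effects per feature.

```python
# Minimal sketch of a GLM: logistic regression as binomial family + logit link.
# Assumes statsmodels; data is synthetic for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
p = 1.0 / (1.0 + np.exp(-(1.2 * X[:, 0] - 0.8 * X[:, 1])))
y = rng.binomial(1, p)

glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial())  # logit is the default link
result = glm.fit()
print(result.summary())   # coefficients are log-odds: interpretable per-feature effects
```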
26. Decision trees and rules
Example (predicting the car’s model): Autonomy >= 350? Yes → Is it electric? Yes → Ah, it’s a Tesla.
Trackable and explainable decisions. Good for data interactions!
Only categorical features; step functions… sharp!
Changes in the data lead to a different tree.
Complexity ~ depth [Kim B.]
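A minimal sketch with scikit-learn and a toy, hypothetical car dataset; the learned tree can be printed as readable if/else decisions.

```python
# Minimal sketch: a small decision tree whose decision path can be printed verbatim.
# Assumes scikit-learn; the toy car data and feature names are illustrative.
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [autonomy_km, is_electric]
X = [[350, 1], [500, 1], [600, 1], [200, 0], [250, 0], [400, 0]]
y = ["tesla", "tesla", "tesla", "other", "other", "other"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["autonomy_km", "is_electric"]))  # readable if/else rules
```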
27. Decision trees and rules
IF PERSON CAPACITY < 2 && PRICE = ‘high’ && ELECTRIC = False && COMPANY LOGO = ‘horse’ THEN car is a FERRARI
Intrinsic explanation. Sparse. Efficient.
Not for regression. Only categorical features.
Complexity ~ #rules [Kim B.]
29. Helpful terminology
Local = true for a specific instance vs. Global = true for an entire set of inputs (e.g. a class)
Model-specific = model built-in analysis vs. Post-hoc = applicable to any model
[Lipton, 2016]
30. Interpretability of deep learning inside out
Inside:
Database search: MMD [Kim et al., 2016], IF [Koh et al., 2017]
Visualization: Deconv [Zeiler et al., 2013], AM [Erhan et al., 2009], Dissection [Bau et al., 2017]
Geometric approaches: SVCCA [Raghu et al., 2017]
Out:
Surrogates: LIME [Ribeiro et al., 2016], SHAP [Lundberg et al., 2017]
Attribution to features: Saliency [Simonyan et al., 2013], CAM [Zhou et al., 2016], LRP [Binder et al., 2016]
Attribution to concepts: TCAV [Kim et al., 2018], RCVs [Graziani et al., 2018]
32. Database search
What examples explain the data or the model?
Prototype = representative of all the data
Criticism = under-represented bit
Maximum Mean Discrepancy [Kim et al., 2016]
Global, Post-hoc
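A simplified sketch of the idea behind MMD-based prototype selection: greedily pick the examples that make the prototype set match the data distribution under a kernel. The kernel width and data are illustrative; this is not the original MMD-critic implementation.

```python
# Minimal sketch of MMD-critic-style prototype selection: greedily choose points that
# minimize the MMD^2 between the data and the prototype set under an RBF kernel.
# A simplification of [Kim et al., 2016]; kernel width and data are illustrative.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def select_prototypes(X, m):
    K = rbf_kernel(X, X)
    selected = []
    for _ in range(m):
        best, best_mmd2 = None, np.inf
        for j in range(len(X)):
            if j in selected:
                continue
            S = selected + [j]
            # MMD^2 between the data distribution and the candidate prototype set
            mmd2 = K.mean() - 2 * K[:, S].mean() + K[np.ix_(S, S)].mean()
            if mmd2 < best_mmd2:
                best, best_mmd2 = j, mmd2
        selected.append(best)
    return selected

X = np.random.default_rng(0).normal(size=(100, 2))
print(select_prototypes(X, m=3))   # indices of 3 prototypes
```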
33. Influential Instances
Deleting one of these would strongly affect learning.
Influence Functions [Koh et al., 2017] (Best paper award!)
Global, Post-hoc
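For intuition, a brute-force sketch of the quantity that influence functions approximate: how much the loss on a test point changes when one training point is removed. [Koh et al., 2017] avoid the retraining with Hessian-vector products; the model and data below are illustrative.

```python
# Minimal sketch of leave-one-out influence: the change in test loss when one training
# point is dropped (brute force; influence functions approximate this efficiently).
# Assumes scikit-learn; data and model are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)); y = (X[:, 0] + X[:, 1] > 0).astype(int)
x_test, y_test = X[:1], y[:1]

full = LogisticRegression().fit(X, y)
base = log_loss([y_test[0]], full.predict_proba(x_test), labels=[0, 1])

influence = []
for i in range(len(X)):
    keep = np.arange(len(X)) != i
    m = LogisticRegression().fit(X[keep], y[keep])
    influence.append(log_loss([y_test[0]], m.predict_proba(x_test), labels=[0, 1]) - base)

print(int(np.argmax(np.abs(influence))))   # most influential training point for this test point
```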
35. Deconvolutions
What is a neuron, a channel or a layer looking for?
Deconvolution: inverting the convolution operations [Zeiler et al., 2013].
Figure credits: Stanford CS230 (2018, YouTube)
Local, Post-hoc
36. Gradient Ascent
What is a neuron, a channel or a layer looking for?
Gradient Ascent [Erhan et al., 2009], [Olah et al., 2019]: optimize the input so that a chosen activation is maximized.
Lucid toolbox
Global, Post-hoc
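A minimal PyTorch sketch of activation maximization; the network, layer index, channel and hyper-parameters are illustrative assumptions (real toolboxes such as Lucid add regularization and transformations).

```python
# Minimal sketch of gradient ascent / activation maximization in PyTorch:
# optimize the input image so that one channel of a chosen layer fires strongly.
# The model, layer index, channel and hyper-parameters are illustrative.
import torch
import torchvision.models as models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
layer = model.features[10]                      # an intermediate conv layer
activation = {}
layer.register_forward_hook(lambda m, i, o: activation.update(out=o))

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    model(img)
    loss = -activation["out"][0, 5].mean()      # maximize mean activation of channel 5
    loss.backward()
    opt.step()
# `img` now shows what channel 5 of that layer is "looking for"
```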
37. Network Dissection
What is a neuron, a channel or a layer looking for?
Network Dissection [Bau et al., 2017]: a set of segmented regions for ~1K concepts (color, texture, material, object, scene).
Early training finds concepts, late training improves them.
Global, Post-hoc
38. Singular Vector Canonical Correlation Analysis (SVCCA)
Can we compress what a layer has learned? [Raghu et al., 2017]
Take the responses of the layer to all the data, then apply Singular Value Decomposition & Canonical Correlation Analysis.
Allows comparisons of layers and architectures, and insights on training dynamics.
What did it look like, and what can we do here?
Global, Post-hoc
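A simplified sketch of the SVCCA recipe (SVD-reduce each layer's responses, then compare them with CCA); the activations below are random placeholders just to show the shapes.

```python
# Minimal sketch of SVCCA: reduce each layer's activation matrix with SVD, then
# measure layer similarity with CCA correlations. A simplification of [Raghu et al., 2017].
import numpy as np
from sklearn.cross_decomposition import CCA

def svd_reduce(acts, k=20):
    # acts: (n_datapoints, n_neurons); keep the top-k singular directions
    acts = acts - acts.mean(axis=0)
    U, S, Vt = np.linalg.svd(acts, full_matrices=False)
    return acts @ Vt[:k].T

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(500, 128))      # responses of layer A to all data (placeholder)
layer_b = rng.normal(size=(500, 256))      # responses of layer B to the same data (placeholder)

A, B = svd_reduce(layer_a), svd_reduce(layer_b)
cca = CCA(n_components=10).fit(A, B)
A_c, B_c = cca.transform(A, B)
corrs = [np.corrcoef(A_c[:, i], B_c[:, i])[0, 1] for i in range(10)]
print(f"mean SVCCA similarity: {np.mean(corrs):.2f}")
```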
39. Interpretability of deep learning inside out (same map as slide 30): moving from the inside techniques to the outside ones (surrogates, attribution to features and to concepts).
40. Surrogate models
The replacement is an interpretable model trained on the data and on the black-box predictions.
A complex decision function is mimicked by, e.g., a linear surrogate (R² ≈ 0.7).
Flexibility by choosing different surrogates. Very approximate …
Global, Post-hoc
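A minimal sketch of a global surrogate, assuming scikit-learn: the interpretable model is fit to the black-box predictions (not the labels), and its R² on those predictions measures how faithfully it mimics the black box.

```python
# Minimal sketch of a global surrogate: fit an interpretable tree to the predictions
# of a black-box model and report the fidelity (R²). Model choices are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=1000)

black_box = GradientBoostingRegressor().fit(X, y)
y_bb = black_box.predict(X)                        # the surrogate learns the *model*, not the data

surrogate = DecisionTreeRegressor(max_depth=3).fit(X, y_bb)
print(f"fidelity R^2: {surrogate.score(X, y_bb):.2f}")   # how well the surrogate mimics the black box
```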
41. Local Interpretable Model-agnostic Explanations (LIME)
The replacement is an interpretable model trained on the data and on the black-box predictions, to explain each prediction individually.
Local linear surrogate. Flexible, universal (post-hoc).
Size of the local neighborhood undefined; sampling of local instances not very robust.
[Ribeiro et al., 2016]
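A simplified sketch of the LIME recipe on tabular data (perturb around the instance, weight by proximity, fit a weighted linear model to the black-box predictions); the kernel width and sample size are illustrative, and the official lime package does considerably more.

```python
# Minimal sketch of the LIME idea: local perturbations + proximity weights + weighted
# linear surrogate. A simplification of [Ribeiro et al., 2016]; hyper-parameters illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(scale=0.1, size=1000)
black_box = RandomForestRegressor().fit(X, y)

x0 = X[0]                                              # the instance to explain
Z = x0 + rng.normal(scale=0.3, size=(500, 4))          # local perturbations around x0
weights = np.exp(-((Z - x0) ** 2).sum(1) / 0.5)        # proximity kernel
local = Ridge(alpha=1.0).fit(Z, black_box.predict(Z), sample_weight=weights)
print(dict(zip(["f0", "f1", "f2", "f3"], np.round(local.coef_, 2))))  # local feature importances
```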
46. SHapley Additive exPlanations (SHAP)
A game-theoretic approach with competing features [Lundberg et al., 2017].
Attributes to each input feature the change in the expected model prediction when conditioning on that feature.
Unifying framework, direct for categorical features.
Only qualitative on images, difficult abstraction
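A minimal sketch with the shap package (TreeExplainer on a tree ensemble); data and model are illustrative.

```python
# Minimal sketch of SHAP attributions for a tree model with the shap library.
# Assumes the `shap` and scikit-learn packages; data is illustrative.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])   # (10, 4): per-feature contribution to each prediction
print(np.round(shap_values[0], 3))            # attributions for the first instance
```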
49. Saliency**
Slide credits: Hooker S.
** Saliency was recently shown to be unstable and unreliable in edge cases, e.g. in randomized networks or when compared to random attribution maps (see Remove and Retrain, ROAR).
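A minimal PyTorch sketch of a vanilla gradient saliency map [Simonyan et al., 2013]; the pretrained network and the random input stand in for a real preprocessed image.

```python
# Minimal sketch of vanilla gradient saliency: gradient of the class score w.r.t. pixels.
# The pretrained model and the random input are illustrative.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)    # stand-in for a preprocessed image

score = model(img)[0].max()                               # score of the predicted class
score.backward()
saliency = img.grad.abs().max(dim=1).values[0]            # (224, 224) pixel-importance map
print(saliency.shape)
```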
50. Class Activation Mapping
The importance of the image regions is given by the projection of the output-layer weights onto the last convolutional layer’s feature maps [Zhou et al., 2016].
Greatly successful technique for its transparency and directness.
Only qualitative evaluation. Little focus on multiple instances of the same object.
Local, Post-hoc
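A minimal PyTorch sketch of CAM for a ResNet-style network (global average pooling followed by a single fully connected layer); the input is a random placeholder.

```python
# Minimal sketch of Class Activation Mapping: project the output-layer weights onto
# the last convolutional feature maps [Zhou et al., 2016]. Assumes a GAP+FC architecture.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o))

img = torch.randn(1, 3, 224, 224)                 # stand-in for a preprocessed image
logits = model(img)
cls = logits.argmax(dim=1).item()

fc_w = model.fc.weight[cls]                       # (512,) weights of the predicted class
cam = torch.einsum("c,chw->hw", fc_w, feats["maps"][0])   # weighted sum of feature maps
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=(224, 224), mode="bilinear")[0, 0]
print(cam.shape)                                  # (224, 224) heatmap over image regions
```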
52. Concept Activation Vectors
What about the relevance of the “striped” texture in the classification of a zebra?
We collect examples of a concept, e.g. “striped” texture, and take the internal activations (unrolled).
A linear classification of “striped” vs “random” activations gives the vector of the “striped” texture.
The directional derivative of the output along that vector, ∂output/∂vector, acts as a generalized saliency.
[Kim et al., 2018]
Local & Global, Post-hoc
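A minimal sketch of building a CAV and taking the directional derivative; the activations and the gradient below are random placeholders for the quantities extracted from a real network.

```python
# Minimal sketch of a Concept Activation Vector [Kim et al., 2018]: the normal of a linear
# classifier separating "striped" from random activations, and its directional derivative.
# Activations and the gradient are placeholders for values extracted from a real CNN.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts_striped = rng.normal(loc=0.5, size=(100, 512))   # unrolled activations of "striped" images
acts_random = rng.normal(loc=0.0, size=(100, 512))    # unrolled activations of random images

X = np.vstack([acts_striped, acts_random])
y = np.array([1] * 100 + [0] * 100)
clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])      # the concept activation vector

# Concept sensitivity: directional derivative of the output along the CAV
grad_output_wrt_acts = rng.normal(size=512)            # placeholder for the real model gradient
print(float(grad_output_wrt_acts @ cav))
```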
53. What if the concept is non-binary?
Such as tumor extension, patient age, color, ..
55. Regression Concept Vectors
Segmentation (manual or automatic); handcrafted features: texture descriptors, shape, size, …
Take the internal activations (aggregation).
A linear regression of the concept measures on the activations gives the concept vector, e.g. a vector of “size”.
The directional derivative of the output along that vector, ∂output/∂vector, acts as a generalized saliency.
[Graziani et al., 2018] Best paper award, iMIMIC, MICCAI 2018!
Local & Global, Post-hoc
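A minimal sketch of an RCV: the same recipe, but with a linear regression of a continuous concept measure; activations, measurements and the gradient are placeholders for quantities from a real network.

```python
# Minimal sketch of a Regression Concept Vector [Graziani et al., 2018]: regress a
# continuous concept measure (e.g. nuclei size) on the internal activations; the
# regression direction is the concept vector. All inputs are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 512))            # aggregated activations for 200 images
size_measure = acts @ rng.normal(size=512) + rng.normal(scale=0.1, size=200)  # concept measured on images

reg = LinearRegression().fit(acts, size_measure)
rcv = reg.coef_ / np.linalg.norm(reg.coef_)    # the "size" concept vector in activation space

grad_output_wrt_acts = rng.normal(size=512)    # placeholder for the model gradient at this layer
print(float(grad_output_wrt_acts @ rcv))       # relevance of "size" for the prediction
```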
56. Application to health
1. Modeling of visual concepts: the Nottingham grading guidelines (nuclei pleomorphism, tubular formation, mitotic count; enlarged nuclei, vesicular appearance, multiple nucleoli) are translated into measurable image features: nuclei segmentation and size (area) plus texture descriptors (contrast, ASM, correlation).
2. CNN explanation: for a black-box state-of-the-art model predicting the tumor probability of an image, the concept scores (contrast, area, ASM, correlation) are ranked from high to low relevance, separating the features relevant for the positive class from those relevant for the negative class.
57. Application to health
Retinopathy of prematurity [Graziani et al., 2019] and radiomics [Yeche et al., 2019].
Figure: raw and segmented retinal images with the individual relevance of concepts such as curvature (mean, median), cti (mean, median) and average point/segment diameter, shown for example cases (e.g. GT: normal; prediction: normal) together with their class probabilities p_n, p_pre and p_plus.
Image credits: Yeche et al., Springer.
59. Applications to computer vision
Interpreting intentionally flawed models: image blue-ness.
Color and texture were used to reduce this loss! [Graziani et al., 2019]
60. Our goal today is the how
Gain a clearer understanding: WHAT TO APPLY & WHERE?
61. WHAT TO APPLY & WHERE?
What do you need most in deep learning? To understand each component (individually or in their interactions), to make comparisons, a global understanding, visual explanations on a single input, or user-friendly explanations?
Techniques on the map: Gradient Ascent and geometry-based approaches; surrogate models; Class Activation Maps (before the decision layer) and Layerwise Relevance (layerwise); CAVs (from a dataset of conceptual examples) and RCVs (measuring attributes on images).
64. Can we use interpretability for better control and development?
It’s a cat: it has pointy ears… and whiskers! Cats DO NOT have penguin legs….
Robustness to adversarial examples.
65. Can we use interpretability for better control and development?
Interpretability analysis, prior knowledge and user feedback provide additional targets for our model output:
desired features → multi-task learning; undesired features → adversarial learning.
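A minimal PyTorch sketch of the "desired features → multi-task learning" branch: a hypothetical concept (e.g. nuclei area) becomes an auxiliary regression target next to the main classification loss. The architecture and loss weight are illustrative assumptions.

```python
# Minimal sketch: a multi-task CNN with a main classification head and an auxiliary
# concept-regression head that encourages a desired feature. Architecture is illustrative.
import torch
import torch.nn as nn

class MultiTaskCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.class_head = nn.Linear(16, 2)      # main task, e.g. tumor vs normal
        self.concept_head = nn.Linear(16, 1)    # auxiliary task, e.g. predict nuclei area

    def forward(self, x):
        z = self.backbone(x)
        return self.class_head(z), self.concept_head(z).squeeze(1)

model = MultiTaskCNN()
x = torch.randn(8, 3, 64, 64); y = torch.randint(0, 2, (8,)); concept = torch.randn(8)
logits, concept_pred = model(x)
loss = nn.functional.cross_entropy(logits, y) + 0.1 * nn.functional.mse_loss(concept_pred, concept)
loss.backward()
```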
66. TAKE AWAY
Cartography to navigate interpretability (slide 121). Pic credits: bannerengineering.com
Growing conferences and workshops: FAccT, Tutorial on Interpretable Machine Learning, NeurIPS Interpretable ML, ICML Interpretable ML, DL summer schools, AISTATS, CVPR, ICLR, ECCV, ICCV, ECML, KDD, Workshop on Interpreting and Explaining Visual AI Models, Tutorial on Interpretable & Transparent Deep Learning, WHI 2020 (virtual this year)… and many others!
Some interesting people and projects: B. Zhou (Torralba, MIT), B. Kim (Google Brain), G. Montavon (heatmapping.org), Ruth C. Fong (Harvard & Microsoft), Finale Doshi-Velez (Harvard), A. Weller (Cambridge), S. Lundberg (Microsoft), DARPA’s XAI (Explainable Artificial Intelligence)… and many others!
ML interpretability is human-centric, multi-faceted, and should be tailored to a precise scope.
67. Thank you!
ML interpretability for healthcare: our vision (figure): linear probing of the internal representation yields a post-hoc interpretation, either by classification [1] (TCAV [1], UBS [2]) or by regression (ours, Br [3,4,5]); feedback to the DL model through concepts, by modeling prior knowledge (e.g. area, contrast) with handcrafted ML features and updating the objective function.
mara.graziani@hevs.ch
@mormontre
@maragraziani