Understanding Black-box Predictions via Influence Functions (2017)

Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering

TODAY’S PAPER
ICML 2017 best paper (Pang Wei Koh and Percy Liang)
https://youtu.be/0w9fLX_T6tY

QUESTIONS
• How can we explain the predictions of a black-box model?
• Why did the system make this prediction?
• How can we explain where the model came from?
• What would happen if the values of a training point were slightly changed?

INTERPRETATION OF DL RESULTS
• Retrieving images that maximally activate a neuron [Girshick et al. 2014]
• Finding the most influential part from the image [Zhou et al. 2016]
• Learning a simpler model around a test point [Ribeiro et al. 2016]
But they assume a fixed model
→ My NN is a function of its training inputs

INFLUENCE OF A TRAINING POINT
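• What is the influence of a training example on the model (or on the loss at a test example)?
Optimal model param.: $\hat\theta = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)$
Model param. by training w/o z: $\hat\theta_{-z} = \arg\min_\theta \frac{1}{n} \sum_{z_i \neq z} L(z_i, \theta)$
Model param. by upweighting z: $\hat\theta_{\epsilon,z} = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$
Training without z is the same as upweighting z by $\epsilon = -\frac{1}{n}$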
• The influence of upweighting z on the parameters 𝜃

• Influence vs. Euclidean distance

• The influence of upweighting z on the loss at a test point
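The formulas behind the two "influence of upweighting z" bullets appear to have been lost with the slide images. As given in the paper, and assuming the empirical risk is twice differentiable and strictly convex so that the Hessian $H_{\hat\theta}$ is invertible, they read:

```latex
% Hessian of the empirical risk at the optimum
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta)

% Influence of upweighting z on the parameters
\mathcal{I}_{\text{up,params}}(z)
  = \left. \frac{d\hat\theta_{\epsilon,z}}{d\epsilon} \right|_{\epsilon=0}
  = -H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)

% Influence of upweighting z on the loss at a test point
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = \left. \frac{dL(z_{\text{test}}, \hat\theta_{\epsilon,z})}{d\epsilon} \right|_{\epsilon=0}
  = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)
```

Since removing z corresponds to $\epsilon = -\frac{1}{n}$, the leave-one-out parameter change is approximately $\hat\theta_{-z} - \hat\theta \approx -\frac{1}{n} \mathcal{I}_{\text{up,params}}(z)$, with no retraining required.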

PERTURBING A TRAINING POINT
• Move 𝜖 mass from 𝑧 to 𝑧_𝛿, where 𝑧 = (𝑥, 𝑦) and 𝑧_𝛿 = (𝑥 + 𝛿, 𝑦)
• If 𝑥 is continuous and 𝛿 is small, the effect can be approximated to first order in 𝛿
• The effect of 𝑧 → 𝑧_𝛿 on the loss at a test point
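The equations for this slide did not survive extraction. A reconstruction following the paper, with $z = (x, y)$, $z_\delta = (x + \delta, y)$, and $H_{\hat\theta}$ the Hessian of the empirical risk at $\hat\theta$ (assumed invertible), would be:

```latex
% Move epsilon mass from z to z_delta
\hat\theta_{\epsilon, z_\delta, -z}
  = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)
    + \epsilon L(z_\delta, \theta) - \epsilon L(z, \theta)

\left. \frac{d\hat\theta_{\epsilon, z_\delta, -z}}{d\epsilon} \right|_{\epsilon=0}
  = -H_{\hat\theta}^{-1}
    \left( \nabla_\theta L(z_\delta, \hat\theta) - \nabla_\theta L(z, \hat\theta) \right)

% For continuous x and small delta, linearise the gradient difference in delta
\hat\theta_{z_\delta, -z} - \hat\theta
  \approx -\frac{1}{n} H_{\hat\theta}^{-1}
    \left[ \nabla_x \nabla_\theta L(z, \hat\theta) \right] \delta

% Effect of z -> z_delta on the loss at a test point
\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^\top
  = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1}
    \nabla_x \nabla_\theta L(z, \hat\theta)
```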

SUMMARY
• The influence of 𝑧 → 𝑧_𝛿 on the loss at a test point
• The influence of upweighting z on the loss at a test point

EXAMPLE
• The influence of upweighting z
• In logistic regression,
• Test image: 7; training images: 7 (green), 1 (red)
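To make the logistic regression example concrete, here is a minimal NumPy sketch of the influence of upweighting each training point on the loss at one test point, with labels in {-1, +1}. It is not the authors' code: the function names, the Newton solver, the synthetic data, and the choice to treat the L2 penalty `l2` as a fixed term outside the per-example loss are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function 1 / (1 + exp(-t)).
    return np.exp(-np.logaddexp(0.0, -t))

def logistic_loss(x, y, theta):
    # log(1 + exp(-y * theta^T x)) for a single example.
    return np.logaddexp(0.0, -y * (x @ theta))

def fit_logreg(X, y, l2, weights=None, iters=50):
    """Minimise sum_i w_i * loss_i + (l2 / 2) * |theta|^2 with Newton's method.

    By default w_i = 1/n; setting one weight to 0 trains without that point.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n) if weights is None else weights
    theta = np.zeros(d)
    for _ in range(iters):
        m = y * (X @ theta)
        grad = -(X.T @ (w * y * sigmoid(-m))) + l2 * theta
        curv = w * sigmoid(m) * sigmoid(-m)
        H = (X.T * curv) @ X + l2 * np.eye(d)
        theta = theta - np.linalg.solve(H, grad)
    return theta

def influence_up_loss(X, y, theta, l2, x_test, y_test):
    """-grad L(z_test)^T H^{-1} grad L(z_i) for every training point z_i."""
    n, d = X.shape
    m = y * (X @ theta)
    H = (X.T * (sigmoid(m) * sigmoid(-m))) @ X / n + l2 * np.eye(d)
    grads = -(y * sigmoid(-m))[:, None] * X      # row i: gradient of the loss at z_i
    g_test = -y_test * sigmoid(-y_test * (x_test @ theta)) * x_test
    s_test = np.linalg.solve(H, g_test)          # H^{-1} grad L(z_test)
    return -grads @ s_test

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.where(X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=500) > 0, 1.0, -1.0)
theta_hat = fit_logreg(X, y, l2=0.1)
infl = influence_up_loss(X, y, theta_hat, 0.1, np.array([0.5, 1.0, -1.0]), 1.0)
print("most influential training index:", int(np.argmax(np.abs(infl))))
```

Under these assumptions, the predicted change in test loss from removing training point i is `-infl[i] / n`, which can be checked against actual retraining by calling `fit_logreg` with that point's weight set to 0.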

SEVERAL PROBLEMS
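• Calculation of $H_{\hat\theta}^{-1}$ is expensive
→ Use Hessian-vector products (HVPs)
→ Precompute $s_{test} = H_{\hat\theta}^{-1} \nabla_\theta L(z_{test}, \hat\theta)$ by optimization (conjugate gradients)
or by a sampling-based (stochastic) approximation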
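The HVP idea above can be sketched as follows: solve H s = b with conjugate gradients, touching H only through products H v. This is an illustrative NumPy sketch, not the paper's implementation. It assumes a logistic regression loss with labels in {-1, +1} and an L2 term `l2`, for which the HVP has a closed form; for a neural network the product would typically come from automatic differentiation instead. `make_hvp` and `conjugate_gradient` are hypothetical helper names.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return np.exp(-np.logaddexp(0.0, -t))

def make_hvp(X, y, theta, l2):
    """Return a function v -> H v for the logistic-regression Hessian, without building H."""
    n = X.shape[0]
    m = y * (X @ theta)
    curv = sigmoid(m) * sigmoid(-m)    # per-example second derivative of the loss
    def hvp(v):
        return X.T @ (curv * (X @ v)) / n + l2 * v
    return hvp

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=200):
    """Solve H s = b for symmetric positive-definite H using only products H v."""
    s = np.zeros_like(b, dtype=float)
    r = np.array(b, dtype=float)       # residual b - H s, starting from s = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s
```

With b set to the gradient of the test loss, the result plays the role of s_test; the influence on the test loss of upweighting any training point z_i is then the negative dot product of s_test with the gradient at z_i, so one solve serves all training points.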

SEVERAL PROBLEMS
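• What if the loss is non-convex, so that H is not positive definite?
→ Assuming that the trained parameters are at a local minimum, define a convex quadratic approximation of the loss around them
Then calculate the influence using the formulas above
→ Empirically, this works!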
• Influence function vs. retraining
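For the non-convex case, the quadratic loss on this slide was an image and is missing here. The paper's construction, with $\tilde\theta$ denoting the parameters actually reached by training and $\lambda$ a damping term added when $H_{\tilde\theta}$ has negative eigenvalues, is roughly:

```latex
\tilde{L}(z, \theta)
  = L(z, \tilde\theta)
  + \nabla_\theta L(z, \tilde\theta)^\top (\theta - \tilde\theta)
  + \frac{1}{2} (\theta - \tilde\theta)^\top
    \left( H_{\tilde\theta} + \lambda I \right) (\theta - \tilde\theta)
```

The influence formulas are then applied to $\tilde{L}$, which in practice amounts to replacing $H^{-1}$ with $(H_{\tilde\theta} + \lambda I)^{-1}$.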

SEVERAL PROBLEMS
• What if the loss is non-differentiable?
e.g., hinge loss
→ Use a differentiable variant of the hinge loss (a smooth hinge)
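One differentiable variant, used in the paper, is SmoothHinge(s, t) = t log(1 + exp((1 - s) / t)), which approaches the hinge loss max(0, 1 - s) as t goes to 0. The sketch below is a minimal NumPy version; the function names are assumptions for illustration.

```python
import numpy as np

def smooth_hinge(s, t):
    """t * log(1 + exp((1 - s) / t)), computed stably with logaddexp."""
    return t * np.logaddexp(0.0, (1.0 - s) / t)

def smooth_hinge_grad(s, t):
    """Derivative with respect to the margin s: -sigmoid((1 - s) / t)."""
    u = (1.0 - s) / t
    return -np.exp(-np.logaddexp(0.0, -u))

def smooth_hinge_hess(s, t):
    """Second derivative with respect to s: sigmoid(u) * (1 - sigmoid(u)) / t."""
    sig = -smooth_hinge_grad(s, t)
    return sig * (1.0 - sig) / t

margins = np.linspace(-2.0, 3.0, 11)
# Largest gap between the smooth and exact hinge at a small temperature t.
print(np.max(np.abs(smooth_hinge(margins, 1e-3) - np.maximum(0.0, 1.0 - margins))))
```

The exact hinge has zero second derivative wherever it is differentiable, so its Hessian carries no curvature information; the smooth version has a positive second derivative near the kink, which is what the influence computation needs.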

APPLICATIONS
• Understanding model behavior

APPLICATIONS
• Adversarial examples
cf. The effect of 𝑧 → 𝑧_𝛿 on the loss at a test point

APPLICATIONS
• Debugging domain mismatch

APPLICATIONS
• Fixing mislabeled examples
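For the mislabeled-example application, the paper ranks training points by their self-influence, i.e. the influence of upweighting z_i on its own loss, and sends the top-ranked points for manual review. A minimal NumPy sketch, assuming the per-example gradients `grads` (one row per training point) and a positive-definite Hessian `H` are already available; both function names are hypothetical:

```python
import numpy as np

def self_influence(grads, H):
    """-g_i^T H^{-1} g_i for each row g_i of `grads`."""
    solved = np.linalg.solve(H, grads.T).T    # row i: H^{-1} g_i
    return -np.sum(grads * solved, axis=1)

def review_order(grads, H):
    """Training indices ordered from largest to smallest self-influence magnitude."""
    # Self-influence is non-positive for positive-definite H, so ascending
    # order puts the largest magnitudes first.
    return np.argsort(self_influence(grads, H))
```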
