This document summarizes the Counterfactual Explanation session of "설명가능한 인공지능 기획!" held during the 18th cohort of 풀잎스쿨.
It is based on the paper, YouTube material, and the following reference:
https://christophm.github.io/interpretable-ml-book/
2. Counterfactual Explanations
• A counterfactual explanation describes a causal situation in the form: "If X had not occurred, Y would not have occurred."
• In interpretable machine learning, counterfactual explanations can be used to explain predictions of individual instances.
• A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.
• Counterfactuals are human-friendly explanations, because they are contrastive to the current instance and because they are selective, meaning they usually focus on a small number of feature changes. But counterfactuals suffer from the 'Rashomon effect'.
3. What Is a Good Explanation?
• Explanations are contrastive. → Humans do not want a complete explanation for a prediction, but want to compare what the differences were to another instance's prediction.
• Explanations are selected. → Make the explanation very short, give only 1 to 3 reasons, even if the world is more complex.
• Explanations are social. → Pay attention to the social environment of your machine learning application and the target audience.
• Explanations focus on the abnormal. If one of the input features for a prediction was abnormal in any sense (like a rare category of a categorical feature) and the feature influenced the prediction, it should be included in an explanation, even if other 'normal' features have the same influence on the prediction as the abnormal one.
https://brunch.co.kr/@bdh/33
4. What is a good counterfactual explanation?
• A counterfactual instance should produce the predefined prediction as closely as possible.
• A counterfactual should be as similar as possible to the instance regarding feature values.
• Ideally, multiple diverse counterfactual explanations are generated.
• A counterfactual instance should have feature values that are likely.
6. Generating Counterfactual Explanations
Method by Wachter et al.
• Objective Function:
  $\arg\min_{x'} \max_{\lambda} L(x, x', y', \lambda)$, where
  $L(x, x', y', \lambda) = \lambda \cdot (\hat{f}(x') - y')^2 + d(x, x')$
• A higher value of λ means that we prefer counterfactuals with predictions close to the desired outcome y'.
• The distance d(x, x') is the Manhattan distance weighted with the inverse median absolute deviation (MAD) of each feature. The total distance is the sum of all p feature-wise distances:
  $d(x, x') = \sum_{j=1}^{p} \frac{|x_j - x'_j|}{MAD_j}$
  $MAD_j = \mathrm{median}_{i \in \{1,\dots,n\}}\big(|x_{i,j} - \mathrm{median}_{l \in \{1,\dots,n\}}(x_{l,j})|\big)$
  → The MAD is the equivalent of the variance of a feature, but more robust to outliers than the Euclidean distance.
• Instead of selecting a value for λ directly, the authors suggest choosing a tolerance ε such that $|\hat{f}(x') - y'| \leq \epsilon$.
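As a minimal sketch (assuming an all-numerical feature matrix; `predict`, `mad`, and the other names are illustrative, not from the paper, and `predict` stands in for the black-box model $\hat{f}$), the MAD-weighted distance and the loss can be written as:

```python
import numpy as np

def mad(X):
    """Median absolute deviation of each feature (column) of X."""
    med = np.median(X, axis=0)
    return np.median(np.abs(X - med), axis=0)

def wachter_distance(x, x_prime, mad_j):
    """MAD-weighted Manhattan distance d(x, x')."""
    return np.sum(np.abs(x - x_prime) / mad_j)

def wachter_loss(x, x_prime, y_prime, lam, predict, mad_j):
    """L(x, x', y', lambda) = lambda * (f(x') - y')^2 + d(x, x')."""
    return (lam * (predict(x_prime) - y_prime) ** 2
            + wachter_distance(x, x_prime, mad_j))
```

Dividing by the MAD puts every feature on a comparable, outlier-robust scale before summing the absolute differences.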
7. Generating Counterfactual Explanations(Cont.d)
Method by Wachter et al
• Process
  1. Select an instance x to be explained, the desired outcome y', a tolerance ε, and a (low) initial value for λ.
  2. Sample a random instance as the initial counterfactual.
  3. Optimize the loss with the initially sampled counterfactual as the starting point.
  4. While $|\hat{f}(x') - y'| > \epsilon$:
     • Increase λ.
     • Optimize the loss with the current counterfactual as the starting point.
     • Return the counterfactual that minimizes the loss.
  5. Repeat steps 2-4 and return the list of counterfactuals, or the one that minimizes the loss.
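The loop above can be sketched as follows. This is a toy illustration, not the authors' implementation: `predict` stands in for the black-box model, a crude random hill climb replaces the proper numerical optimizer one would use in practice, and `max_lam` is an added safety cap not present in the paper.

```python
import numpy as np

def find_counterfactual(x, y_prime, predict, mad_j,
                        eps=0.05, lam0=0.1, lam_step=2.0,
                        n_steps=200, max_lam=1e6, seed=0):
    """Sketch of the Wachter et al. search loop: sample a random
    starting point, locally optimize the loss, and keep increasing
    lambda until the tolerance eps is met."""
    rng = np.random.default_rng(seed)

    def loss(x_prime, lam):
        return (lam * (predict(x_prime) - y_prime) ** 2
                + np.sum(np.abs(x - x_prime) / mad_j))

    def local_opt(x_prime, lam):
        # crude random hill climbing as a stand-in for a real optimizer
        for _ in range(n_steps):
            cand = x_prime + rng.normal(scale=0.1, size=x.shape)
            if loss(cand, lam) < loss(x_prime, lam):
                x_prime = cand
        return x_prime

    lam = lam0
    x_prime = local_opt(x + rng.normal(size=x.shape), lam)   # steps 2-3
    while abs(predict(x_prime) - y_prime) > eps and lam < max_lam:
        lam *= lam_step                                      # step 4
        x_prime = local_opt(x_prime, lam)
    return x_prime
```

With a small λ the distance term dominates and x' stays near x; each doubling of λ pulls the prediction closer to y' until the tolerance is met.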
8. Generating Counterfactual Explanations(Cont.d)
Method by Wachter et al
• It only takes the first and second criteria into account, not the last two ("produce counterfactuals with only a few feature changes and likely feature values").
• The method does not handle categorical features with many different levels well.
• The authors of the method suggested running the method separately for each
combination of feature values of the categorical features → High computation
cost
9. Generating Counterfactual Explanations(Cont.d)
Method by Dandl et al
• Loss Function (multi-objective, four components):
  $L(x, x', y', X^{obs}) = \big(o_1(\hat{f}(x'), y'),\ o_2(x, x'),\ o_3(x, x'),\ o_4(x', X^{obs})\big)$
• $o_1$ — the prediction should fall in the desired range y':
  $o_1(\hat{f}(x'), y') = \begin{cases} 0 & \text{if } \hat{f}(x') \in y' \\ \inf_{y'' \in y'} |\hat{f}(x') - y''| & \text{else} \end{cases}$
• $o_2$ — closeness to the original instance (mean Gower distance):
  $o_2(x, x') = \frac{1}{p} \sum_{j=1}^{p} \delta_G(x_j, x'_j)$
• $o_3$ — number of changed features:
  $o_3(x, x') = \|x - x'\|_0 = \sum_{j=1}^{p} \mathbb{1}_{x'_j \neq x_j}$
• $o_4$ — closeness to the nearest observed data point $x^{[1]} \in X^{obs}$:
  $o_4(x', X^{obs}) = \frac{1}{p} \sum_{j=1}^{p} \delta_G(x'_j, x^{[1]}_j)$
• Gower's distance $\delta_G$ handles both numerical and categorical features:
  $\delta_G(x_j, x'_j) = \begin{cases} \frac{1}{\hat{R}_j} |x_j - x'_j| & \text{if } x_j \text{ numerical} \\ \mathbb{1}_{x_j \neq x'_j} & \text{if } x_j \text{ categorical} \end{cases}$
• $\hat{R}_j$: the observed value range of feature j, which scales $\delta_G$ for all features to between 0 and 1.
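Under the assumption of all-numerical features (so Gower's distance reduces to the range-scaled absolute difference) and a desired prediction interval `(lo, hi)`, the four objectives can be sketched in Python; `predict` and `dandl_objectives` are illustrative names, not from the paper:

```python
import numpy as np

def dandl_objectives(x, x_prime, y_target, predict, X_obs):
    """The four objectives (o1..o4) of Dandl et al., assuming
    all-numerical features; y_target is a desired interval (lo, hi)."""
    r = X_obs.max(axis=0) - X_obs.min(axis=0)     # observed ranges R_j
    pred = predict(x_prime)
    lo, hi = y_target
    # o1: distance of the prediction from the desired interval
    o1 = 0.0 if lo <= pred <= hi else min(abs(pred - lo), abs(pred - hi))
    # o2: mean Gower distance to the original instance x
    o2 = np.mean(np.abs(x - x_prime) / r)
    # o3: number of changed features (the L0 "norm")
    o3 = float(np.sum(x != x_prime))
    # o4: mean Gower distance to the closest observed data point x^[1]
    o4 = np.mean(np.abs(X_obs - x_prime) / r, axis=1).min()
    return o1, o2, o3, o4
```

Note that the four values are returned as a tuple rather than summed: the method keeps them as separate objectives and optimizes them jointly.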
10. Generating Counterfactual Explanations(Cont.d)
Method by Dandl et al
• NSGA-II: A method for solving multi-objective optimization problems by finding multiple Pareto-optimal solutions (in particular, using nondominated sorting and the crowding distance).
• In the first generation, a group of counterfactual candidates is initialized by randomly changing some of the features compared to our instance x to be explained.
• Each candidate is then evaluated using the four objective functions above. Among them, we randomly select some candidates, where fitter candidates are more likely to be selected.
• The nondominated sorting algorithm sorts the candidates according to their objective values. If candidates are equally good, the crowding distance sorting algorithm sorts the candidates according to their diversity.
• When A is better than B on every objective, we say B is dominated by A. The set of nondominated solutions is the Pareto-optimal set, and the boundary formed by the Pareto-optimal solutions is called the Pareto front. Candidates closer to the Pareto front receive a higher rank.
• Given the ranking of the two sorting algorithms, we select the most promising and/or most diverse half of the
candidates. We use this set for the next generation and start again with the selection, recombination and mutation
process.
• Evaluation Metric: the hypervolume indicator
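The two sorting criteria can be illustrated with a simplified sketch (all objectives are assumed to be minimized; this follows the textbook definitions rather than the bookkeeping-optimized fast nondominated sort of the actual NSGA-II paper):

```python
import numpy as np

def dominates(a, b):
    """a dominates b if a is no worse in every objective and
    strictly better in at least one (all objectives minimized)."""
    return np.all(a <= b) and np.any(a < b)

def nondominated_sort(F):
    """Assign each candidate (row of objective matrix F) a front rank:
    rank 0 = Pareto front, rank 1 = front after removing rank 0, ..."""
    n = len(F)
    ranks = np.full(n, -1)
    remaining = set(range(n))
    rank = 0
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)
        rank += 1
    return ranks

def crowding_distance(F):
    """Crowding distance within one front: larger = more isolated = more diverse."""
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        dist[order[0]] = dist[order[-1]] = np.inf   # boundary points kept
        span = F[order[-1], k] - F[order[0], k]
        if span == 0:
            continue
        for i in range(1, n - 1):
            dist[order[i]] += (F[order[i + 1], k] - F[order[i - 1], k]) / span
    return dist
```

Candidates are first compared by front rank; within the same front, a larger crowding distance (a less crowded neighborhood in objective space) wins, which is what rewards diversity.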
11. Example
• Support vector machine (with radial basis kernel) to predict the probability
that a customer has a good credit risk.
• The goal is to find counterfactual explanations for a customer with the following feature values:
• The SVM predicts that the woman has a good credit risk with a probability of 24.2 %. The counterfactuals should answer how the input features need to be changed to get a predicted probability larger than 50 %.
12. Example (Cont.d)
• The first five columns contain the proposed feature changes (only altered features are displayed); the next three columns show the objective values.
• All counterfactuals have predicted probabilities greater than 50 % and do not dominate each
other. Non-dominated means that none of the counterfactuals has smaller values in all objectives
than the other counterfactuals.
13. Advantages
• The interpretation of counterfactual explanations is very clear. If the feature values of an instance are changed according to the counterfactual, the prediction changes to the predefined prediction.
• The counterfactual method does not require access to the data or the
model. It only requires access to the model’s prediction function, which
would also work via a web API, for example.
• The method works also with systems that do not use machine learning.
• The counterfactual explanation method is relatively easy to implement.
14. Disadvantages
• For each instance you will usually find multiple counterfactual explanations (Rashomon effect).
18. Generating Counterfactual Explanations(Cont.d)
Method by Dandl et al
• NSGA-II: A method for solving multi-objective optimization problems by finding multiple Pareto-optimal solutions (in particular, using nondominated sorting and the crowding distance).
• When A is better than B on every objective, we say B is dominated by A. The set of nondominated solutions is the Pareto-optimal set, and the boundary formed by the Pareto-optimal solutions is called the Pareto front. Multi-objective optimization algorithms evaluate the quality of solutions by their closeness to this Pareto front.
• Nondominated Sorting Algorithm: candidates closer to the Pareto front receive a higher rank → the set of nondominated solutions takes priority.
• Crowding Distance Sorting Algorithm: the less similar a candidate is to the others, the higher its crowding distance → diversity is rewarded with a higher score.
• In the first generation, a group of counterfactual candidates is initialized by randomly changing some of the features compared to our instance x to be explained.