AI should be Fair, Accountable, and Transparent (FAT* AI). It is therefore crucial to raise awareness of these topics not only among machine learning practitioners but across the entire population, as ML systems can make life-changing decisions and influence our lives now more than ever.
Fairness in Machine Learning @Codemotion
1. Fairness in Machine Learning: are you sure there is no bias in your predictions?
Azzurra Ragone - Innovation Manager
@azzurraragone
2. Me…
Innovation Manager
Previously: Google DevRel team
Before that: Research Fellow at
➢ University of Milano-Bicocca
➢ University of Michigan
➢ Politecnico di Bari
➢ University of Trento
3. People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.
The Master Algorithm, Pedro Domingos, 2015
4. How to make my ML system fair?
...and why care?
7. Arbitrary, inconsistent, or faulty decision-making thus raises serious concerns because it risks limiting our ability to achieve the goals that we have set for ourselves and access the opportunities for which we are qualified.
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
8. How do we ensure that these decisions are made the right way and for the right reasons?
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
10. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Object Recognition by Scene Alignment.
Advances in Neural Information Processing Systems, 2007.
13. Generalizing from examples
Provide good examples:
- a sufficiently large and diverse set
- well annotated
Quick, Draw!
Source: https://design.google/library/fair-not-default/
14. Historical examples may reflect:
- Prejudices against a social group
- Cultural stereotypes
- Demographic inequalities
Finding patterns in these data means replicating these same dynamics.
16. Geo bias
45% of ImageNet data comes from the USA (4% of the world population).
3% of ImageNet data comes from China and India (36% of the world population).
Ref: Nature 559 and Shankar, S. et al. (2017)
19. Debiasing Word Embeddings
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).
Credit: Pictures by Pixabay
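The core step of the paper's "hard debiasing" is easy to show in code: estimate a gender direction from a definitional pair and remove a word vector's component along it. A minimal sketch with made-up 4-d toy vectors (not the authors' code; real embeddings have hundreds of dimensions):

```python
import numpy as np

emb = {  # hypothetical toy embedding vectors
    "he":         np.array([0.9, 0.1, 0.3, 0.0]),
    "she":        np.array([-0.9, 0.1, 0.3, 0.0]),
    "programmer": np.array([0.4, 0.5, -0.2, 0.1]),
}

# Gender direction: normalized difference of a definitional pair
g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)

def neutralize(v, direction):
    """Remove the component of v along the given direction."""
    return v - (v @ direction) * direction

debiased = neutralize(emb["programmer"], g)
print(float(debiased @ g))  # ~0.0: no gender component remains
```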
20. The Machine Learning Loop
[Diagram: from the state of the world, via measurement, to data; via learning, to a model; via action, to feedback on individuals and the state of the world]
Source: Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan
21. The Machine Learning Loop: Measurement (from the state of the world to data)
22. The world is “messy”
Provenance of data is crucial. Data cleaning is mandatory.
Photo by pasja1000 on Pixabay
23. Measurement defines:
- your variables of interest,
- the process for turning your observations into numbers,
- how you actually collect the data.
[Fairness and Machine Learning, 2018]
Photo by Iker Urteaga on Unsplash
24. The target variable is the hardest to measure. It is made up for the purpose of the problem; it is not a property that people possess or lack.
Ex.: “creditworthiness”, “good employee”, “attractiveness”
[Fairness and Machine Learning, 2018]
Photo by David Paschke on Unsplash
25. The Machine Learning Loop: Learning (from data to a model)
27. Labor statistics and the male-as-norm bias almost perfectly predict which pronoun will be returned (e.g., by a machine translation system).
[Caliskan et al., 2017]
28. Sample size disparity
ML works better with more data, so it will work less well for members of minority groups.
[Figure: a minority group underrepresented in the training set / training data]
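A toy simulation of this effect (my sketch, not from the talk): one model is fit on pooled data dominated by a majority group, then evaluated separately per group. The group setup and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_group(n, w):
    """Synthetic group whose labels follow its own weight vector w."""
    X = rng.normal(size=(n, 5))
    y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

w_major = rng.normal(size=5)
w_minor = -w_major                        # the minority's pattern differs

X_maj, y_maj = make_group(5000, w_major)  # majority: 5000 examples
X_min, y_min = make_group(200, w_minor)   # minority:   200 examples

model = LogisticRegression().fit(
    np.vstack([X_maj, X_min]), np.concatenate([y_maj, y_min]))

# Evaluate on fresh samples from each group
Xt, yt = make_group(1000, w_major)
print("majority accuracy:", model.score(Xt, yt))
Xt, yt = make_group(1000, w_minor)
print("minority accuracy:", model.score(Xt, yt))
# The pooled model tracks the majority pattern, so the minority
# group sees markedly worse accuracy.
```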
29. The Machine Learning Loop: Model (from learning to action)
30. It’s not always about “prediction” (“is this patient at high risk for cancer?”). It can be classification (determining whether a piece of email is spam), regression (assigning risk scores to defendants), or information retrieval (finding documents that best match a search query).
Photo by Tobias Zils on Unsplash
32. The Machine Learning Loop: Action and feedback
33. House price prediction
If you predict future prices (and publicize them) you create a self-fulfilling feedback loop: houses with lower predicted sale prices deter buyers, demand goes down, and the final price is even lower.
Photo by Deva Darshan on Unsplash
34. Some communities may be disproportionately targeted, with people being
arrested for crimes that might be ignored in other communities.
Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016).
Self-fulfilling predictions
Photo by Jacques Tiberi on Pixabay
35. “Feedback loops occur when data discovered on the
basis of predictions are used to update the model.”
Danielle Ensign et al.,
“Runaway Feedback Loops in Predictive Policing,” 2017
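A toy simulation in the spirit of the urn model Ensign et al. analyze (my sketch, not their code): two precincts with identical true crime rates, a patrol allocated in proportion to past recorded incidents, and incidents recorded only where the patrol goes. Early noise gets amplified into a runaway loop.

```python
import random

random.seed(1)
true_rate = [0.1, 0.1]   # identical underlying crime rates
records = [1.0, 1.0]     # recorded incidents per precinct (urn counts)

for day in range(10_000):
    # Allocate the patrol proportionally to past records (urn draw)
    p0 = records[0] / (records[0] + records[1])
    target = 0 if random.random() < p0 else 1
    # Crime is only *recorded* where the patrol is sent
    if random.random() < true_rate[target]:
        records[target] += 1

print(records)  # one precinct tends to accumulate most of the records,
                # even though both precincts are statistically identical
```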
36. The Machine Learning Loop: Individuals and the state of the world
37. The state of society
Training data encode the demographic disparities in our society, and some stereotypes can be reinforced by ML (due to feedback loops).
Photo by Cory Schadt on Unsplash
40. Analyze your data
Source: Google Machine Learning Crash Course
★ Are there missing feature values for a large number of observations?
★ Are there features that are missing that might affect other features?
★ Are there any unexpected feature values?
★ What signs of data skew do you see?
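A minimal pandas sketch of these checks (the file path and column names are placeholders to adapt to your dataset):

```python
import pandas as pd

df = pd.read_csv("my_dataset.csv")  # placeholder path

# Missing feature values: absolute counts and share of rows
print(df.isna().sum().sort_values(ascending=False))
print(df.isna().mean().round(3))

# Unexpected feature values: numeric ranges and category counts
print(df.describe(include="all"))
for col in df.select_dtypes(include="object"):
    print(col, df[col].value_counts(dropna=False).head())

# Signs of skew: compare subgroup shares to what you would expect
print(df["gender"].value_counts(normalize=True))  # hypothetical column
```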
42. Skewed data (geographical bias)
Source: California Housing dataset,
Google Machine Learning Crash Course
43. Facets Overview
Facets Overview is an interactive visualization tool to explore datasets: quickly analyze the distribution of values across the datasets.
Source: Facets tool (https://pair-code.github.io/facets/)
44. Facets Overview
⅔ of examples represent males, while we would expect the breakdown between genders to be closer to 50/50.
Source: Facets tool (https://pair-code.github.io/facets/)
45. Facets Dive
Data are faceted by the marital-status feature. Males outnumber females by more than 5:1. Married women are underrepresented in our data.
Source: Facets tool (https://pair-code.github.io/facets/)
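In a notebook, the Overview statistics can be generated from pandas DataFrames roughly like this (a sketch assuming the facets-overview pip package; df_train and df_test are datasets you provide):

```python
import base64
import pandas as pd
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator)
from IPython.display import display, HTML

df_train = pd.read_csv("train.csv")  # placeholder paths
df_test = pd.read_csv("test.csv")

# Build the feature-statistics proto Facets Overview consumes
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "train", "table": df_train},
     {"name": "test", "table": df_test}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Embed the stats in the facets-overview web component (Jupyter)
HTML_TEMPLATE = """
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
<facets-overview id="elem"></facets-overview>
<script>
  document.querySelector("#elem").protoInput = "{protostr}";
</script>"""
display(HTML(HTML_TEMPLATE.format(protostr=protostr)))
```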
46. Evaluating for Bias
Source: Google Machine Learning Crash Course
A model to predict the presence of tumors, evaluated against a validation set of 1,000 patients: 500 records from female patients and 500 records from male patients.
47. Evaluating for Bias
Source: Google Machine Learning Crash Course
For one gender subgroup, the model incorrectly predicts a tumor (false positives) in 9.1% of cases and misses a tumor diagnosis (false negatives) in 9.1% of cases.
For the other subgroup, it incorrectly predicts a tumor in 33.3% of cases and misses a tumor diagnosis in 45.5% of cases.
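Computing such a per-subgroup breakdown takes only a few lines. A sketch (mine, not from the crash course), where df is a DataFrame with hypothetical columns for the true label, the prediction, and the patient's sex:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

def rates(y_true, y_pred):
    """False positive and false negative rates from a 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"FPR": fp / (fp + tn), "FNR": fn / (fn + tp)}

# df has hypothetical columns: tumor (truth), predicted, sex
for group, sub in df.groupby("sex"):
    print(group, rates(sub["tumor"], sub["predicted"]))
# Aggregate accuracy can look fine while one subgroup's false
# negative rate is several times the other's.
```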
48. “What-if” tool
Analyze an ML model without writing code. Given pointers to a TF model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
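In a Jupyter notebook the tool can be launched roughly like this (a sketch assuming the witwidget package; examples is a list of tf.train.Example and predict_fn is a prediction function you supply):

```python
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# Point the tool at your examples and your model's prediction function
config_builder = (WitConfigBuilder(examples)
                  .set_custom_predict_fn(predict_fn))
WitWidget(config_builder, height=720)
```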
49. Counterfactuals
It is possible to compare a datapoint to the most similar point for which your model predicts a different result.
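The underlying idea is easy to sketch outside the tool (a toy version of mine, not the What-If Tool's actual implementation): given a dataset, its predictions, and a query point, return the closest point with a different prediction.

```python
import numpy as np

def nearest_counterfactual(x, X, preds, pred_x):
    """Closest row of X (L1 distance) whose prediction differs from pred_x."""
    candidates = X[preds != pred_x]
    dists = np.abs(candidates - x).sum(axis=1)
    return candidates[np.argmin(dists)]

# Usage sketch, with preds = model.predict(X) for a model you provide:
# cf = nearest_counterfactual(X[0], X, preds, preds[0])
```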
51. Visualize inference results
Compare the performance of two models, or inspect a single model’s performance, by organizing inference results into confusion matrices, scatterplots, or histograms.
52. Edit a datapoint
Edit a datapoint and see how your model performs. Edit, add, or remove features or feature values for any selected datapoint, then run inference to test model performance.
53. Test algorithmic fairness
Slice your dataset into subgroups and explore the effect of different algorithmic fairness constraints.
See: “Playing with fairness” by David Weinberger.
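One such constraint, "equality of opportunity", can be sketched as picking per-group decision thresholds so that true positive rates match (a toy sketch of mine, not the tool's implementation; scores, y, and group are arrays you provide):

```python
import numpy as np

def tpr(scores, y, thr):
    """True positive rate at a given decision threshold."""
    return (scores[y == 1] >= thr).mean()

def threshold_for_tpr(scores, y, target):
    """Grid-search the threshold whose TPR is closest to the target."""
    grid = np.linspace(scores.min(), scores.max(), 200)
    return min(grid, key=lambda t: abs(tpr(scores, y, t) - target))

# Equalize opportunity across groups at TPR of roughly 0.8:
# thresholds = {g: threshold_for_tpr(scores[group == g], y[group == g], 0.8)
#               for g in np.unique(group)}
```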
54. ★ Measurement is crucial
★ Know your data (and how data were collected and annotated)
★ Try to discover hidden biases (missing values, data skew, subgroups, etc.)
★ Ask questions. Don’t train the model and then walk away
★ Avoid feedback loops
★ Use tools that allow you to carry out such investigations
Key Takeaways
55. AI is a cultural shift as much as a technical one.
Autonomous systems are changing workplaces, streets
and schools.
We need to ensure that those changes are beneficial, before they are built further into the infrastructure of everyday life.
There is a blind spot in AI research
Kate Crawford & Ryan Calo
Nature 538, 311–313 (20 October 2016)
57. ❏ AI can be sexist and racist — it’s time to make it fair, James Zou & Londa Schiebinger, Nature 559, 324–326 (2018)
❏ The Master Algorithm, Pedro Domingos, 2015
❏ Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan
❏ No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World, Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley
❏ Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Adv. Neural Inf. Process. Syst. 2016, 4349–4357 (2016)
References
58. ❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo,
Nature 538, 311–313 (20 October 2016)
❏ Semantics Derived Automatically from Language Corpora Contain
Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan, Science 356, no. 6334 (2017): 183–86
❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of Chicago's Predictive Policing Pilot, Saunders, J., Hunt, P. & Hollywood, J. S., J. Exp. Criminol. 12, 347–371 (2016)
❏ Runaway Feedback Loops in Predictive Policing, Danielle Ensign et al., arXiv:1706.09847
References
59. ❏ Object Recognition by Scene Alignment, B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman, Advances in Neural Information Processing Systems, 2007
❏ Fair Is Not the Default (https://design.google/library/fair-not-default/)
❏ “Playing with fairness”, David Weinberger
❏ Google Machine Learning Crash Course
❏ What-If Tool: https://pair-code.github.io/what-if-tool/
❏ Facets tool: https://pair-code.github.io/facets/
References