AI should be Fair, Accountable, and Transparent (FAT* AI). It is therefore crucial to raise awareness of these topics not only among machine learning practitioners but across the entire population, as ML systems can make life-changing decisions and influence our lives now more than ever.
Fairness in Machine Learning @Codemotion
1. Fairness in Machine Learning: are you sure there is no bias in your predictions?
Azzurra Ragone - Innovation Manager
@azzurraragone
2. Me…
Innovation Manager
Previously: Google DevRel team
Before that: Research Fellow at
➢ University of Milano-Bicocca
➢ University of Michigan
➢ Politecnico di Bari
➢ University of Trento
3. People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.
The Master Algorithm, Pedro Domingos, 2015
4. How to make my ML system fair?
...and why care?
7. Arbitrary, inconsistent, or faulty decision-making thus raises serious concerns because it risks limiting our ability to achieve the goals that we have set for ourselves and access the opportunities for which we are qualified.
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
8. How do we ensure that these decisions are made the right way and for the right reasons?
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
10. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Object Recognition by Scene Alignment.
Advances in Neural Information Processing Systems, 2007.
13. Generalizing from examples
Provide good examples:
- a sufficiently large and diverse set
- well annotated
Quick, Draw!
Source: https://design.google/library/fair-not-default/
14. Historical examples may reflect:
- Prejudices against a social group
- Cultural stereotypes
- Demographic inequalities
Finding patterns in these data means replicating these same dynamics.
16. Geo bias
45% of ImageNet data comes from the USA (4% of the world population).
3% of ImageNet data comes from China and India (36% of the world population).
Ref: Nature 559 and Shankar, S. et al. (2017)
19. Debiasing Word Embeddings
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).
Credit: Pictures by Pixabay
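The core step of the paper's "hard debiasing" is easy to show in code: estimate a gender direction from a definitional pair and remove a word vector's component along it. A minimal sketch with made-up 4-d toy vectors (not the authors' code; real embeddings have hundreds of dimensions):

```python
import numpy as np

emb = {  # hypothetical toy embedding vectors
    "he":         np.array([0.9, 0.1, 0.3, 0.0]),
    "she":        np.array([-0.9, 0.1, 0.3, 0.0]),
    "programmer": np.array([0.4, 0.5, -0.2, 0.1]),
}

# Gender direction: normalized difference of a definitional pair
g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)

def neutralize(v, direction):
    """Remove the component of v along the given direction."""
    return v - (v @ direction) * direction

debiased = neutralize(emb["programmer"], g)
print(float(debiased @ g))  # ~0.0: no gender component remains
```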
20. The Machine Learning Loop
[Diagram: from the state of the world, via measurement, to data; via learning, to a model; via action, to feedback on individuals and the state of the world]
Source: Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan
21. The Machine Learning Loop: Measurement (from the state of the world to data)
22. The world is “messy”
Provenance of data is crucial. Data cleaning is mandatory.
Photo by pasja1000 on Pixabay
23. Measurement defines:
- your variables of interest,
- the process for turning your observations into numbers,
- how you actually collect the data.
[Fairness and Machine Learning, 2018]
Photo by Iker Urteaga on Unsplash
24. The target variable is the hardest to measure. It is made up for the purpose of the problem; it is not a property that people possess or lack.
Ex.: “creditworthiness”, “good employee”, “attractiveness”
[Fairness and Machine Learning, 2018]
Photo by David Paschke on Unsplash
25. The Machine Learning Loop: Learning (from data to a model)
27. Labor statistics and the male-as-norm bias almost perfectly predict which pronoun will be returned (e.g., by a machine translation system).
[Caliskan et al., 2017]
28. Sample size disparity
ML works better with more data, so it will work less well for members of minority groups.
[Figure: a minority group underrepresented in the training set / training data]
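A toy simulation of this effect (my sketch, not from the talk): one model is fit on pooled data dominated by a majority group, then evaluated separately per group. The group setup and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_group(n, w):
    """Synthetic group whose labels follow its own weight vector w."""
    X = rng.normal(size=(n, 5))
    y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

w_major = rng.normal(size=5)
w_minor = -w_major                        # the minority's pattern differs

X_maj, y_maj = make_group(5000, w_major)  # majority: 5000 examples
X_min, y_min = make_group(200, w_minor)   # minority:   200 examples

model = LogisticRegression().fit(
    np.vstack([X_maj, X_min]), np.concatenate([y_maj, y_min]))

# Evaluate on fresh samples from each group
Xt, yt = make_group(1000, w_major)
print("majority accuracy:", model.score(Xt, yt))
Xt, yt = make_group(1000, w_minor)
print("minority accuracy:", model.score(Xt, yt))
# The pooled model tracks the majority pattern, so the minority
# group sees markedly worse accuracy.
```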
29. The Machine Learning Loop: Model (from learning to action)
30. It’s not always about “prediction” (“is this patient at high risk for cancer?”). It can be classification (determining whether a piece of email is spam), regression (assigning risk scores to defendants), or information retrieval (finding documents that best match a search query).
Photo by Tobias Zils on Unsplash
32. The Machine Learning Loop: Action and feedback
33. House price prediction
If you predict future prices (and publicize them) you create a self-fulfilling feedback loop: houses with lower predicted sale prices deter buyers, demand goes down, and the final price is even lower.
Photo by Deva Darshan on Unsplash
34. Some communities may be disproportionately targeted, with people being
arrested for crimes that might be ignored in other communities.
Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016).
Self-fulfilling predictions
Photo by Jacques Tiberi on Pixabay
35. “Feedback loops occur when data discovered on the
basis of predictions are used to update the model.”
Danielle Ensign et al.,
“Runaway Feedback Loops in Predictive Policing,” 2017
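A toy simulation in the spirit of the urn model Ensign et al. analyze (my sketch, not their code): two precincts with identical true crime rates, a patrol allocated in proportion to past recorded incidents, and incidents recorded only where the patrol goes. Early noise gets amplified into a runaway loop.

```python
import random

random.seed(1)
true_rate = [0.1, 0.1]   # identical underlying crime rates
records = [1.0, 1.0]     # recorded incidents per precinct (urn counts)

for day in range(10_000):
    # Allocate the patrol proportionally to past records (urn draw)
    p0 = records[0] / (records[0] + records[1])
    target = 0 if random.random() < p0 else 1
    # Crime is only *recorded* where the patrol is sent
    if random.random() < true_rate[target]:
        records[target] += 1

print(records)  # one precinct tends to accumulate most of the records,
                # even though both precincts are statistically identical
```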
36. The Machine Learning Loop: Individuals and the state of the world
37. The state of society
Training data encode the demographic disparities in our society, and some stereotypes can be reinforced by ML (due to feedback loops).
Photo by Cory Schadt on Unsplash
40. Analyze your data
Source: Google Machine Learning Crash Course
★ Are there missing feature values for a large number of observations?
★ Are there features that are missing that might affect other features?
★ Are there any unexpected feature values?
★ What signs of data skew do you see?
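A minimal pandas sketch of these checks (the file path and column names are placeholders to adapt to your dataset):

```python
import pandas as pd

df = pd.read_csv("my_dataset.csv")  # placeholder path

# Missing feature values: absolute counts and share of rows
print(df.isna().sum().sort_values(ascending=False))
print(df.isna().mean().round(3))

# Unexpected feature values: numeric ranges and category counts
print(df.describe(include="all"))
for col in df.select_dtypes(include="object"):
    print(col, df[col].value_counts(dropna=False).head())

# Signs of skew: compare subgroup shares to what you would expect
print(df["gender"].value_counts(normalize=True))  # hypothetical column
```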
42. Skewed data (geographical bias)
Source: California Housing dataset,
Google Machine Learning Crash Course
43. Facets Overview
Facets Overview is an interactive visualization tool to explore datasets: quickly analyze the distribution of values across the datasets.
Source: Facets tool (https://pair-code.github.io/facets/)
44. Facets Overview
⅔ of examples represent males, while we would expect the breakdown between genders to be closer to 50/50.
Source: Facets tool (https://pair-code.github.io/facets/)
45. Facets Dive
Data are faceted by the marital-status feature. Males outnumber females by more than 5:1. Married women are underrepresented in our data.
Source: Facets tool (https://pair-code.github.io/facets/)
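In a notebook, the Overview statistics can be generated from pandas DataFrames roughly like this (a sketch assuming the facets-overview pip package; df_train and df_test are datasets you provide):

```python
import base64
import pandas as pd
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator)
from IPython.display import display, HTML

df_train = pd.read_csv("train.csv")  # placeholder paths
df_test = pd.read_csv("test.csv")

# Build the feature-statistics proto Facets Overview consumes
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "train", "table": df_train},
     {"name": "test", "table": df_test}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Embed the stats in the facets-overview web component (Jupyter)
HTML_TEMPLATE = """
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
<facets-overview id="elem"></facets-overview>
<script>
  document.querySelector("#elem").protoInput = "{protostr}";
</script>"""
display(HTML(HTML_TEMPLATE.format(protostr=protostr)))
```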
46. Evaluating for Bias
Source: Google Machine Learning Crash Course
A model to predict the presence of tumors, evaluated against a validation set of 1,000 patients: 500 records from female patients and 500 records from male patients.
47. Evaluating for Bias
Source: Google Machine Learning Crash Course
For one gender subgroup, the model incorrectly predicts a tumor (false positives) in 9.1% of cases and misses a tumor diagnosis (false negatives) in 9.1% of cases.
For the other subgroup, it incorrectly predicts a tumor in 33.3% of cases and misses a tumor diagnosis in 45.5% of cases.
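Computing such a per-subgroup breakdown takes only a few lines. A sketch (mine, not from the crash course), where df is a DataFrame with hypothetical columns for the true label, the prediction, and the patient's sex:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

def rates(y_true, y_pred):
    """False positive and false negative rates from a 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"FPR": fp / (fp + tn), "FNR": fn / (fn + tp)}

# df has hypothetical columns: tumor (truth), predicted, sex
for group, sub in df.groupby("sex"):
    print(group, rates(sub["tumor"], sub["predicted"]))
# Aggregate accuracy can look fine while one subgroup's false
# negative rate is several times the other's.
```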
48. “What-if” tool
Analyze an ML model without writing code. Given pointers to a TF model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
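In a Jupyter notebook the tool can be launched roughly like this (a sketch assuming the witwidget package; examples is a list of tf.train.Example and predict_fn is a prediction function you supply):

```python
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# Point the tool at your examples and your model's prediction function
config_builder = (WitConfigBuilder(examples)
                  .set_custom_predict_fn(predict_fn))
WitWidget(config_builder, height=720)
```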
49. Counterfactuals
It is possible to compare a datapoint to the most similar point for which your model predicts a different result.
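The underlying idea is easy to sketch outside the tool (a toy version of mine, not the What-If Tool's actual implementation): given a dataset, its predictions, and a query point, return the closest point with a different prediction.

```python
import numpy as np

def nearest_counterfactual(x, X, preds, pred_x):
    """Closest row of X (L1 distance) whose prediction differs from pred_x."""
    candidates = X[preds != pred_x]
    dists = np.abs(candidates - x).sum(axis=1)
    return candidates[np.argmin(dists)]

# Usage sketch, with preds = model.predict(X) for a model you provide:
# cf = nearest_counterfactual(X[0], X, preds, preds[0])
```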
51. Visualize inference results
Compare the performance of two models, or inspect a single model’s performance, by organizing inference results into confusion matrices, scatterplots, or histograms.
52. Edit a datapoint
Edit a datapoint and see how your model performs. Edit, add, or remove features or feature values for any selected datapoint, then run inference to test model performance.
53. Test algorithmic fairness
Slice your dataset into subgroups and explore the effect of different algorithmic fairness constraints.
See: “Playing with fairness” by David Weinberger.
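One such constraint, "equality of opportunity", can be sketched as picking per-group decision thresholds so that true positive rates match (a toy sketch of mine, not the tool's implementation; scores, y, and group are arrays you provide):

```python
import numpy as np

def tpr(scores, y, thr):
    """True positive rate at a given decision threshold."""
    return (scores[y == 1] >= thr).mean()

def threshold_for_tpr(scores, y, target):
    """Grid-search the threshold whose TPR is closest to the target."""
    grid = np.linspace(scores.min(), scores.max(), 200)
    return min(grid, key=lambda t: abs(tpr(scores, y, t) - target))

# Equalize opportunity across groups at TPR of roughly 0.8:
# thresholds = {g: threshold_for_tpr(scores[group == g], y[group == g], 0.8)
#               for g in np.unique(group)}
```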
54. ★ Measurement is crucial
★ Know your data (and how data were collected and annotated)
★ Try to discover hidden biases (missing values, data skew, subgroups, etc.)
★ Ask questions. Don’t train the model and then walk away
★ Avoid feedback loops
★ Use tools that allow you to carry out such investigations
Key Takeaways
55. AI is a cultural shift as much as a technical one.
Autonomous systems are changing workplaces, streets
and schools.
We need to ensure that those changes are beneficial, before they are built further into the infrastructure of everyday life.
There is a blind spot in AI research
Kate Crawford & Ryan Calo
Nature 538, 311–313 (20 October 2016)
57. ❏ AI can be sexist and racist — it’s time to make it fair, James Zou & Londa Schiebinger, Nature 559, 324–326 (2018)
❏ The Master Algorithm, Pedro Domingos, 2015
❏ Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan
❏ No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World, Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley
❏ Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Adv. Neural Inf. Process. Syst. 2016, 4349–4357 (2016)
References
58. ❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo,
Nature 538, 311–313 (20 October 2016)
❏ Semantics Derived Automatically from Language Corpora Contain
Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan, Science 356, no. 6334 (2017): 183–86
❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of Chicago's Predictive Policing Pilot, Saunders, J., Hunt, P. & Hollywood, J. S., J. Exp. Criminol. 12, 347–371 (2016)
❏ Runaway Feedback Loops in Predictive Policing, Danielle Ensign et al., arXiv:1706.09847
References
59. ❏ Object Recognition by Scene Alignment, B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman, Advances in Neural Information Processing Systems, 2007
❏ Fair Is Not the Default (https://design.google/library/fair-not-default/)
❏ “Playing with fairness”, David Weinberger
❏ Google Machine Learning Crash Course
❏ What-If Tool: https://pair-code.github.io/what-if-tool/
❏ Facets tool: https://pair-code.github.io/facets/
References