Fairness in Machine Learning: are you
sure there is no bias in your
predictions?
Azzurra Ragone - Innovation Manager
@azzurraragone
Me…
Innovation Manager
Previous @Google DevRel team
Previously research fellow at:
➢ Univ. Milano Bicocca,
➢ University of Michigan
➢ Politecnico of Bari
➢ University of Trento
People worry that computers will get too
smart and take over the world, but the
real problem is that they’re too stupid and
they’ve already taken over the world
The Master Algorithm
Pedro Domingos, 2015
How to make my ML system fair?
...and why care?
Our success, happiness and
wellbeing can be affected by
decisions made by others
Life-changing decisions:
➔ Admission to schools
➔ Job offers
➔ Patient screening
➔ Mortgage approval
➔ ...
Arbitrary, inconsistent, or faulty decision-making thus
raises serious concerns because it risks limiting our
ability to achieve the goals that we have set for ourselves
and access the opportunities for which we are qualified.
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
How do we ensure that these decisions are
made the right way and for the right reasons?
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
The ML promise:
make decisions more consistent,
accurate and rigorous.
B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Object Recognition by Scene Alignment.
Advances in Neural Information Processing Systems, 2007.
...but there are serious risks in learning
from examples.
Generalizing from examples
Source: https://design.google/library/fair-not-default/
Quick, Draw!
Generalizing from examples
Provide good examples:
- a sufficiently large and diverse set
- well annotated
Quick, Draw!
Source: https://design.google/library/fair-not-default/
Historical examples may reflect:
- Prejudices against a social group
- Cultural stereotypes
- Demographic inequalities
and finding patterns in these data means replicating these
same dynamics
Source: https://gluon-cv.mxnet.io/build/examples_datasets/imagenet.html
45% of ImageNet data comes from the USA (4% of the world population)
3% of ImageNet data comes from China and India (36% of the world population)
Ref: Nature 559 and Shankar, S. et al. (2017)
Geo bias
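The two figures above can be turned into a simple over/under-representation measure. The sketch below divides each region's share of the data by its share of the world population; the name `representation_ratio` is made up for illustration and is not a term from the cited papers.

```python
# Shares quoted on the slide, as fractions of ImageNet data and of world population.
shares = {
    "USA": {"data": 0.45, "population": 0.04},
    "China + India": {"data": 0.03, "population": 0.36},
}


def representation_ratio(data_share: float, population_share: float) -> float:
    """Ratio > 1 means over-represented, < 1 means under-represented."""
    return data_share / population_share


ratios = {
    region: representation_ratio(s["data"], s["population"])
    for region, s in shares.items()
}

for region, r in ratios.items():
    print(f"{region}: {r:.2f}x its population share")
```

Under this measure the USA is represented at roughly 11 times its population share, while China and India together sit at less than a tenth of theirs.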
Photo Credit: Left: iStock/Getty; Right: Prakash Singh/AFP/Getty (from Nature 559, 324-326 (2018))
Labels for the left photo: Bride, Dress, Woman, Wedding
Labels for the right photo: Performance art, Costume
Word Embeddings
Debiasing Word Embeddings
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).
Credit: Pictures by Pixabay
The Machine Learning Loop:
State of the world → (Measurement) → Data → (Learning) → Model → (Action) → Individuals → (Feedback) → State of the world
Source: Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
The Machine Learning Loop: State of the world → (Measurement) → Data
Provenance of data
is crucial.
Data cleaning is
mandatory.
The world is “messy”
Photo by pasja1000 on Pixabay
Measurement defines:
- your variables of interest,
- the process for turning your
observations into numbers,
- how you actually collect the
data
[Fairness and Machine Learning, 2018]
Photo by Iker Urteaga on Unsplash
The target variable is the
hardest to measure.
It is made up for the purpose
of the problem.
It is not a property that
people possess or lack
Ex. “creditworthiness”, “good
employee”, “attractiveness”
[Fairness and Machine Learning, 2018]
Photo by David Paschke on Unsplash
ML will extract
stereotypes the same
way that it extracts
knowledge
Labor statistics and the
male-as-norm bias
almost perfectly predict
which gendered pronoun a
translation system returns
for an occupation
[Caliskan et al., 2017]
ML works better with more data, so it will work less well for
members of minority groups
Sample size disparity
Training set
Training data
It’s not always about “Prediction” (“is
this patient at high risk for
cancer?”).
It can be classification (determine
whether a piece of email is spam),
regression (assigning risk
scores to defendants), or information
retrieval (finding documents
that best match a search query).
Photo by Tobias Zils on Unsplash
Predictions - actions - outcome
Photo by Pixabay
If you predict future prices (and publicize them), you create a self-fulfilling
feedback loop: a lower predicted sale price deters buyers,
demand goes down, and the final price is even lower
House price prediction
Photo by Deva Darshan on Unsplash
Some communities may be disproportionately targeted, with people being
arrested for crimes that might be ignored in other communities.
Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016).
Self-fulfilling predictions
Photo by Jacques Tiberi on Pixabay
“Feedback loops occur when data discovered on the
basis of predictions are used to update the model.”
Danielle Ensign et al.,
“Runaway Feedback Loops in Predictive Policing,” 2017
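A toy simulation can make the quoted definition concrete. This is a deliberately simplified, deterministic sketch and not the model analysed by Ensign et al.: two areas are assumed to have the same true incident rate, the patrol is always sent where more incidents have been recorded so far, and new incidents are only recorded where the patrol goes. The function name, the starting counts and the rate are all invented for illustration.

```python
def simulate_patrols(days: int, true_rate: int = 5) -> dict:
    """Toy feedback loop: the patrol goes where most incidents were recorded."""
    # Both areas have the SAME true incident rate (assumption);
    # area A merely starts with one extra recorded incident.
    recorded = {"A": 11, "B": 10}
    for _ in range(days):
        # The "prediction": the area with the most recorded incidents.
        target = max(recorded, key=recorded.get)
        # The action generates new data only where the patrol was sent.
        recorded[target] += true_rate
    return recorded


result = simulate_patrols(days=100)
print(result)  # {'A': 511, 'B': 10}
```

A one-incident head start turns into a gap of hundreds of records, even though the underlying rates are identical: the model is updated only with data that its own predictions caused to be collected.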
Training data encode the demographic disparities in our society, and
some stereotypes can be reinforced by ML (due to feedback loops)
The state of society
Photo by Cory Schadt on Unsplash
Solutions?
Bias may lurk in your data...
Analyze your data
Source: Google Machine Learning Crash Course
★ Are there missing feature values for a large number of observations?
★ Are there features that are missing that might affect other features?
★ Are there any unexpected feature values?
★ What signs of data skew do you see?
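The questions above can be partly automated. The following is a minimal sketch using only the Python standard library; the records, the feature names and the helper functions are hypothetical and only illustrate the kind of checks meant here (missing values, unexpected values, skew).

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "gender": "male", "rooms": 3},
    {"age": None, "gender": "male", "rooms": 2},
    {"age": 51, "gender": "female", "rooms": -1},  # unexpected negative value
    {"age": 29, "gender": "male", "rooms": 4},
]


def missing_fraction(rows, feature):
    """Share of rows where the feature is absent or None."""
    return sum(r.get(feature) is None for r in rows) / len(rows)


def unexpected(rows, feature, is_valid):
    """Rows whose non-missing value fails a validity check."""
    return [r for r in rows
            if r.get(feature) is not None and not is_valid(r[feature])]


def category_shares(rows, feature):
    """Relative frequency of each category, to spot skew."""
    counts = Counter(r[feature] for r in rows if r.get(feature) is not None)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


age_missing = missing_fraction(rows, "age")
bad_rooms = unexpected(rows, "rooms", lambda v: v >= 0)
gender_shares = category_shares(rows, "gender")

print(age_missing)    # 0.25
print(len(bad_rooms)) # 1
print(gender_shares)  # {'male': 0.75, 'female': 0.25}
```

Checks like these only flag candidates for a closer look; deciding whether a skew is a problem still requires knowing how the data were collected.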
Missing feature values
Source: California Housing dataset,
Google Machine Learning Crash Course
Skewed data (geographical bias)
Source: California Housing dataset,
Google Machine Learning Crash Course
Facets Overview
Source: Facets tool
(https://pair-code.github.io/facets/)
Facets Overview, an
interactive
visualization tool to
explore datasets.
Quickly analyze the
distribution of
values across the
datasets.
Facets Overview
Source: Facets tool
(https://pair-code.github.io/facets/)
⅔ of examples
represent males,
while we would
expect the
breakdown
between
genders to be
closer to 50/50
Facets Dive
Source: Facets tool
(https://pair-code.github.io/facets/)
Data are faceted by the
marital-status
feature. Males
outnumber females
by more than 5:1.
Married women are
underrepresented in
our data.
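The same kind of faceted count can be sketched without a visual tool. The records below are invented toy data, not the dataset shown on the slide, and the helper `ratio` is a hypothetical name; the point is only to show how counting by two features at once exposes an imbalance inside one facet.

```python
from collections import Counter

# Invented toy records of (marital_status, gender); not the slide's dataset.
records = (
    [("married", "male")] * 6 + [("married", "female")] * 1
    + [("single", "male")] * 3 + [("single", "female")] * 3
)

# Count examples per (marital_status, gender) cell.
counts = Counter(records)


def ratio(counts, status, a="male", b="female"):
    """How many `a` examples there are per `b` example within one facet."""
    return counts[(status, a)] / counts[(status, b)]


for status in ("married", "single"):
    print(status, f"{ratio(counts, status):.1f} : 1")
```

In this toy data the overall split is 9 to 4, but the imbalance is concentrated in the married facet (6 to 1), which a single overall count would hide.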
Evaluating for Bias
Source: Google Machine Learning Crash Course
Model to predict the presence of tumors evaluated
against a validation set of 1,000 patients.
500 records from female patients
500 records from male patients.
Evaluating for Bias
Source: Google Machine Learning Crash Course
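Female patients: the model incorrectly predicts tumor in 9.1% of cases
Female patients: the model misses a tumor diagnosis in 9.1% of cases
Male patients: the model incorrectly predicts tumor in 33.3% of cases
Male patients: the model misses a tumor diagnosis in 45.5% of cases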
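The four percentages can be reproduced from per-group confusion matrices. The sketch below assumes that the first pair of percentages refers to the 500 female records and the second pair to the 500 male records, that "incorrectly predicts tumor" means the share of positive predictions that are wrong, and that "misses a tumor diagnosis" means the share of actual tumors that are not detected. The counts are assumptions chosen to be consistent with those percentages and group sizes; they do not appear on the slides.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # tumor present, predicted present
    fp: int  # tumor absent, predicted present
    fn: int  # tumor present, predicted absent
    tn: int  # tumor absent, predicted absent

    @property
    def false_discovery_rate(self) -> float:
        """Share of positive predictions that are wrong: FP / (TP + FP)."""
        return self.fp / (self.tp + self.fp)

    @property
    def false_negative_rate(self) -> float:
        """Share of actual positives the model misses: FN / (TP + FN)."""
        return self.fn / (self.tp + self.fn)


# Assumed counts: 500 records per group, rates matching the slide.
groups = {
    "female": Confusion(tp=10, fp=1, fn=1, tn=488),
    "male": Confusion(tp=6, fp=3, fn=5, tn=486),
}

for name, c in groups.items():
    print(f"{name}: wrong positive predictions {c.false_discovery_rate:.1%}, "
          f"missed tumors {c.false_negative_rate:.1%}")
```

Aggregated over all 1,000 patients the model can look acceptable; splitting the same evaluation by subgroup is what reveals that it performs much worse for one group.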
“What-if” tool
Analyze ML model
without writing code.
Given pointers to a
TF model and a
dataset, the What-If
Tool offers an
interactive visual
interface for
exploring model
results.
Counterfactuals
It is possible to
compare a datapoint
to the most similar
point where your
model predicts a
different result.
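The comparison described above amounts to a nearest-neighbour search restricted to points with a different prediction. The sketch below is one possible reading of that idea, not the What-If Tool's implementation: the distance function (unnormalized L1 over numeric features plus 1 per differing categorical feature), the stand-in model `toy_predict` and the data are all assumptions made for illustration.

```python
def l1_distance(a: dict, b: dict, numeric: list, categorical: list) -> float:
    """Absolute numeric differences plus 1 per differing categorical value."""
    d = sum(abs(a[f] - b[f]) for f in numeric)
    d += sum(a[f] != b[f] for f in categorical)
    return d


def nearest_counterfactual(point, dataset, predict, numeric, categorical):
    """Closest datapoint for which `predict` gives a different label."""
    label = predict(point)
    candidates = [p for p in dataset if predict(p) != label]
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: l1_distance(point, p, numeric, categorical))


# Hypothetical stand-in model and data, for illustration only.
def toy_predict(p):
    is_high = p["age"] >= 40 and p["occupation"] == "manager"
    return ">50K" if is_high else "<=50K"


data = [
    {"age": 25, "occupation": "clerk"},
    {"age": 41, "occupation": "manager"},
    {"age": 60, "occupation": "manager"},
]
query = {"age": 39, "occupation": "clerk"}

cf = nearest_counterfactual(query, data, toy_predict, ["age"], ["occupation"])
print(cf)  # {'age': 41, 'occupation': 'manager'}
```

In practice numeric features would usually be normalized before computing distances, so that a feature with a large range does not dominate the comparison.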
Counterfactuals
a minor difference in
age and an
occupation change
flipped the model’s
prediction (earning
>50K)
Visualize inference results
Compare the
performance of two
models, or inspect a
single model's
performance by
organizing inference
results into confusion
matrices, scatterplots or
histograms.
Edit a datapoint
Edit a datapoint and see
how your model performs.
Edit, add or remove
features or feature values
for any selected datapoint
and then run inference to
test model performance.
Test algorithmic fairness
Slice your dataset into
subgroups and explore the
effect of different
algorithmic fairness
constraints
See: “Playing with fairness”
by David Weinberger.
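As a rough illustration of what a fairness constraint does to subgroups, the sketch below compares one shared decision threshold with per-group thresholds chosen so that both groups receive positive predictions at the same rate (demographic parity, which is only one of several possible constraints). The scores, group names and helper functions are hypothetical, and this is not a description of how the What-If Tool computes its thresholds.

```python
def positive_rate(scores, threshold):
    """Fraction of scores classified positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate):
    """Highest observed score whose positive rate reaches target_rate."""
    for t in sorted(set(scores), reverse=True):
        if positive_rate(scores, t) >= target_rate:
            return t
    return min(scores)


# Hypothetical model scores for two subgroups.
scores = {
    "group_a": [0.9, 0.8, 0.7, 0.4, 0.2],
    "group_b": [0.6, 0.5, 0.3, 0.2, 0.1],
}

# One shared threshold: the groups end up with different positive rates.
shared = {g: positive_rate(s, 0.5) for g, s in scores.items()}

# Demographic parity: per-group thresholds that yield the same positive rate.
target = 0.4
per_group = {g: threshold_for_rate(s, target) for g, s in scores.items()}
parity = {g: positive_rate(s, per_group[g]) for g, s in scores.items()}

print(shared)     # {'group_a': 0.6, 'group_b': 0.4}
print(per_group)  # {'group_a': 0.8, 'group_b': 0.5}
print(parity)     # {'group_a': 0.4, 'group_b': 0.4}
```

Equalizing positive rates changes who is selected in each group, so whether this constraint is the right one depends on the application; other constraints, such as equalizing error rates, lead to different thresholds.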
★ Measurement is crucial
★ Know your data (and how data were collected and annotated)
★ Try to discover hidden biases (missing values, data skew, subgroups, etc.)
★ Ask questions. Don’t train the model and then walk away
★ Avoid feedback loops
★ Use tools that allow you to do such investigation
Key Takeaways
AI is a cultural shift as much as a technical one.
Autonomous systems are changing workplaces, streets
and schools.
We need to ensure that those changes are beneficial,
before they are built further into the infrastructure of everyday
life.
There is a blind spot in AI research
Kate Crawford & Ryan Calo
Nature 538, 311–313 (20 October 2016)
Thanks!
@azzurraragone
❏ AI can be sexist and racist — it’s time to make it fair James Zou &
Londa Schiebinger - Nature 559, 324-326 (2018)
❏ The Master Algorithm Pedro Domingos, 2015
❏ Fairness and Machine Learning S. Barocas, M. Hardt, A. Narayanan
❏ No Classification without Representation: Assessing Geodiversity
Issues in Open Data Sets for the Developing World Shreya Shankar,
Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley
❏ Man is to computer programmer as woman is to homemaker?
Debiasing word embeddings T. Bolukbasi, K.-W. Chang, J. Y. Zou, V.
Saligrama, A. T. Kalai. Adv. Neural Inf. Process. Syst. 2016,
4349–4357 (2016)
References
❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo,
Nature 538, 311–313 (20 October 2016)
❏ Semantics Derived Automatically from Language Corpora Contain
Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan, Science 356, no. 6334 (2017): 183–86
❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of
Chicago's Predictive Policing Pilot Saunders, J., Hunt, P. & Hollywood,
J. S. J. Exp. Criminol. 12, 347–371 (2016).
❏ Runaway Feedback Loops in Predictive Policing Danielle Ensign et al.
arXiv:1706.09847
❏ Object Recognition by Scene Alignment. B. C. Russell, A. Torralba, C.
Liu, R. Fergus, W. T. Freeman. Advances in Neural Information
Processing Systems, 2007.
❏ Fair Is Not the Default (https://design.google/library/fair-not-default/)
❏ “Playing with fairness” - David Weinberger.
❏ Google Machine Learning Crash Course
❏ What-if tool: https://pair-code.github.io/what-if-tool/
❏ Facets tool: https://pair-code.github.io/facets/

 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 

Fairness in Machine Learning @Codemotion

  • 1. Fairness in Machine Learning: are you sure there is no bias in your predictions? Azzurra Ragone - Innovation Manager @azzurraragone
  • 2. Me… Innovation Manager Previous @Google DevRel team Before Research fellow: ➢ Univ. Milano Bicocca, ➢ University of Michigan ➢ Politecnico of Bari ➢ University of Trento
  • 3. People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world The Master Algorithm Pedro Domingos, 2015
  • 4. How to make my ML system fair? ...and why care?
  • 5. Our success, happiness and wellbeing can be affected by other decisions
  • 6. Life-changing decisions: ➔ Admission to schools ➔ Job offers ➔ Patient screenings ➔ Mortgage grants ➔ ...
  • 7. Arbitrary, inconsistent, or faulty decision-making thus raises serious concerns because it risks limiting our ability to achieve the goals that we have set for ourselves and access the opportunities for which we are qualified. Fairness and Machine Learning S. Barocas, M. Hardt, A. Narayanan
  • 8. How do we ensure that these decisions are made the right way and for the right reasons? Fairness and Machine Learning S. Barocas, M. Hardt, A. Narayanan
  • 9. The ML promise: make decisions more consistent, accurate and rigorous.
  • 10. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman. Object Recognition by Scene Alignment. Advances in Neural Information Processing Systems, 2007.
  • 11. ...but there are serious risks in learning from examples.
  • 12. Generalizing from examples Source: https://design.google/library/fair-not-default/ Quick, Draw!
  • 13. Generalizing from examples Provide good examples: - a sufficiently large and diverse set - well annotated Quick, Draw! Source: https://design.google/library/fair-not-default/
  • 14. Historical examples may reflect: - Prejudices against a social group - Cultural stereotypes - Demographic inequalities and finding patterns in these data means replicating these same dynamics
  • 16. Geo bias: 45% of ImageNet data comes from the USA (4% of the world population); 3% of ImageNet data comes from China and India (36% of the world population). Ref: Nature 559 and Shankar, S. et al. (2017)
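The imbalance on slide 16 can be expressed as a representation ratio (share of the data divided by share of the population). The sketch below uses only the four percentages quoted on the slide; the function name and the dictionary layout are illustrative assumptions.

```python
# Percentages quoted on slide 16: share of ImageNet images vs. share of world population.
shares = {
    "USA": {"data": 45.0, "population": 4.0},
    "China + India": {"data": 3.0, "population": 36.0},
}


def representation_ratio(data_share, population_share):
    """Ratio above 1 means over-represented, below 1 means under-represented."""
    return data_share / population_share


ratios = {
    region: representation_ratio(s["data"], s["population"])
    for region, s in shares.items()
}

for region, ratio in ratios.items():
    print(f"{region}: {ratio:.2f}x its population share")
```

A ratio far from 1 in either direction is a prompt to ask where the data came from, not a fairness verdict on its own.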
  • 17. Photo Credit: Left: iStock/Getty; Right: Prakash Singh/AFP/Getty (from Nature 559, 324-326 (2018)) Bride Dress Woman Wedding Performance art Costume
  • 19. Debiasing Word Embeddings Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016). Credit: Pictures by Pixabay
  • 20. The Machine Learning Loop: State of the world → (Measurement) → Data → (Learning) → Model → (Action) → Individuals → (Feedback) → State of the world. Source: Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan
  • 21. State of the world Data Measurement The Machine Learning Loop
  • 22. Provenance of data is crucial. Data cleaning is mandatory. The world is “messy” Photo by pasja1000 on Pixabay
  • 23. Measurement defines: - your variables of interest, - the process for turning your observations into numbers, - how you actually collect the data [Fairness and Machine Learning, 2018] Photo by Iker Urteaga on Unsplash
  • 24. The target variable is the hardest to measure. It is made up for the purpose of the problem. It is not a property that people possess or lack Ex. “creditworthiness”, “good employee”, “attractiveness” [Fairness and Machine Learning, 2018] Photo by David Paschke on Unsplash
  • 25. State of the world Data Individuals Model Measurement Learning Action Feedback The Machine Learning Loop
  • 26. ML will extract stereotypes the same way that it extracts knowledge
  • 27. labor statistics and the male-as-norm bias almost perfectly predict which pronoun will be returned [Caliskan et al., 2017]
  • 28. ML works better with more data, so it will work less well for members of minority groups Sample size disparity Training set Training data
  • 29. State of the world Data Individuals Model Measurement Learning Action Feedback The Machine Learning Loop
  • 30. It’s not always about “Prediction” (“is this patient at high risk for cancer?”). It can be classification (determine whether a piece of email is spam), regression (assigning risk scores to defendants), or information retrieval (finding documents that best match a search query). Photo by Tobias Zils on Unsplash
  • 31. Predictions - actions - outcome Photo by Pixabay
  • 32. State of the world Data Individuals Model Measurement Learning Action Feedback The Machine Learning Loop
  • 33. House price prediction: if you predict future prices (and publicize them) you create a self-fulfilling feedback loop: houses with lower predicted sale prices deter buyers, demand goes down, and the final price is even lower. Photo by Deva Darshan on Unsplash
  • 34. Self-fulfilling predictions: some communities may be disproportionately targeted, with people being arrested for crimes that might be ignored in other communities. Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016). Photo by Jacques Tiberi on Pixabay
  • 35. “Feedback loops occur when data discovered on the basis of predictions are used to update the model.” Danielle Ensign et al., “Runaway Feedback Loops in Predictive Policing,” 2017
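To make the feedback-loop idea on slides 33 to 35 concrete, here is a deliberately simplified, deterministic toy simulation. It is not the model from the cited paper: the allocation rule, the district counts, and all parameter names are assumptions chosen for illustration. Two districts have the same underlying incident rate, but incidents are only recorded where patrols are sent, and patrols are sent where most incidents were recorded.

```python
def simulate_patrols(recorded, found_per_patrol, patrols_per_day, days):
    """Toy model: each day all patrols go to the district with the most recorded
    incidents, and new incidents are only recorded in the patrolled district."""
    recorded = list(recorded)
    history = [tuple(recorded)]
    for _ in range(days):
        target = recorded.index(max(recorded))  # ties go to the first district
        recorded[target] += found_per_patrol[target] * patrols_per_day
        history.append(tuple(recorded))
    return history


# Both districts have the same underlying rate; district 0 starts one record ahead.
history = simulate_patrols(
    recorded=[11, 10], found_per_patrol=[1, 1], patrols_per_day=5, days=4
)

for day, counts in enumerate(history):
    print(day, counts)
```

In this sketch the one-record head start decides every later allocation, so the recorded gap keeps growing even though the two districts are identical by construction.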
  • 36. State of the world Data Individuals Model Measurement Learning Action Feedback The Machine Learning Loop
  • 37. The state of society: training data encode the demographic disparities in our society, and some stereotypes can be reinforced by ML (due to feedback loops). Photo by Cory Schadt on Unsplash
  • 39. Bias may lurk in your data...
  • 40. Analyze your data Source: Google Machine Learning Crash Course ★ Are there missing feature values for a large number of observations? ★ Are there features that are missing that might affect other features? ★ Are there any unexpected feature values? ★ What signs of data skew do you see?
  • 41. Missing feature values Source: California Housing dataset, Google Machine Learning Crash Course
  • 42. Skewed data (geographical bias) Source: California Housing dataset, Google Machine Learning Crash Course
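The checks on slides 40 to 42 (missing feature values, data skew) can be sketched with the standard library alone. The records below are hypothetical and are not taken from the California Housing dataset; the feature names and helper functions are assumptions for illustration.

```python
from collections import Counter

# Hypothetical records; None marks a missing feature value.
rows = [
    {"median_income": 3.2, "rooms": 5, "region": "north"},
    {"median_income": None, "rooms": 4, "region": "north"},
    {"median_income": 2.1, "rooms": None, "region": "north"},
    {"median_income": 4.8, "rooms": 6, "region": "south"},
    {"median_income": None, "rooms": 3, "region": "north"},
]


def missing_fraction(rows, feature):
    """Fraction of rows in which the feature is missing."""
    return sum(1 for r in rows if r.get(feature) is None) / len(rows)


def category_shares(rows, feature):
    """Share of each observed value of a categorical feature."""
    counts = Counter(r[feature] for r in rows if r.get(feature) is not None)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


missing = {feature: missing_fraction(rows, feature) for feature in rows[0]}
region_shares = category_shares(rows, "region")

print(missing)
print(region_shares)
```

Here 40% of the income values are missing and 80% of the rows come from one region, which are the kinds of signals the slide's questions are meant to surface.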
  • 43. Facets Overview Source: Facets tool (https://pair-code.github.io/facets/) Facets Overview is an interactive visualization tool to explore datasets. Quickly analyze the distribution of values across the datasets.
  • 44. Facets Overview Source: Facets tool (https://pair-code.github.io/facets/) ⅔ of examples represent males, while we would expect the breakdown between genders to be closer to 50/50
  • 45. Facets Dive Source: Facets tool (https://pair-code.github.io/facets/) Data are faceted by the marital-status feature. Males outnumber females by more than 5:1. Married women are underrepresented in our data.
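The kind of breakdown that Facets shows visually can also be checked numerically. The sketch below does not use Facets; it counts hypothetical (marital status, gender) pairs whose numbers were invented to mirror the proportions on slides 44 and 45.

```python
from collections import Counter

# Hypothetical (marital_status, gender) pairs; counts are invented for illustration.
examples = (
    [("married", "male")] * 11
    + [("married", "female")] * 2
    + [("never-married", "male")] * 5
    + [("never-married", "female")] * 6
)

counts = Counter(examples)


def male_to_female_ratio(counts, status):
    """Ratio of male to female examples within one marital-status facet."""
    return counts[(status, "male")] / counts[(status, "female")]


overall_male_share = (
    sum(n for (_, gender), n in counts.items() if gender == "male") / len(examples)
)
married_ratio = male_to_female_ratio(counts, "married")

print(f"male share overall: {overall_male_share:.2f}")
print(f"male:female among married: {married_ratio:.1f}:1")
```

Faceting matters because the overall split (about two thirds male) hides a much stronger imbalance inside the married subgroup.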
  • 46. Evaluating for Bias Source: Google Machine Learning Crash Course A model to predict the presence of tumors is evaluated against a validation set of 1,000 patients: 500 records from female patients and 500 records from male patients.
  • 47. Evaluating for Bias Source: Google Machine Learning Crash Course Female patients: the model incorrectly predicts a tumor in 9.1% of cases and misses a tumor diagnosis in 9.1% of cases. Male patients: the model incorrectly predicts a tumor in 33.3% of cases and misses a tumor diagnosis in 45.5% of cases.
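Per-group error rates like those on slide 47 come from computing the confusion-matrix ratios separately for each subgroup. In the sketch below the counts are illustrative assumptions, chosen so that they reproduce the slide's percentages; the slide itself does not list raw counts, and the assignment of each pair of percentages to a group follows the order on the slide.

```python
def error_rates(tp, fp, fn):
    """Return (share of tumor predictions that are wrong,
    share of actual tumors that are missed)."""
    false_discovery_rate = fp / (tp + fp)
    false_negative_rate = fn / (tp + fn)
    return false_discovery_rate, false_negative_rate


# Illustrative counts (assumed), picked to match the percentages on the slide.
groups = {
    "female": {"tp": 10, "fp": 1, "fn": 1},
    "male": {"tp": 6, "fp": 3, "fn": 5},
}

rates = {name: error_rates(**c) for name, c in groups.items()}

for name, (fdr, fnr) in rates.items():
    print(f"{name}: wrong tumor predictions {fdr:.1%}, missed tumors {fnr:.1%}")
```

An aggregate metric over all 1,000 patients would average these two groups together and hide the gap, which is why the evaluation is sliced by subgroup.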
  • 48. “What-if” tool Analyze ML model without writing code. Given pointers to a TF model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
  • 49. Counterfactuals It is possible to compare a datapoint to the most similar point where your model predicts a different result.
  • 50. Counterfactuals a minor difference in age and an occupation change flipped the model’s prediction (earning >50K)
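The counterfactual idea on slides 49 and 50 can be sketched as a nearest-neighbour search restricted to points with a different predicted label. This is a minimal illustration and not the What-If Tool's implementation; the L1 distance, the feature tuples, and the labels are all assumptions.

```python
def l1_distance(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_counterfactual(query, query_label, points, labels):
    """Return (point, label) for the closest point whose predicted label differs
    from query_label, or None if no such point exists."""
    candidates = [
        (l1_distance(query, p), p, lab)
        for p, lab in zip(points, labels)
        if lab != query_label
    ]
    if not candidates:
        return None
    _, point, label = min(candidates, key=lambda c: c[0])
    return point, label


# Hypothetical numeric features: (age, hours_per_week), with model predictions.
points = [(38, 40), (39, 45), (25, 20), (52, 60)]
predicted = ["<=50K", ">50K", "<=50K", ">50K"]

result = nearest_counterfactual(points[0], predicted[0], points, predicted)
print(result)
```

With real data the features would need to be put on comparable scales first, and categorical features such as occupation need their own distance rule.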
  • 51. Visualize inference results Compare the performance of two models, or inspect a single model's performance by organizing inference results into confusion matrices, scatterplots or histograms.
  • 52. Edit a datapoint Edit a datapoint and see how your model performs. Edit, add or remove features or feature values for any selected datapoint and then run inference to test model performance.
  • 53. Test algorithmic fairness Slice your dataset into subgroups and explore the effect of different algorithmic fairness constraints See: “Playing with fairness” by David Weinberger.
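Slicing by subgroup and comparing decision rates, as slide 53 describes, can be sketched as follows. The example checks one possible constraint, demographic parity (equal positive-prediction rates across groups), which is only one of several fairness criteria. The scores, group names, and thresholds are hypothetical.

```python
def positive_rate(scores, threshold):
    """Fraction of examples scored at or above the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical model scores, already sliced by subgroup.
scores = {
    "group_a": [0.9, 0.7, 0.6, 0.3],
    "group_b": [0.8, 0.5, 0.4, 0.2],
}

# One shared threshold for everyone.
single = {g: positive_rate(s, 0.6) for g, s in scores.items()}
parity_gap = abs(single["group_a"] - single["group_b"])

# Per-group thresholds (assumed values) chosen to equalize positive rates.
thresholds = {"group_a": 0.7, "group_b": 0.5}
adjusted = {g: positive_rate(s, thresholds[g]) for g, s in scores.items()}

print(single, parity_gap)
print(adjusted)
```

Equalizing positive rates this way changes who is accepted in each group, so it trades off against other criteria such as equal error rates; which constraint is appropriate depends on the application.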
  • 54. Key Takeaways: ★ Measurement is crucial ★ Know your data (and how data were collected and annotated) ★ Try to discover hidden biases (missing values, data skew, subgroups, etc.) ★ Ask questions. Don’t train the model and then walk away ★ Avoid feedback loops ★ Use tools that allow you to do such investigation
  • 55. AI is a cultural shift as much as a technical one. Autonomous systems are changing workplaces, streets and schools. We need to ensure that those changes are beneficial, before they are built further into the infrastructure of everyday life. There is a blind spot in AI research, Kate Crawford & Ryan Calo, Nature 538, 311–313 (20 October 2016)
  • 57. References: ❏ AI can be sexist and racist — it’s time to make it fair, James Zou & Londa Schiebinger, Nature 559, 324–326 (2018) ❏ The Master Algorithm, Pedro Domingos, 2015 ❏ Fairness and Machine Learning, S. Barocas, M. Hardt, A. Narayanan ❏ No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World, Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley ❏ Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Adv. Neural Inf. Process. Syst. 2016, 4349–4357 (2016)
  • 58. References: ❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo, Nature 538, 311–313 (20 October 2016) ❏ Semantics Derived Automatically from Language Corpora Contain Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan, Science 356, no. 6334 (2017): 183–86 ❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of Chicago's Predictive Policing Pilot, Saunders, J., Hunt, P. & Hollywood, J. S., J. Exp. Criminol. 12, 347–371 (2016) ❏ Runaway Feedback Loops in Predictive Policing, Danielle Ensign et al., arXiv:1706.09847
  • 59. References: ❏ Object Recognition by Scene Alignment, B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman, Advances in Neural Information Processing Systems, 2007 ❏ Fair Is Not the Default (https://design.google/library/fair-not-default/) ❏ “Playing with fairness”, David Weinberger ❏ Google Machine Learning Crash Course ❏ What-If Tool: https://pair-code.github.io/what-if-tool/ ❏ Facets tool: https://pair-code.github.io/facets/