AI showing sexist or racist correlations makes headlines. Nobody sets out to build a bad system, so why does this keep happening? I take a look at the ways bias creeps into AI and where you should put your effort to avoid it.
Slides annotated from a talk given at the ImpactfulAI meetup, London, 19th June 2019.
AI Fails: Avoiding bias in your systems
Slide 1
Dr Janet Bastiman, StoryStream.ai (@yssybyl)
AI Fails:
How can you begin to overcome bias in design and test?
Slide 2
About StoryStream
The world’s leading automotive content platform.
StoryStream is a dedicated automotive content platform, trusted by some of the world’s leading car brands. It was created specifically to help automotive brands provide a more relevant, engaging customer experience, fuelled with authentic content and designed to scale content operations efficiently across global teams.
The Core StoryStream Benefits
● Grow customer engagement and conversions by up to 25%
● Reduce content creation and management costs by up to 60%
● Provide a more authentic customer experience
● Understand your customers more deeply
Slide 3
Tonight I’m going to look at why so many big companies have a problem with bias, and what checks and balances you can put in place to avoid falling victim to the same errors.
For argument’s sake, I’m using AI as a superset of machine
learning, deep learning and all other techniques that lead to a
system that appears to make intelligent decisions.
Also, since this is a short talk, bear in mind that each of these slides warrants a full presentation in itself; this will be a high-level introduction to get you thinking.
Slide 5
All those headlines make us feel uncomfortable, and rightly so.
They cover Amazon’s sexist recruitment AI, Google’s image tagging and the COMPAS system used to predict reoffending rates. I
think everyone in this room would nod sagely about how bad
these are and claim it would never happen on their watch.
So why do we keep seeing this happen?
What about the AI systems that don’t make the headlines, the ones quietly deciding whether you get sent a special offer, a credit card, a cancer diagnosis? The things most of us are working on. What if our work is flawed but never makes the headlines: would we know?
Nobody in this field sets out to make a bad AI, so why does
this happen?
It’d be easy to say “have a diverse team and diverse data”, but that’s not good enough.
Slide 6
What is bias?
An unwarranted correlation between an input variable and the output classification.
IMPACT is more important than ACCURACY
Slide 7
Let’s take a step back and look at “What is bias?”
If I gave you all a test, no doubt you’d write answers around underfitting and overfitting: mathematical answers.
Focus on the more descriptive definition: an unwarranted
correlation.
For many of us, from a position of privilege, it’s hard to really
understand the impact of being on the receiving end of these
correlations.
So how are these biases introduced? Let’s take a look at the
maths…
Slide 8
Maths
“Fairness” assumes:
A. Calibration within groups
B. Balance for the negative class
C. Balance for the positive class
All three can only be achieved if prediction is perfect or there are completely equal base rates.
Conclusions
● You cannot balance everything
● Either:
○ the prediction is unbiased
○ or the error is unbiased
● Fairness is personal
Chouldechova: https://arxiv.org/abs/1610.07524
Kleinberg et al.: https://arxiv.org/abs/1609.05807
Slide 9
Both of these papers were studies into whether the COMPAS system was deliberately biased, and they come to the same conclusions via different proofs. They are both well worth a read. As a side note, there’s a minor mathematical error in the Kleinberg paper (which does not affect the proof), but it’s worth noting that you shouldn’t just blindly implement what you read in papers.
Starting from a definition of fairness, both papers conclude that unless you have perfect prediction and a balanced population, you will have either bias in positive prediction or bias in error rates.
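To make conditions A, B and C concrete, here’s a minimal NumPy sketch (my own illustration, not code from either paper). It builds two groups with unequal base rates, gives everyone a perfectly calibrated score, and shows that balance for the negative and positive classes then necessarily breaks:

```python
import numpy as np

def balance_for_class(scores, labels, groups, cls):
    """Average score given to true members of class `cls`, per group.
    Conditions B (cls=0) and C (cls=1) require these to match across groups."""
    return {int(g): float(scores[(groups == g) & (labels == cls)].mean())
            for g in np.unique(groups)}

rng = np.random.default_rng(0)
n = 100_000
groups = rng.integers(0, 2, n)           # two groups, 0 and 1
base = np.where(groups == 0, 0.4, 0.2)   # unequal base rates
labels = (rng.random(n) < base).astype(int)

# Perfectly calibrated score: everyone gets their group's base rate, so
# among people scored 0.4, 40% really are positive (condition A holds).
scores = base.astype(float)

print(balance_for_class(scores, labels, groups, cls=0))  # {0: 0.4, 1: 0.2}
print(balance_for_class(scores, labels, groups, cls=1))  # {0: 0.4, 1: 0.2}
# B and C fail: negatives (and positives) in group 0 carry higher average
# scores than in group 1, purely because the base rates differ.
```

The construction is contrived, but the conclusion is the papers’: with unequal base rates and imperfect prediction, satisfying one fairness condition forces you to violate another.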
Slide 10
So to avoid bias, mathematically, we need to live in an
unbiased world. Sadly this is not the case.
You cannot have positive parity and error parity at the same time; you can only choose which is least unacceptable.
COMPAS chose to minimise false negatives and as a result
created something that was racially biased.
For all real problems you will violate one of the fairness
measures. Typically we focus on overall accuracy and most
practitioners don’t think further.
We are post-GDPR now, so if you are making inferences involving protected variables, make sure you are storing them correctly and have some explainability (a whole other talk).
Slide 11
Data Errors
● Selection bias
● Random sampling error
● Overcoverage
● Undercoverage
● Measurement (response) error
● Processing errors
● Participation bias
Slide 12
In addition to the mathematics of creating AI, bias creeps in earlier in the chain. Unless you are lucky enough to get a full view of your data pipeline, you may not have a good understanding of how you’ve ended up with the data in front of you.
If you’ve done statistical sampling theory then you’ll be aware of this, but here’s a taster. There are seven key data sampling errors that you should know and be able to ask about before building any model.
The data available to any company is by nature limited to a subset of all possible data. The graph shows the mathematical spread in the observed rate for a system predicting a 50% average score, as a function of sample size. Small data sets can cause large variations.
Extrapolate to your own data: where are the holes?
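If you don’t have the graph to hand, the effect is easy to reproduce. Here’s a quick simulation (illustrative numbers only): draw repeated samples of various sizes from a population whose true rate is 50% and watch how far the observed rate can stray:

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.5  # the real population average the system should report

for n in (10, 100, 1_000, 10_000):
    # 1,000 hypothetical datasets of size n, all from the same population
    observed = rng.binomial(n, true_rate, size=1_000) / n
    print(f"n={n:>6}: observed rate {observed.min():.3f} to {observed.max():.3f} "
          f"(std {observed.std():.3f})")

# At n=10 the measured rate can plausibly land anywhere from 0.1 to 0.9;
# the spread shrinks roughly as 1/sqrt(n), per the binomial standard error.
```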
Slide 13
Example: Is Oxford racially biased for admissions?
A couple of years ago, admissions data from Oxford University showed that a much lower proportion of BAME students were offered places than their share of the general population would suggest. While I’m not discounting that racial bias was occurring, let’s look at some of the data biases involved:
- Students at private schools are more likely to apply than state school
students with the same grades (selection)
- Students at private schools are mostly white (undercoverage)
- Students from state schools are more likely to apply to popular /
oversubscribed courses due to curriculum restrictions (participation)
All of these affect the perceived outcome and can exacerbate or mask a true result. Know the provenance of your data and where the sampling impacts your results.
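One practical check before modelling: compare your sample’s group mix against a reference population with a goodness-of-fit test. A minimal sketch, with invented counts and shares standing in for real admissions data:

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([8200, 950, 850])            # group counts in your sample (hypothetical)
population_share = np.array([0.80, 0.12, 0.08])  # reference population mix (hypothetical)
expected = population_share * observed.sum()

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p:.3g}")
# A tiny p-value flags that the sample's mix differs from the population:
# evidence of under- or over-coverage, though not of where it comes from.
```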
Slide 15
This is really important. Accept that everyone is biased in some way.
We are biased by our experiences (positive and negative) and we are
biased by the comments from the networks we trust. Every day our
biases are reinforced.
Our data sets are affected by our biases.
Our test sets are affected by our biases.
We need to get into a different mindset.
The image on the next slide is from:
https://www.designhacks.co/products/cognitive-bias-codex-poster
Buy a copy and put it somewhere you can see it every day. I have!
Slide 17
These are your biases and why you have them. Please give the people who created this codex the traffic, and buy the poster!
This is how you are manipulated. This is how you justify bad
behaviour. Apply this to your day to day life. Question yourself if you
find yourself agreeing or disagreeing on “gut instinct”. Stop yourself if
you make sweeping generalisations.
Challenge yourself. This is why we are bad at gathering data and why
we are bad at analysing it. We are primed to see patterns even when
they are not there. I’ve had blazing rows with more than one C-level
exec because they have seen something in the data that just isn’t
there. Saying a model is wrong because it doesn’t fit your expectations is just as bad as saying it is correct because it fits your own biases.
Slide 18
Without understanding your biases you will create data sets that fit
your own experience profile. You will discard data points that don’t fit
without being conscious of it. If you get the results you expect, you will not test them as thoroughly as you would if they disagreed with your expectations. You will twist your models towards an experience that makes you comfortable, at the expense of others.
Be cognisant of your own biases. Diverse teams help here, but even
with this, challenge yourselves.
Which brings me to testing… Most AI practitioners validate their
models but do not test them in the way that test engineers do…
Slide 19
AI testing is not TESTING
What happens if your model
gets bad data?
Humans just love to prove
superiority over tech.
Learn how to break everything you create
https://www.sempf.net/post/On-Testing1
Slide 20
Sure, you test your models against known data and you probably have
a golden test set and do final validation against that. You may even
have a pipeline for constant sampling and retest as live data goes
through your system. The problem is that failures are accepted as part of the overall statistics: “it’s only 1%”, “the system wasn’t designed for that”, “that failed because of [thing you’re not going to change]”.
The issue is that most people are reluctant to test their systems really thoroughly. If you’ve come from a software engineering background then you should be familiar with these concepts, but optimisation and testing are the two biggest omissions in every AI course I’ve seen.
Learn to break your models…
Slide 21
All models fail in some circumstances. Find those situations; go out of your way to understand your models so thoroughly that you are never surprised. Test them with the broadest range of data you can. Run your own adversarial attacks. I regularly test my team’s models with pictures of my cats and static…
Read Bill Sempf’s blog post for a great example of how to test a simple input box that expects a number, and extrapolate it to your systems (a whole other talk in itself!).
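In that spirit, here’s a sketch of the kind of garbage-input tests I mean. The `model.predict_proba` interface and the 70% threshold are placeholders of my own; the point is the shape of the tests, not the API:

```python
import numpy as np

def garbage_inputs(shape=(224, 224, 3), seed=0):
    """Inputs an image model was never designed for: static, blanks, extremes."""
    rng = np.random.default_rng(seed)
    yield rng.integers(0, 256, shape, dtype=np.uint8)  # pure static
    yield np.zeros(shape, dtype=np.uint8)              # all black
    yield np.full(shape, 255, dtype=np.uint8)          # all white

def test_not_confident_on_garbage(model, threshold=0.7):
    """A model that is 90% sure a block of static is a car has a problem."""
    for image in garbage_inputs():
        probs = model.predict_proba(image)  # hypothetical interface
        assert max(probs) < threshold, (
            f"{max(probs):.0%} confident on a garbage input")
```

Anything a test like this catches becomes a conversation about safeguards and limitations, which leads to the next point.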
This doesn’t mean that systems have to be perfect to be released.
We live in the real world, where you will be pushed (probably by people like me) to get solutions out. Push back and be clear on limitations, so that decisions can be made about the risks. Add safeguards for when your model is wrong.
Even with all this testing, the thing that should be at the front of your
mind is not accuracy but impact.
Slide 23
All businesses should care about the impact of the AI they create.
Rather than talking about accuracy, recall and precision, let’s shift to impact. What is the impact of mislabelling a car? Getting someone’s gender wrong? Refusing a loan? Incarcerating an innocent person? Missing a diagnosis of a terminal illness?
The answer may be different for different individuals: what might be a non-issue for one person could be life-changing for another. Stop thinking from your own position of privilege and your own biases, and take a broader view.
How is the information used? Will there be a human in the loop? Put yourself in the position of the most vulnerable and marginalised users of your system and ask what the impact of a false positive or false negative is on them. Don’t brush off those results if they don’t fit with your experience.
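As a sketch of what shifting from accuracy to impact can look like in code: weight each error type by a cost reflecting the harm to the person affected, and report it per group. The cost numbers below are invented for illustration; in practice they should come from the people affected, not from the data science team’s intuition:

```python
import numpy as np

COST_FP = 1    # e.g. an unnecessary follow-up test (illustrative)
COST_FN = 50   # e.g. a missed diagnosis (illustrative)

def impact_per_group(preds, labels, groups):
    """Expected cost per person in each group: compare this, not accuracy."""
    report = {}
    for g in np.unique(groups):
        m = groups == g
        fp = np.sum((preds == 1) & (labels == 0) & m)  # false positives
        fn = np.sum((preds == 0) & (labels == 1) & m)  # false negatives
        report[int(g)] = float(fp * COST_FP + fn * COST_FN) / int(m.sum())
    return report
```

Two models with identical overall accuracy can look very different through this lens, and the group carrying the highest cost is rarely the one the team resembles.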
Slide 24
Summary
You are biased
Challenge your biases
Understand data provenance
Break everything you create
Create AI mindful of impact on the individual