Who we meet, what we choose to do, and the costs of our choices are increasingly influenced, and even driven, by software systems, algorithms and infrastructure. The decisions underlying how those work are made by people just like you.
This talk challenges the audience to consider the ongoing implications of decisions made early in the design process, and provides practical examples of translating moral standpoints into real-world implementations. Presented by ThoughtWorks Principal Data Engineer Simon Aubury and Project Manager Dr. Maia Sauren.
Back to our risk algorithm
How’s this looking?
Premium ($) = Risk (%) × Value ($)
● Features
○ Gender if ! EU
○ Car Model
○ Colour (maybe)
○ Postcode
○ Customer Name
○ Email
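A minimal sketch of the formula above, with an invented risk function over the listed features. The base rate and every weight below are assumptions for illustration, not real actuarial numbers:

```python
def risk_percent(car_model: str, postcode: str, colour: str = "") -> float:
    """Toy risk model: start from a base rate and adjust per feature.
    Every weight here is invented. Gender, customer name and email are
    deliberately left out of the calculation."""
    risk = 5.0                          # assumed base risk, in percent
    if car_model == "sports":           # hypothetical high-risk category
        risk += 3.0
    if postcode.startswith("2"):        # hypothetical high-claim region
        risk += 1.5
    if colour in ("black", "grey"):     # the "maybe" feature from the slide
        risk += 0.5
    return risk

def premium(risk_pct: float, value: float) -> float:
    """Premium ($) = Risk (%) x Value ($)."""
    return risk_pct / 100 * value

print(round(premium(risk_percent("sports", "2000"), 30_000), 2))  # 2850.0
```

Note that the questionable features (customer name, email) contribute nothing here; whether gender, colour or postcode belong in the function at all is exactly the design decision the talk is about.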
What data to calculate risk?
What are our assumptions?
MC SPEAKING
Dr. Maia Sauren is a program and project manager at ThoughtWorks, with backgrounds in large-scale organisational transformation and healthcare applications. Maia’s previous lives were in biomedical engineering, science communication, and education.
Simon is a Principal Data Engineer with expertise in enterprise system design for large data systems.
MAIA SPEAKING
Who is Alice?
What does Alice need, what is relevant, and what are the consequences of our decisions?
Tonight we want to engage you in a conversation: what guidelines should you consider, for your organisation and yourself, when building solutions?
What are the implications of decisions made early in the design process, and how do they translate to real-world outcomes?
MAIA SPEAKING
Why do we care about Alice, or her data? Why would anyone want that?
MAIA SPEAKING
Gender neutral ads for STEM jobs
Women click on the ads more
Ads become more expensive to show to women
Women see fewer ads for jobs in STEM
Algorithm aim: minimise spend
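The feedback loop above can be sketched as a toy simulation. The per-impression costs and the budget below are invented numbers, not figures from the case; the point is only that pure spend minimisation skews delivery:

```python
# Toy model of a spend-minimising ad campaign with a fixed budget.
# Showing the ad to women costs more per impression (other advertisers
# bid more for their attention), so a cost optimiser quietly shifts a
# gender-neutral STEM job ad toward men. All numbers are invented.
budget = 1000.0
cost_per_impression = {"women": 0.05, "men": 0.03}

impressions = {"women": 0, "men": 0}
spend = 0.0
while spend + min(cost_per_impression.values()) <= budget:
    # pick the cheaper audience every time: pure spend minimisation
    audience = min(cost_per_impression, key=cost_per_impression.get)
    impressions[audience] += 1
    spend += cost_per_impression[audience]

print(impressions)  # every single impression goes to the cheaper audience
```

No one wrote "don't show this ad to women"; the outcome falls out of an objective function that never mentions gender.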
MAIA SPEAKING
Data needs to be
Accurate
Clean
Diverse
MAIA SPEAKING
What is a signal?
What is an error?
MAIA SPEAKING
Noise as it applies within the greater context
Contextualised as relative harm, relative good, and relative harm minimisation
MAIA SPEAKING
How do you teach that
To a child
To a machine
MAIA SPEAKING
Bias laundering through technology - Gretchen McCulloch, Because Internet
MAIA SPEAKING
So you have to give it blunt flexibility in the design stage
MAIA SPEAKING
Alice Saunders
Refused an airline ticket by British Airways
MAIA SPEAKING
Frank Starmer
A large red stone (Uluru, south of Alice Springs)
An adventure south of Alice Springs to visit Uluru. Here is a sunset photo.
-- CC / https://www.flickr.com/photos/spiderman/2061070613
SIMON SPEAKING
Let’s illustrate data ethics with a scenario.
Can we describe the value judgements we make about Alice,
and the approaches we take when generating, analysing and disseminating data about her?
Do we understand good practice in computing techniques, ethics and information assurance, appreciative of relevant legislation?
Guidelines - sensible defaults for individual teams
Framework - translating values to behaviours
Ethical position - organisational value statement
SIMON SPEAKING
We’re going to apply this lens by imagining the product decisions in a hypothetical insurance co
Congrats on the new job
Think about the
SIMON SPEAKING
Alice has a need - an appropriately priced insurance product
What is the best thing for Alice?
Let’s design a product
SIMON SPEAKING
Could have actuaries and risk tables
People are dumb and lazy – we need robots to do the maths for them.
SIMON SPEAKING
Firstly - we need to engage with Alice. This is typically a sales funnel
our business is based on data
TOS - plain language; (LinkedIn has a sentence that’s 91 words long)
no dark patterns;
accessible and meaningful (eg. language)
organisational value statement
SIMON SPEAKING
Option 1 - poor
Option 2 - better:
has choice;
an explanation of WHY;
avoids the word “Other”, which may make people feel like an afterthought;
and an explicit call-out that it’s kept private
Option 3 - best; don’t ask
SIMON SPEAKING
Credit decision based on non-compliance with instructions - blue-pen users get a different weighting.
After much investigation, this was discovered to correlate most strongly with the pen left in the office.
Data is not truth
Humans create, generate, collect, capture, and extend data. The results are often incomplete, and the process of analyzing them can be messy. Data can be biased through what is included or excluded, how it is interpreted, and how it is presented.
If the data is crappy, even the best algorithm won't help. Sometimes this is referred to as "garbage in, garbage out".
SIMON SPEAKING
Respect privacy and the collective good
Now: let’s discuss an approach to data capture for a sensitive topic.
“Did you cheat on your partner in the last 12 months?”
“Dispose of waste illegally” or “Lie on an insurance application”
The randomized response technique (RRT), now nearly 50 years old, can be used to minimise the response bias from questions about sensitive issues.
Protect the individual, but collect accurate data in aggregate.
Respondents are randomly assigned to answer either the sensitive question or the inverse question with a “yes” or “no” [8]. The assignment undergoes a randomization procedure, such as rolling a die.
SIMON SPEAKING
Back to our example: how can we capture a meaningful response to “do you drive drunk?”
We want a representative answer,
but if we ask Alice directly we’re unlikely to get an accurate one.
How do you respect privacy and the collective good ONLINE … without a die?
The randomized response technique can be applied online with a bit of A/B testing
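The randomized-response idea fits in a few lines of code. The question probability `p`, the population size, and the 20% "true" rate below are invented for illustration:

```python
import random

def randomized_response(truth: bool, p: float, rng: random.Random) -> bool:
    """With probability p the respondent gets the sensitive question and
    answers truthfully; otherwise they answer the inverse question."""
    if rng.random() < p:
        return truth
    return not truth

def estimate_prevalence(answers, p: float) -> float:
    """Recover the aggregate rate from the noisy answers.
    P(yes) = p*pi + (1-p)*(1-pi), so pi = (P(yes) - (1-p)) / (2p - 1)."""
    lam = sum(answers) / len(answers)
    return (lam - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
true_rate = 0.20        # assumed: 20% of the population would truthfully say "yes"
p = 0.75                # chance of being assigned the real question
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [randomized_response(t, p, rng) for t in population]

# No individual answer is incriminating, yet the aggregate comes out close to 20%:
print(round(estimate_prevalence(answers, p), 2))
```

Online, the random assignment is just a server-side coin flip per respondent, which is why a standard A/B-testing setup can host it.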
SIMON SPEAKING
Imagine we’ve now got data that our users are happy to share; and we feel that it’s important to our business
Or we have access to a partner data provider
What can we derive from these insights?
Basic example
Basics of
Features (also known as parameters or variables) are the factors for a machine to look at. These could be car mileage, or a user's gender, age, or postcode; features are column names.
Algorithms - we might pick an ML algorithm (maybe a decision tree for classification, or a clustering model) that classifies users based on the data and determines a likely risk.
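As a stand-in for a real decision-tree library, here is a minimal one-split "stump" learner over invented records. It scans each observed value of a feature and keeps the threshold that best separates past claimants:

```python
# Invented claims history: (car_mileage_km, driver_age, made_claim)
records = [
    (45_000, 22, True), (60_000, 24, True), (30_000, 45, False),
    (20_000, 52, False), (55_000, 23, True), (25_000, 40, False),
]

def best_stump(data, feature_idx):
    """Return (threshold, accuracy) for the rule 'claim if feature < threshold',
    trying every observed value of the chosen feature as the threshold."""
    best = (None, 0.0)
    for threshold in sorted({r[feature_idx] for r in data}):
        correct = sum((r[feature_idx] < threshold) == r[2] for r in data)
        accuracy = correct / len(data)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

threshold, accuracy = best_stump(records, 1)   # split on driver age
print(threshold, accuracy)  # 40 1.0 -> "age < 40" perfectly fits this toy data
```

A real decision tree just recurses this split selection; the ethical questions start with which columns we hand it in the first place.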
SIMON SPEAKING
Let’s discuss algorithmic approaches to understanding Alice.
There’s nothing new here; these techniques have been used for > 40 years.
To illustrate, let’s think of the simplified case where we have a corpus of relationships.
We can use algorithmic classifiers to associate frequent clusters of info with historic outcomes.
If historically we’ve seen high correlations between April and Aries, perhaps we can automate this assumption as part of our business.
We could gloss this up as “AI”.
Just because AI can do something doesn’t mean that it should.
What’s the effect on Alice if we apply this approach blindly?
SIMON SPEAKING
With only gender-neutral pronouns in languages like Malay, Google Translate says men are leaders, doctors, soldiers, and professors, while women are assistants, prostitutes, nurses, and maids.
The results are similar in Hungarian. According to findings by Strawberry Bomb, Google Translate interprets men as intelligent, and women as stupid. It also suggests that women are unemployed, cook, and clean.
Don’t presume the desirability of AI
Just because AI can do something doesn’t mean that it should. When AI is incorporated into a design, designers should continually pay attention to whether people’s needs are changing, or an AI’s behavior is changing.
SIMON SPEAKING
Dev responsibilities
Information handling
Responsibilities for all parties; partners; 3rd parties providers; stakeholders
PCI, PII, HIPAA
HIPAA alone - 51 pages in the overview document for one tech vendor
Payments PCI - 71 pages for the same vendor
SIMON SPEAKING
notify the Australian Information Commissioner and individuals whose personal information has been subject to a data breach likely to result in serious harm
A notifiable data breach arises when the following three criteria are satisfied:
There is unauthorised access to, or unauthorised disclosure of, personal information;
this is likely to result in serious harm to one or more individuals; and
the entity has not been able to prevent the likely risk of serious harm with remedial action.
assessment within 30 days
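The three criteria can be encoded as a simple check. This is a sketch of the decision logic only; the field names are mine, and it is not legal guidance:

```python
from dataclasses import dataclass

@dataclass
class BreachAssessment:
    """The three notifiable-data-breach criteria listed above.
    Field names are illustrative, not from the legislation."""
    unauthorised_access_or_disclosure: bool
    serious_harm_likely: bool
    remediated: bool   # remedial action removed the likely risk of serious harm

def is_notifiable(b: BreachAssessment) -> bool:
    """All three criteria must hold for the breach to be notifiable."""
    return (b.unauthorised_access_or_disclosure
            and b.serious_harm_likely
            and not b.remediated)

print(is_notifiable(BreachAssessment(True, True, False)))  # True -> notify
print(is_notifiable(BreachAssessment(True, True, True)))   # False -> remediation prevented the harm
```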
Respect privacy and the collective good
While there are policies and laws that shape the governance, collection, and use of data, we must hold ourselves to a higher standard than “will we get sued?” Consider design, governance of data use for new purposes, and communication of how people’s data will be used.
Call out downsides of holding data
It’s a liability;
Enduring
SIMON SPEAKING
Last line highlight
SIMON SPEAKING
Admiral admitted to charging drivers higher premiums for using a Hotmail address.
It found some enquiries using a Hotmail address up to £31.36 more expensive than those using a Gmail account.
SIMON SPEAKING
Car colour does in fact make a big difference to road safety -
Silver, grey, red and black cars are most likely to be involved in accidents whereas white, orange and yellow are the safest colour choices
SIMON SPEAKING
So it’s getting a little harder to work out what features to use
SIMON SPEAKING
What do we now know about Alice?
SIMON SPEAKING
Back to our example: did we pick good features?
Features (also known as parameters or variables) are the factors for a machine to look at. These could be car mileage, or a user's gender, age, or postcode; features are column names.
Algorithms - we might pick an ML algorithm (maybe a decision tree for classification, or a clustering model) that classifies users based on the data and determines a likely risk.
Did we find or influence our algorithm in any other way?
SIMON SPEAKING
Even if we don’t explicitly set out to capture the data; there’s implicit relationships in our feature set
Models can discover remarkable correlations
Can we describe the value judgements we make about Alice,
and the approaches we take when generating, analysing and disseminating data about her?
Do we understand good practice in computing techniques, ethics and information assurance, appreciative of relevant legislation?
Guidelines - sensible defaults for individual teams
Framework - translating values to behaviours
Ethical position - organisational value statement
MAIA SPEAKING
MAIA
The What-If Tool from Google inspects a machine learning model.
Test algorithmic fairness constraints:
examine the effect of preset algorithmic fairness constraints, such as equality of opportunity in supervised learning.
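Equality of opportunity asks that a model's true-positive rate be equal across groups. A hand-rolled version of the check the What-If Tool visualises, with invented labels and predictions:

```python
def true_positive_rate(labels, preds):
    """Of the actually-positive cases, what share did the model approve?"""
    positive_preds = [p for l, p in zip(labels, preds) if l == 1]
    return sum(positive_preds) / len(positive_preds)

# Invented data: 1 = "low-risk driver" in truth (labels) and per the model (preds)
group_a = {"labels": [1, 1, 1, 0, 0], "preds": [1, 1, 1, 0, 1]}
group_b = {"labels": [1, 1, 1, 0, 0], "preds": [1, 0, 0, 0, 0]}

tpr_a = true_positive_rate(group_a["labels"], group_a["preds"])
tpr_b = true_positive_rate(group_b["labels"], group_b["preds"])
print(tpr_a, tpr_b)  # 1.0 vs one-third: low-risk drivers in group B are missed
```

Both groups have identical ground truth here, yet the model approves every deserving member of group A and only a third of group B; that gap is what an equality-of-opportunity constraint drives toward zero.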
MAIA SPEAKING
This is how we’re thinking internally: a canvas, Risk-Based Ethical Use of Data
MAIA SPEAKING
Moriel Schottlender
MAIA SPEAKING
This is how we’re thinking internally: a canvas, Risk-Based Ethical Use of Data