Testing for Cognitive Bias in
AI Systems
Peter Varhol, Technology Strategy Research
About Me
• International speaker and writer
• Degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
What You Will Learn
• What kind of systems produce nondeterministic results based on
data
• How machine learning results can be biased
• How to recognize and understand bias in machine learning
systems
Agenda
• What are machine learning and adaptive systems?
• How can software possibly be biased?
• How can we tell if they are biased?
• Can we do anything?
• Summary and conclusions
Machine Learning and Adaptive Systems
• We are building a different kind of software
• It never returns the same result
• That doesn’t make it wrong
• How can we assess if it’s correct?
• How do we know if there is a bug?
• How do we know if it’s biased?
Bug vs. Bias
• A bug is an identifiable and measurable error in process or result
• Usually fixed with a code change
• A bias is a systematic skew in decisions that produces results
inconsistent with reality
• Bias can’t be fixed with a code change
How Does This Happen?
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” can usually work
• As long as we can quantify “close enough”
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
• We choose the data
• The software “learns” from past actions
How Can We Tell If It’s Biased?
• We look very carefully at the training data
• We set strict success criteria based on the system requirements
• We run many tests
• Most change parameters only slightly
• Some use radical inputs
• Compare results to success criteria
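The testing loop above can be sketched in a few lines of Python. The model and the tolerance here are illustrative stand-ins, not from the talk: the point is that "close enough" is quantified up front, most test inputs perturb a baseline only slightly, a few are radical, and the aggregate is compared against a strict success criterion.

```python
import random

# Hypothetical system under test: an imperfect model of y = 2x
# (the 1.98 coefficient stands in for a trained, slightly-off model).
def model(x):
    return 1.98 * x

def ground_truth(x):
    return 2.0 * x

# Quantify "close enough": within 5% relative error counts as a pass.
def close_enough(pred, actual, tol=0.05):
    return abs(pred - actual) <= tol * abs(actual)

random.seed(42)
slight = [10.0 + random.uniform(-0.5, 0.5) for _ in range(500)]  # slight changes
radical = [-1e6, 1e-9, 1e6]                                      # radical inputs

results = [close_enough(model(x), ground_truth(x)) for x in slight + radical]
pass_rate = sum(results) / len(results)

SUCCESS_CRITERION = 0.95  # strict, set from the system requirements beforehand
biased_or_buggy = pass_rate < SUCCESS_CRITERION
```

A trend of failures concentrated in one region of the input space, rather than a low overall pass rate, is what would point at bias rather than a bug.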
We Train Systems With Data
• We know the input data and expected results
• Input data is run through the algorithms and the results compared
to actual results
• The algorithms adjust parameters based on a comparison between
the algorithm’s output and the actual results
• What happens if the data systematically
misrepresents the domain?
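A minimal sketch of that training loop, using a toy one-parameter model (the model and data are illustrative, not from the talk): the parameter is nudged after every comparison between the algorithm's output and the actual result, so whatever the data systematically says, the parameter learns.

```python
# Toy "learning" loop: a single parameter w is adjusted to shrink
# the gap between predicted and actual results.
def train(data, epochs=200, lr=0.01):
    w = 0.0  # the learned parameter
    for _ in range(epochs):
        for x, actual in data:
            predicted = w * x
            error = predicted - actual  # compare algorithm vs. actual
            w -= lr * error * x         # adjust parameter toward actual
    return w

# True relationship in this toy domain: y = 2x.
# If these samples systematically misrepresented the domain,
# w would faithfully inherit that distortion.
data = [(x, 2.0 * x) for x in range(1, 6)]
w = train(data)
```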
Amazon Can’t Rid Its AI of Bias
• Amazon created an AI to crawl the web to find job candidates
• Training data was all resumes submitted for the last ten years
• In IT, the overwhelming majority were male
• The AI “learned” that males were superior for IT jobs
• Amazon couldn’t fix that training bias
Amazon Can’t Rid Its AI of Bias
• So just train the AI with the candidates hired
• Right?
• No, because males were also hired in overwhelming numbers
• Amazon hasn’t found a way to overcome this bias
• So they stopped using the tool
• Does all data have bias?
Many Systems Use Objective Data
• Electric wind sensor
• Determines wind speed and direction
• Based on the cooling of filaments
• Designed a three-layer neural network
• Then used the known data to train it
• Cooling in degrees of all four filaments
• Wind speed, direction
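A hypothetical sketch of that architecture in Python: four filament cooling values in, wind speed and direction out, one hidden layer between them. All weights below are placeholders; the real sensor would learn them from the recorded training data.

```python
import math

# Placeholder weights; a trained sensor would learn these from data.
W_HIDDEN = [[0.1, -0.2, 0.3, 0.05],
            [0.2, 0.1, -0.1, 0.3],
            [-0.3, 0.2, 0.1, 0.1]]   # 3 hidden units, 4 inputs each
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUT = [[0.5, 0.4, 0.3],
         [0.2, -0.3, 0.1]]           # 2 outputs: speed, direction
B_OUT = [0.0, 0.0]

def layer(inputs, weights, biases):
    # One fully connected layer with tanh activation.
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def wind_net(cooling):
    # cooling: 4 filament cooling readings (input layer)
    hidden = layer(cooling, W_HIDDEN, B_HIDDEN)  # hidden layer
    return layer(hidden, W_OUT, B_OUT)           # output layer

speed, direction = wind_net([1.2, 0.8, 1.0, 0.9])
```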
Can This Possibly Be Biased?
• Well, yes
• The training data could have been recorded under a single set of
temperature/sunlight/humidity conditions
• Which could affect results under those conditions
• It’s a possible bias that doesn’t hurt anyone
• Or does it?
• Does anyone remember a certain O-ring?
Where Do Biases Come From?
• Data selection
• We choose training data that represents only one segment of the domain
• We limit our training data to certain times or seasons
• We overrepresent one population
• Or
• The problem domain has subtly changed
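One way to check for this kind of selection bias is to compare the training set's makeup against the known makeup of the domain. The segment labels, counts, and 10-point threshold below are all illustrative assumptions, not from the talk:

```python
from collections import Counter

# Illustrative training set: each record tagged with the population
# segment it came from.
training_data = ["urban"] * 900 + ["rural"] * 100

# What we believe the real domain looks like.
domain_proportions = {"urban": 0.55, "rural": 0.45}

counts = Counter(training_data)
total = sum(counts.values())

# Flag any segment whose share of the training data drifts more than
# 10 points from its share of the domain.
misrepresented = {
    seg: counts.get(seg, 0) / total
    for seg in domain_proportions
    if abs(counts.get(seg, 0) / total - domain_proportions[seg]) > 0.10
}
```

Here both segments would be flagged: urban is overrepresented and rural underrepresented, before any model is ever trained.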
Where Do Biases Come From?
• Latent bias
• Concepts become incorrectly correlated
• Correlation does not mean causation
• But it can be high enough to be believed
• We could be promoting stereotypes
• This describes Amazon’s problem
Where Do Biases Come From?
• Interaction bias
• We may focus on keywords that users apply incorrectly
• User incorporates slang or unusual words
• “That’s bad, man”
• The story of Microsoft Tay
• It wasn’t bad, it was trained that way
Why Does Bias Matter?
• Wrong answers
• Often with no recourse
• Subtle discrimination (legal or illegal)
• And no one knows it
• Suboptimal results
• We’re not getting it right often enough
How Are These Systems Used?
• Transportation
• Self-driving cars
• Ecommerce
• Recommendation engines
• Finance
• Stock trading systems
• HR
• This is a scary one
• Medicine
• This is scarier
It’s Not Just AI
• All software has biases
• It’s written by people
• People make decisions on how to design and implement
• Bias is inevitable
• But can we find it and correct it?
• Do we have to?
Like This One
• A London doctor can’t get into her fitness center locker room
• The fitness center uses a “smart card” to access and record services
• While acknowledging the problem
• The fitness center couldn’t fix it
• But the software development team could
• They had hard-coded “doctor” to be synonymous with “male”
• It was meant as a convenient shortcut
About That Data
• We use data from the problem domain
• What’s that?
• In some cases, scientific measurements are accurate
• But we can choose the wrong measures
• Or not fully represent the problem domain
• But data can also be subjective
• We train with photos of one race over another
• We train with our own values of beauty
Another Example
• Retail recommendation engines
• Other people bought this
• You may also be interested in that
• They can bring in additional revenue
• But they can be very wrong sometimes
• The algorithms think we want to see
what they show us
Where Bias Can Be Very Wrong
• Brooks Ghost running shoes
• Versus ghost costumes
• Our data doesn’t take context into account
Challenges to Validating Requirements
• What does it mean to be correct?
• The result may be different every time
• There is no one single right answer
• How will this really work in production?
• How do I test it for bias?
Possible Answers
• Only look at outputs for given inputs
• And set accuracy parameters
• Don’t look at the outputs at all
• Focus on performance/usability/other features
• We can’t test accuracy
• Throw up our hands and go home
• We can’t easily define bias, let alone test for it
Testing Machine Learning Systems For Bias
• Have objective acceptance criteria
• Test with new data
• Don’t count on all results being accurate
• Understand the architecture of the network as a part of the testing
process
• Communicate the level of confidence you have in the results
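Those points can be sketched as a small acceptance test: score the system on data it has never seen, against an objective criterion agreed before testing, and report a confidence figure rather than a bare pass/fail. The stand-in model, the held-out records, and the 70% criterion are all illustrative:

```python
# Stand-in for a real trained system: always predicts the majority class.
def model(features):
    return 1

# Held-out data the model never saw during training: (features, label).
holdout = [([0.2], 1), ([0.9], 1), ([0.4], 0), ([0.7], 1)]

correct = sum(model(x) == y for x, y in holdout)
accuracy = correct / len(holdout)

ACCEPTANCE_CRITERION = 0.70  # objective, agreed before testing began
accepted = accuracy >= ACCEPTANCE_CRITERION
report = f"accuracy {accuracy:.0%} against criterion {ACCEPTANCE_CRITERION:.0%}"
```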
Gearing Up to Test
• Think outside the box
• Understand your users intimately
• What will a user do?
• Know the training data
• Look for bias in the data
• Write tests to exploit edge cases
• Run many tests to look for trends
What is a Bug?
• A mismatch between inputs and outputs?
• It’s supposed to be that way!
• Not every recommendation will be a good one
• But that doesn’t mean it’s a bug
• Too many wrong answers
• Define too many
• Define wrong
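The last two bullets can be made concrete: define "wrong" and "too many" up front, then flag a bug only when the wrong-answer rate crosses that line. Everything below — the definition of a miss, the 40% threshold, the sample sessions — is illustrative:

```python
# "Wrong" defined as: the user bought nothing we recommended.
def is_wrong(recommended, purchased):
    return not set(recommended) & set(purchased)

# "Too many" defined as: more than 40% of sessions get wrong answers.
TOO_MANY = 0.40

# Illustrative sessions: (what we recommended, what the user bought).
sessions = [
    (["shoes", "socks"], ["socks"]),    # hit
    (["tent"], ["shoes"]),              # miss
    (["laptop"], ["laptop", "mouse"]),  # hit
    (["costume"], ["running shoes"]),   # miss
    (["book"], ["book"]),               # hit
]

wrong_rate = sum(is_wrong(r, p) for r, p in sessions) / len(sessions)
is_bug = wrong_rate > TOO_MANY
```

With two misses in five sessions, the rate sits exactly at the threshold: individual misses are expected behavior, not a bug.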
We Found a Bug, Now What?
• The bug could be unrelated to the neural network
• Treat it as a normal bug
• If the neural network is involved
• Determine a definition of inaccurate
• Determine the likelihood of an inaccurate answer
• This may involve serious redevelopment
Bias and Truth and AI, Oh My
• All human decisions are biased in some way
• Do we want human decisions, or do we want correct ones?
• Human decisions come with human biases
• Can we get those out of our software?
• And will it be better if we do?
• Even with bias, computer decisions may be better
A Larger Problem
• What if we want the system to replicate how we would decide?
• Irrespective of bias
• We may accept bias if it is consistent
• After all, humans are biased
• This is a question with no easy answer in practice
Delivering in the Clutch
• Machines treat all events as equal
• Humans recognize the importance of some events
• And sometimes can rise to the occasion
• There is no mechanism for code to do this
• We could have data and algorithms to recognize
the importance of a specific event
• But the software cannot “improve” its answer
• This is less a bias than an inherent weakness
The Human in the Loop
• We don’t understand complex software systems
• Disasters often happen because software behaves in unexpected
ways
• Human oversight may prevent disasters, or wrong decisions
• Can we overcome human bias?
• The problem is that machines respond too quickly
• In many cases, there is not enough time for human oversight
Objections
• I will never encounter this type of application!
• You might be surprised
• I will do what I’ve always done
• No, you won’t
• My goals will be defined by others
• Unless they’re not
• You may be the one
Conclusions
• You can’t make intelligence from code
• Algorithms are thoughtless
• “Learning” systems are what we program and train
• No easy answer
• If we choose the data, it will likely be biased
• Recognize how your data fits into the problem domain
References
• https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you-whos-to-blame/
• https://qz.com/1089555/artificial-intelligence-programmers-are-trying-to-turn-code-into-intelligence-like-alchemists-tried-turning-lead-into-gold/
• https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/
• https://qz.com/work/1454396/amazons-sexist-hiring-algorithm-could-still-be-better-than-a-human/
The End
Peter Varhol
peter@petervarhol.com
@pvarhol

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Worst practices in software testing by the Testing troll
Worst practices in software testing by the Testing trollWorst practices in software testing by the Testing troll
Worst practices in software testing by the Testing troll
 
Automated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data FrankfurtAutomated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data Frankfurt
 
The good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo softThe good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo soft
 
Automated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data DüsseldorfAutomated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data Düsseldorf
 
"Worst" practices of software testing
"Worst" practices of software testing"Worst" practices of software testing
"Worst" practices of software testing
 
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
 
Test automation – the bitter truth
Test automation – the bitter truthTest automation – the bitter truth
Test automation – the bitter truth
 
Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi
 
Automating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learningAutomating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learning
 
Learn Learning + Prototype Testing
Learn Learning + Prototype TestingLearn Learning + Prototype Testing
Learn Learning + Prototype Testing
 
Machine Learning Vital Signs
Machine Learning Vital SignsMachine Learning Vital Signs
Machine Learning Vital Signs
 
Advancing Testing Using Axioms
Advancing Testing Using AxiomsAdvancing Testing Using Axioms
Advancing Testing Using Axioms
 
Hindsight lessons about API testing
Hindsight lessons about API testingHindsight lessons about API testing
Hindsight lessons about API testing
 
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
 
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
 
Data Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersData Science Toolkit for Product Managers
Data Science Toolkit for Product Managers
 
Data science toolkit for product managers
Data science toolkit for product managers Data science toolkit for product managers
Data science toolkit for product managers
 
Building a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering FailureBuilding a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering Failure
 
How to think smarter about software development
How to think smarter about software developmentHow to think smarter about software development
How to think smarter about software development
 
Test-Driven Development
 Test-Driven Development  Test-Driven Development
Test-Driven Development
 

Semelhante a Testing for cognitive bias in ai systems

Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
OrateTeam
 
How did i miss that bug rtc
How did i miss that bug rtcHow did i miss that bug rtc
How did i miss that bug rtc
GerieOwen
 

Semelhante a Testing for cognitive bias in ai systems (20)

Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational values
 
Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational values
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causation
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatrace
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routine
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?
 
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
 
Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!
 
How did i miss that bug rtc
How did i miss that bug rtcHow did i miss that bug rtc
How did i miss that bug rtc
 
Model validation
Model validationModel validation
Model validation
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
 
Joined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your businessJoined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your business
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
Bad Metric, Bad!
Bad Metric, Bad!Bad Metric, Bad!
Bad Metric, Bad!
 
Protecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test dataProtecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test data
 
How to make m achines learn
How to make m achines learnHow to make m achines learn
How to make m achines learn
 
L15. Machine Learning - Black Art
L15. Machine Learning - Black ArtL15. Machine Learning - Black Art
L15. Machine Learning - Black Art
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
How to Get the Most Out of Security Tools
How to Get the Most Out of Security ToolsHow to Get the Most Out of Security Tools
How to Get the Most Out of Security Tools
 
Using AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesUsing AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable Workplaces
 

Mais de Peter Varhol

Mais de Peter Varhol (12)

DevOps and the Impostor Syndrome
DevOps and the Impostor SyndromeDevOps and the Impostor Syndrome
DevOps and the Impostor Syndrome
 
162 the technologist of the future
162   the technologist of the future162   the technologist of the future
162 the technologist of the future
 
Digital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisDigital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolis
 
What Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsWhat Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing Teams
 
Identifying and measuring testing debt
Identifying and measuring testing debtIdentifying and measuring testing debt
Identifying and measuring testing debt
 
What aircrews can teach devops teams ignite
What aircrews can teach devops teams igniteWhat aircrews can teach devops teams ignite
What aircrews can teach devops teams ignite
 
Talking to people lightning
Talking to people lightningTalking to people lightning
Talking to people lightning
 
Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011
 
Qa test managed_code_varhol
Qa test managed_code_varholQa test managed_code_varhol
Qa test managed_code_varhol
 
Talking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolTalking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps tool
 
How do we fix testing
How do we fix testingHow do we fix testing
How do we fix testing
 
Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Testing for cognitive bias in ai systems

  • 1. Testing for Cognitive Bias in AI Systems Peter Varhol, Technology Strategy Research
  • 2. About Me • International speaker and writer • Degrees in Math, CS, Psychology • Technology communicator • Former university professor, tech journalist • Cat owner and distance runner • peter@petervarhol.com
  • 3. What You Will Learn • What kind of systems produce nondeterministic results based on data • How machine learning results can be biased • How to recognize and understand bias in machine learning systems
  • 4. Agenda • What are machine learning and adaptive systems? • How can software possibly be biased? • How can we tell if they are biased? • Can we do anything? • Summary and conclusions
  • 5. Machine Learning and Adaptive Systems • We are building a different kind of software • It never returns the same result • That doesn’t make it wrong • How can we assess if it’s correct? • How do we know if there is a bug? • How do we know if it’s biased?
  • 6. Bug vs. Bias • A bug is an identifiable and measurable error in process or result • Usually fixed with a code change • A bias is a systematic inflection in decisions that produces results inconsistent with reality • Bias can’t be fixed with a code change
  • 7. How Does This Happen? • The problem domain is ambiguous • There is no single “right” answer • “Close enough” can usually work • As long as we can quantify “close enough” • We don’t know quite why the software responds as it does • We can’t easily trace code paths • We choose the data • The software “learns” from past actions
  • 8. How Can We Tell If It’s Biased? • We look very carefully at the training data • We set strict success criteria based on the system requirements • We run many tests • Most change parameters only slightly • Some use radical inputs • Compare results to success criteria
  • 9. We Train Systems With Data • We know the input data and expected results • Input data is run through the algorithms and the results compared to actual results • The algorithms adjust parameters based on a comparison between algorithm and actual • What happens if the data systematically misrepresents the domain?
  • 10. Amazon Can’t Rid Its AI of Bias • Amazon created an AI to crawl the web to find job candidates • Training data was all resumes submitted for the last ten years • In IT, the overwhelming majority were male • The AI “learned” that males were superior for IT jobs • Amazon couldn’t fix that training bias
  • 11. Amazon Can’t Rid Its AI of Bias • So just train the AI with the candidates hired • Right? • No, because males were also hired in overwhelming numbers • Amazon hasn’t found a way to overcome this bias • So they stopped using the tool • Does all data have bias?
  • 12. Many Systems Use Objective Data • Electric wind sensor • Determines wind speed and direction • Based on the cooling of filaments • Designed a three-layer neural network • Then used the known data to train it • Cooling in degrees of all four filaments • Wind speed, direction
  • 13. Can This Possibly Be Biased? • Well, yes • The training data could have been recorded in single temperature/sunlight/humidity conditions • Which could affect results under those conditions • It’s a possible bias that doesn’t hurt anyone • Or does it? • Does anyone remember a certain O-ring?
  • 14. Where Do Biases Come From? • Data selection • We choose training data that represents only one segment of the domain • We limit our training data to certain times or seasons • We overrepresent one population • Or • The problem domain has subtly changed
  • 15. Where Do Biases Come From? • Latent bias • Concepts become incorrectly correlated • Correlation does not mean causation • But it be high enough to believe • We could be promoting stereotypes • This describes Amazon’s problem
  • 16. Where Do Biases Come From? • Interaction bias • We may focus on keywords that users apply incorrectly • User incorporates slang or unusual words • “That’s bad, man” • The story of Microsoft Tay • It wasn’t bad, it was trained that way
  • 17. Why Does Bias Matter? • Wrong answers • Often with no recourse • Subtle discrimination (legal or illegal) • And no one knows it • Suboptimal results • We’re not getting it right often enough
  • 18. How Are These Systems Used? • Transportation • Self-driving cars • Ecommerce • Recommendation engines • Finance • Stock trading systems • HR • This is a scary one • Medicine • This is scarier
  • 19. It’s Not Just AI • All software has biases • It’s written by people • People make decisions on how to design and implement • Bias is inevitable • But can we find it and correct it? • Do we have to?
  • 20. Like This One • A London doctor can’t get into her fitness center locker room • The fitness center uses a “smart card” to access and record services • While acknowledging the problem • The fitness center couldn’t fix it • But the software development team could • They had hard-coded “doctor” to be synonymous with “male” • It was meant as a convenient shortcut
  • 21. About That Data • We use data from the problem domain • What’s that? • In some cases, scientific measurements are accurate • But we can choose the wrong measures • Or not fully represent the problem domain • But data can also be subjective • We train with photos of one race over another • We train with our own values of beauty
  • 22. Another Example • Retail recommendation engines • Other people bought this • You may also be interested in that • They can bring in additional revenue • But they can be very wrong sometimes • The algorithms think we want to see what they show us
  • 23. Where Bias Can Be Very Wrong • Brooks Ghost running shoes • Versus ghost costumes • Our data doesn’t take context into account
  • 24. Challenges to Validating Requirements • What does it mean to be correct? • The result may be different every time • There is no one single right answer • How will this really work in production? • How do I test it for bias?
  • 25. Possible Answers • Only look at outputs for given inputs • And set accuracy parameters • Don’t look at the outputs at all • Focus on performance/usability/other features • We can’t test accuracy • Throw up our hands and go home • We can’t easily define bias, let alone test for it
  • 26. Testing Machine Learning Systems For Bias • Have objective acceptance criteria • Test with new data • Don’t count on all results being accurate • Understand the architecture of the network as a part of the testing process • Communicate the level of confidence you have in the results
  • 27. Gearing Up to Test • Think outside the box • Understand your users intimately • What will a user do? • Know the training data • Look for bias in the data • Write tests to exploit edge cases • Run many tests to look for trends
  • 28. What is a Bug? • A mismatch between inputs and outputs? • It supposed to be that way! • Not every recommendation will be a good one • But that doesn’t mean it’s a bug • Too many wrong answers • Define too many • Define wrong
  • 29. We Found a Bug, Now What? • The bug could be unrelated to the neural network • Treat it as a normal bug • If the neural network is involved • Determine a definition of inaccurate • Determine the likelihood of an inaccurate answer • This may involve serious redevelopment
  • 30. Bias and Truth and AI, Oh My • All human decisions are biased in some way • Do we want human decisions, or do we want correct ones? • Human decisions come with human biases • Can we get those out of our software? • And will it be better if we do? • Even with bias, computer decisions may be better
  • 31. A Larger Problem • What if we want the system to replicate how we would decide? • Irrespective of bias • We may accept bias if it is consistent • After all, humans are biased • This is a question with no easy answer in practice
  • 32. Delivering in the Clutch • Machines treat all events as equal • Humans recognize the importance of some events • And sometimes can rise to the occasion • There is no mechanism for code to do this • We could have data and algorithms to recognize the importance of a specific event • But the software cannot “improve” its answer • This is less a bias than an inherent weakness
  • 33. The Human in the Loop • We don’t understand complex software systems • Disasters often happen because software behaves in unexpected ways • Human oversight may prevent disasters, or wrong decisions • Can we overcome human bias? • The problem is that machines respond too quickly • In many cases, there is not enough time for human oversight
  • 34. Objections • I will never encounter this type of application! • You might be surprised • I will do what I’ve always done • No, you won’t • My goals will be defined by others • Unless they’re not • You may be the one
  • 35. Conclusions • You can’t make intelligence from code • Algorithms are thoughtless • “Learning” systems are what we program and train • No easy answer • If we choose the data, it will likely be biased • Recognize how your data fits into the problem domain
  • 37. References • https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you-whos-to-blame/ • https://qz.com/1089555/artificial-intelligence-programmers-are-trying-to-turn-code-into-intelligence-like-alchemists-tried-turning-lead-into-gold/ • https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/ • https://qz.com/work/1454396/amazons-sexist-hiring-algorithm-could-still-be-better-than-a-human/

Editor’s Notes

  1. There is a type of software where having a defined output is no longer the case. Actually, two types. One is machine learning systems. The second is predictive analytics, or adaptive systems.
  2. These types of software are becoming increasingly common, in areas such as ecommerce, public transportation, automotive, finance, and computer networks. They have the potential to make decisions given sufficiently well-defined inputs and goals. In some instances, they are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
  3. How do you test this? You do know what the answer is supposed to be, because you built the network using the training data, but it will be rare to get an exactly correct answer every time. The product is actually tested during the training process: training either converges to accurate results or it diverges. The question is how you evaluate the quality of the network.
  4. Have objective acceptance criteria. Know the amount of error you and your users are willing to accept. Test with new data. Once you’ve trained the network and frozen the architecture and coefficients, use fresh inputs and outputs to verify its accuracy. Don’t count on all results being accurate. That’s just the nature of the beast. And you may have to recommend throwing out the entire network architecture and starting over. Understand the architecture of the network as a part of the testing process. Few if any will be able to actually follow a set of inputs through the network of algorithms, but understanding how the network is constructed will help testers determine if another architecture might produce better results. Communicate the level of confidence you have in the results to management and users. Machine learning systems offer you the unique opportunity to describe confidence in statistical terms, so use them. One important thing to note is that the training data itself could well contain inaccuracies. In this case, because of measurement error, the recorded wind speed and direction could be off or ambiguous. In other cases, the cooling of the filament likely has some error in its measurement.
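The note above suggests describing confidence in statistical terms. One minimal way to do that is a normal-approximation confidence interval on observed accuracy; the function name and the 95% z-value of 1.96 are this sketch's choices, not something prescribed by the talk:

```python
import math

def accuracy_confidence_interval(correct, total, z=1.96):
    """Normal-approximation confidence interval for observed accuracy.
    z=1.96 gives roughly 95% coverage; this is a sketch, not a full
    statistical treatment (small samples would need an exact method)."""
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)

# Example: 88 correct answers out of 100 fresh test cases.
lo, hi = accuracy_confidence_interval(88, 100)
print(f"accuracy 0.88, ~95% CI ({lo:.3f}, {hi:.3f})")
```

Reporting "88% accurate, plus or minus about 6 points" is far more useful to management and users than a bare pass/fail, which is exactly the opportunity the note describes.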
  5. You can’t make gold from lead, and you can’t make intelligence from code.