
Ansgar rcep algorithmic_bias_july2018


Briefing to eCommerce negotiators on algorithmic decision making and associated issues of algorithmic bias. The presentation uses examples to highlight the types and causes of algorithmic decision making bias and to summarize the current state of regulatory responses.


  2. Algorithmic Bias Considerations. All non-trivial* decisions are biased. We seek to minimize bias that is: unintended, unjustified, unacceptable, as defined by the context where the system is used. *Non-trivial means the decision has more than one possible outcome and the choice is not uniformly random.
  3. Causes of algorithmic bias. Insufficient understanding of the context of use (who will be affected?). Failure to rigorously identify the decision/optimization criteria. Failure to have explicit justifications for the chosen criteria. Failure to check whether the justifications are acceptable in the context where the system is used. System not performing as intended (implementation errors; unreliable input data).
  4. ‘Coded Gaze’: failure to consider user groups. Face recognition algorithms are built and tested using easily accessible examples, typically US university students, producing a sampling bias towards WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations. Joy Buolamwini, Algorithmic Justice League: https://www.ajlunited.org/
  5. Arkansas algorithmic Medicaid assessment instrument. When introduced in 2016, many people with cerebral palsy had their care dramatically reduced; they sued the state, resulting in a court case. Investigation in court revealed: The algorithm relies on scores for 60 answers to questions about descriptions, symptoms and ailments. A small number of variables could matter enormously: a three instead of a four on a handful of items meant a cut of dozens of care hours a month. One variable was foot problems. When an assessor visited a certain person, they wrote that the person didn’t have any foot problems, because they were an amputee and didn’t have feet. The third-party software vendor implementing the system mistakenly used a version of the algorithm that didn’t account for diabetes issues. Cerebral palsy wasn’t properly coded in the algorithm, causing incorrect calculations for hundreds of people, mostly lowering their hours. https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy
  6. COMPAS recidivism risk prediction: not biased, but still biased. The COMPAS recidivism prediction tool estimates the likelihood of criminals re-offending in the future. The producers of COMPAS made sure the tool produced similar accuracy rates for white and black offenders, yet a ProPublica investigation reported systematic racial bias in the recidivism predictions. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  7. Accuracy rates vs. False Positive and False Negative rates. When base rates differ, no solution can simultaneously achieve similar rates of: Accuracy/Overall Misclassification, False Negatives, False Positives. (Figures 1 and 2.)
  8. Search engine manipulation effect could impact elections and distort competition. Experiments manipulated the search rankings for information about political candidates for 4,556 undecided voters: (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more; (ii) the shift can be much higher in some demographic groups; (iii) such rankings can be masked so that people show no awareness of the manipulation. R. Epstein & R.E. Robertson, “The search engine manipulation effect (SEME) and its possible impact on the outcome of elections”, PNAS, 112, E4512-21, 2015.
  9. Algorithmic price setting by sellers on Amazon: algorithmic tuning of prices becomes visible when something goes wrong. Based on observational reverse engineering: profnath’s algorithm: (1) find the highest-priced copy of the book; (2) set the price to 0.9983 times that price. bordeebook’s algorithm: (1) find the highest-priced copy from other sellers; (2) set the price to 1.270589 times that price. The ratios 0.9983 and 1.270589 suggest they were not chosen by humans. http://www.michaeleisen.org/blog/?p=358
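The runaway feedback between these two pricing rules can be sketched numerically. This is a hypothetical re-creation: the update ratios come from the reverse-engineered observations above, while the starting price, function names and number of iterations are arbitrary assumptions.

```python
# Hypothetical sketch of the two pricing rules described above; the ratios
# are from the slide, the starting price and iteration count are made up.

def profnath_price(highest_other: float) -> float:
    # Slightly undercut the highest competing price.
    return 0.9983 * highest_other

def bordeebook_price(highest_other: float) -> float:
    # Mark up the highest competing price.
    return 1.270589 * highest_other

price_p = price_b = 35.54  # arbitrary starting price
for day in range(10):
    price_p = profnath_price(price_b)    # profnath reacts to bordeebook
    price_b = bordeebook_price(price_p)  # bordeebook reacts to profnath
    print(f"day {day}: profnath ${price_p:,.2f}, bordeebook ${price_b:,.2f}")

# Each round multiplies both prices by 0.9983 * 1.270589 ≈ 1.268, so they
# grow exponentially until a human notices.
```

In the incident Eisen describes, the interacting loops drove the book’s price into the millions of dollars before anyone intervened.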
  10. International Standards work-in-progress
  11. Initiative launched at the end of 2016: https://standards.ieee.org/develop/indconn/ec/autonomous_systems.html
  12. IEEE P70xx Standards Projects: typically 3-4 years of development; earliest completions expected in the next 6-12 months.
    IEEE P7000: Model Process for Addressing Ethical Concerns During System Design
    IEEE P7001: Transparency of Autonomous Systems
    IEEE P7002: Data Privacy Process
    IEEE P7003: Algorithmic Bias Considerations
    IEEE P7004: Child and Student Data Governance
    IEEE P7005: Employer Data Governance
    IEEE P7006: Personal Data AI Agent Working Group
    IEEE P7007: Ontological Standard for Ethically Driven Robotics and Automation Systems
    IEEE P7008: Ethically Driven Nudging for Robotic, Intelligent and Autonomous Systems
    IEEE P7009: Fail-Safe Design of Autonomous and Semi-Autonomous Systems
    IEEE P7010: Wellbeing Metrics Standard for Ethical AI and Autonomous Systems
    IEEE P7011: Process of Identifying and Rating the Trustworthiness of News Sources
    IEEE P7012: Standard for Machine Readable Personal Privacy Terms
  13. Reason for industry participation in the IEEE P7003 Standard for Algorithmic Bias Considerations. An IEEE P7003 working group participant from a corporation based in an ASEAN country reported an example of why the company is interested in working on the IEEE P7000 series standards: The company produces image recognition and video behaviour analysis systems. One such system was implemented in an ASEAN country to monitor the behaviour of public transport drivers and report on driver quality. The system was predominantly flagging drivers from a specific ethnic group as problem cases. The company needs guidance on how to make sure, and how to communicate, that the system is not inadvertently basing its decisions on ethnicity.
  14. Related AI standards activities. ISO/IEC JTC 1/SC 42 Artificial Intelligence, started spring 2018: SG 1 Computational approaches and characteristics of AI systems; SG 2 Trustworthiness; SG 3 Use cases and applications; WG 1 Foundational standards. In January 2018 China published its “Artificial Intelligence Standardization White Paper.”
  15. ACM Principles on Algorithmic Transparency and Accountability: Awareness; Access and Redress; Accountability; Explanation; Data Provenance; Auditability (models, algorithms, data, and decisions should be recorded so that they can be audited in cases where harm is suspected); Validation and Testing. https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf
  16. FAT/ML: Principles for Accountable Algorithms and a Social Impact Statement for Algorithms. Responsibility: externally visible avenues of redress for adverse effects, and a designated internal role responsible for timely remedy of such issues. Explainability: ensure algorithmic decisions and the data driving those decisions can be explained to end-users/stakeholders in non-technical terms. Accuracy: identify, log, and articulate sources of error and uncertainty so that expected/worst-case implications can inform mitigation procedures. Auditability: enable third parties to probe, understand, and review algorithm behavior through disclosure of information that enables monitoring, checking, or criticism, including detailed documentation, technically suitable APIs, and permissive terms of use. Fairness: ensure algorithmic decisions do not create discriminatory or unjust impacts when comparing across different demographics. https://www.fatml.org/resources/principles-for-accountable-algorithms
  17. Regulatory legislation work-in-progress
  18. NYC Task Force to Examine Automated Decision Systems. On May 16, 2018, the Mayor of New York created the Automated Decision Systems Task Force, which will explore how New York City uses algorithms. The Task Force, the product of a law passed by the City Council in December 2017, aims to produce a report in December 2019 recommending procedures for reviewing and assessing City algorithmic tools to ensure equity and opportunity. As part of the task force, the AI Now Institute is developing a proposal for Algorithmic Impact Assessments [ https://ainowinstitute.org/aiareport2018.pdf ], including a requirement for meaningful access: allow researchers and auditors to review systems once they are deployed. https://www1.nyc.gov/office-of-the-mayor/news/251-18/mayor-de-blasio-first-in-nation-task-force-examine-automated-decision-systems-used-by
  19. EU initiatives to understand the impact of AI/algorithmic decision making systems. The European Commission (EC) and European Parliament (EP) are urgently working to identify the issues. In June 2018 the EC assembled its High-Level Expert Group on Artificial Intelligence to support the development and implementation of a European strategy on AI, including challenges of ethics, law, societal impact and socio-economics. The EP has called for reports on policy options for algorithmic transparency and accountability to be presented in October 2018, and has instructed the EC to launch a project on Algorithmic Awareness Building and the development of policy tools for algorithmic decision making. https://ec.europa.eu/digital-single-market/en/high-level-expert-group-artificial-intelligence
  20. Thank you. ansgar.koene@nottingham.ac.uk
  21. Biography • Dr. Koene is a Senior Research Fellow at the Horizon Digital Economy Research Institute of the University of Nottingham, where he conducts research on the societal impact of digital technology. • He chairs the IEEE P7003 Standard for Algorithmic Bias Considerations working group, and leads the policy impact activities of the Horizon institute. • He is co-investigator on the UnBias project to develop regulation-, design- and education-recommendations for minimizing unintended, unjustified and inappropriate bias in algorithmic systems. • He has over 15 years of experience researching and publishing on topics ranging from Robotics, AI and Computational Neuroscience to Human Behaviour studies and Tech Policy recommendations. • He received his M.Eng. and Ph.D. in Electrical Engineering & Neuroscience, respectively, from Delft University of Technology and Utrecht University. • He is a Trustee of 5Rights, a UK-based charity for enabling children and young people to access the digital world creatively, knowledgeably and fearlessly. Dr. Ansgar Koene ansgar.koene@nottingham.ac.uk https://www.linkedin.com/in/akoene/ https://www.nottingham.ac.uk/computerscience/people/ansgar.koene http://unbias.wp.horizon.ac.uk/
  22. Institute of Electrical and Electronics Engineers (IEEE). IEEE is the world’s largest association of technical professionals, with more than 423,000 members in over 160 countries around the world. The IEEE Standards Association (IEEE-SA) is responsible for standards such as the 802.11 WiFi standard, which all WiFi devices use so that they can communicate with each other. In addition to developing its own P7000 series standards, the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems: is an organizational member of the European Commission’s High-Level Expert Group on AI; has been consulted for input to the OECD’s development of principles on AI; is collaborating with the Association for Computing Machinery (ACM) on briefing the US government and Congress on AI; and provided evidence to the UK’s Select Committee on AI. The P7000 series standards are being referenced by ISO as part of the ISO SC 42 AI standards development.
  23. Reproducing human/historical bias. Machine Learning requires large, reliable, ‘ground truth’ data. Training on historical databases reproduces historical bias. Bias in datasets can be reduced by targeted sampling, re-weighting and other methods.

Editor’s Notes

  • All non-trivial decisions are biased. For example, good results from a search engine should be biased to match the interests of the user as expressed by the search term, and possibly refined based on personalization data.

    When we say we want ‘no Bias’ we mean we want to minimize unintended, unjustified and unacceptable bias, as defined by the context within which the algorithmic system is being used.
  • In the absence of malicious intent, bias in algorithmic systems is generally caused by:

    Insufficient understanding of the context that the system is part of. This includes lack of understanding who will be affected by the algorithmic decision outcomes, resulting in a failure to test how the system performs for specific groups, who are often minorities. Diversity in the development team can partially help to address this.
    Failure to rigorously map decision criteria. When people think of algorithmic decisions as being more ‘objectively trustworthy’ than human decisions, more often than not they are referring to the idea that algorithmic systems follow a clearly defined set of criteria with no ‘hidden agenda’. The complexity of system development challenges, however, can easily introduce ‘hidden decision criteria’ introduced as a quick fix during debugging or embedded within Machine Learning training data.
    Failure to explicitly define and examine the justifications for the decision criteria. Given the context within which the system is used, are these justifications acceptable? For example, in a given context is it OK to treat high correlation as evidence of causation?
  • - While working with face recognition systems at MIT, Joy Buolamwini noticed that the algorithms failed to reliably detect black faces.
    - Test subjects and data sets for developing these systems are mostly members of the research lab or university students, the majority of whom are white.
    - Simple performance metrics, such as “overall percent correct detection”, that do not consider differences between user groups therefore produced high performance scores despite the systems failing miserably for groups that were a small minority in the test set.
    - In Psychology research this common sampling bias is referred to as WEIRD, which stands for Western, Educated, and from Industrialized, Rich, Democratic countries.
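The way an overall accuracy metric can mask a group-level failure is easy to demonstrate. A minimal sketch with invented detection counts, assuming a test set that is 95% one group:

```python
# Hypothetical face-detection results; all counts are invented for illustration.
counts = {
    "A": {"correct": 940, "total": 950},  # majority group: ~98.9% detected
    "B": {"correct": 20,  "total": 50},   # minority group: only 40% detected
}

# Aggregate metric over the whole test set.
overall = (sum(c["correct"] for c in counts.values())
           / sum(c["total"] for c in counts.values()))

# The same results broken down per group.
per_group = {g: c["correct"] / c["total"] for g, c in counts.items()}

print(round(overall, 2))  # "overall percent correct" of 0.96 looks fine
print(per_group)          # the per-group rates reveal the failure for group B
```

Because group B contributes only 5% of the test set, its 60% failure rate barely moves the headline number; only a per-group breakdown exposes it.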
  • The scholar Danielle Keats Citron cites the example of Colorado, where coders placed more than 900 incorrect rules into its public benefits system in the mid-2000s, resulting in problems like pregnant women being denied Medicaid. Similar issues in California, Citron writes in a paper, led to “overpayments, underpayments, and improper terminations of public benefits,” as foster children were incorrectly denied Medicaid. Citron writes about the need for “technological due process” — the importance of both understanding what’s happening in automated systems and being given meaningful ways to challenge them.

  • - Probably one of the most discussed cases of algorithmic bias is the apparent racial discrimination by the COMPAS recidivism prediction tool, that was reported on by ProPublica in 2016.
    - COMPAS is a tool used across the US by judges and parole officers to predict recidivism rates based on responses to a long questionnaire.
    - There are many issues that can be discussed regarding the use of this tool. Here we will only look at the central question: does COMPAS produce racially biased predictions? Highlighting the fact that there is no one-true answer to this question.
  • When evaluating if COMPAS is racially biased there are a number of different error rates that can be considered.
    As shown in figure 1, one can compare the Overall Accuracy, or Misclassification Rates for white and black defendants.
    Alternatively one could look at the False Negative or False Positive rates, i.e. the rate with which classification errors correspond to predictions of defendants being less or more likely to reoffend.
    The developers of COMPAS focused on Accuracy rate, whereas the ProPublica investigation looked at False Positive and False Negative rates.
    Figure 2, red box, shows that COMPAS (as well as human judges) has similar Accuracy rates for Black and White defendants, suggesting that it is not biased.
    Figure 2, green box, shows that when COMPAS makes errors, black defendants get judged more harshly than they should and white defendants more leniently.
    Mathematically it can be shown that when the base rates differ between two groups, here white and black offenders, no classifier can simultaneously avoid bias in Accuracy, False Negative and False Positive rates.
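The tension between these metrics can be illustrated with a small numeric sketch. The confusion-matrix counts below are invented (they are not the COMPAS data), but they are chosen so the two groups have different base rates, nearly identical accuracy, and diverging error rates, mirroring the pattern ProPublica reported.

```python
def rates(tp, fp, fn, tn):
    """Accuracy, false positive rate and false negative rate from counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fpr = fp / (fp + tn)  # non-reoffenders wrongly labelled high risk
    fnr = fn / (fn + tp)  # reoffenders wrongly labelled low risk
    return accuracy, fpr, fnr

# Group A: base rate 50% reoffend; group B: base rate 30% (invented numbers).
acc_a, fpr_a, fnr_a = rates(tp=400, fp=150, fn=100, tn=350)
acc_b, fpr_b, fnr_b = rates(tp=180, fp=120, fn=120, tn=580)

print(acc_a, acc_b)  # 0.75 vs 0.76: accuracies look "unbiased"...
print(fpr_a, fpr_b)  # 0.30 vs ~0.17: group A is wrongly flagged far more often
print(fnr_a, fnr_b)  # 0.20 vs 0.40: group B is wrongly cleared far more often
```

Judged by accuracy alone the classifier treats both groups alike; judged by false positive and false negative rates it clearly does not, and with different base rates no choice of threshold can equalize all three at once.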
  • FAT/ML is a group of leading academics working on Fairness, Accountability and Transparency in Machine Learning, with annual conferences highlighting leading work in this area.
  • Ansgar Koene is a Senior Research Fellow at the Horizon Digital Economy Research Institute, University of Nottingham, and chairs the IEEE P7003 Standard for Algorithmic Bias Considerations working group. As part of his work at Horizon, Ansgar is the lead researcher in charge of Policy Impact; leads the stakeholder engagement activities of the EPSRC (UK research council) funded UnBias project to develop regulation-, design- and education-recommendations for minimizing unintended, unjustified and inappropriate bias in algorithmic systems; and frequently contributes evidence to UK parliamentary inquiries related to ICT and digital technologies.
    Ansgar has a multi-disciplinary research background, having previously worked and published on topics ranging from bio-inspired Robotics, AI and Computational Neuroscience to experimental Human Behaviour/Perception studies.
  • Each method has its strengths and weaknesses. The following examples provide a brief overview of how these methods work, and how they can go wrong
  • Machine Learning based systems require large, reliable, ‘ground truth’ data sets to teach them.
    Often the easiest solution is to use existing ‘historical’ data.
    Want to teach a system who to target for high-paying job ads? Start with a database of people with high-paying jobs, and look for defining features. Due to historical bias the result will favour white males.
    Want to use AI to judge a beauty contest? Train it with images of actors and models, and watch it reproduce fashion industry bias for light skin tones.
    To avoid reproducing human bias in Machine Learning systems it is necessary to carefully check the training data and possibly adjust sampling rates for various population groups, compensate by adjusting the weight given to errors for different groups, or use other methods.
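The re-weighting idea above can be sketched as inverse-frequency weights, so that each group contributes equally to the training loss. This is a minimal illustration with an assumed 90/10 group split and a hypothetical `group_weights` helper; real mitigation would be tied to the specific learner and fairness criterion.

```python
from collections import Counter

def group_weights(groups):
    """Per-example weights chosen so every group's total weight is equal."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Assumed training set: 90 majority-group examples, 10 minority-group ones.
groups = ["majority"] * 90 + ["minority"] * 10
weights = group_weights(groups)

# Each majority example gets weight ~0.56 and each minority example 5.0,
# so both groups sum to the same total weight and errors on the minority
# are no longer drowned out by the majority.
print(sum(weights[:90]), sum(weights[90:]))
```

Such per-example weights can typically be passed to a learner's loss function (many libraries accept a sample-weight argument), shifting the optimization toward equal per-group error rather than raw overall accuracy.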