He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processing), presented by Yves Peirsman, CTO at NLP Town

Yves Peirsman presents several instances where bias has posed a risk to the successful adoption of NLP systems, and discusses what techniques exist to discover these biases before the systems are put in production.

Finding and Fixing Bias
in Natural Language Processing
Yves Peirsman

A primer in NLP
Artificial Intelligence
Natural Language Processing
● Machine translation
● Sentiment analysis
● Information retrieval
● Information extraction
● Text classification

We provide consultancy for companies that need guidance in the NLP domain.
We develop software and train custom NLP models for challenging or domain-specific applications.

NLP Town
Training data Training process Model
● We help annotate training data.
● We train models for NLP applications.
● We integrate models with workflows.
● We provide consultancy for NLP projects.

A primer in NLP
Training data Training process Model

Word Embeddings
Word embeddings allow NLP models to generalize better.
Word embeddings capture both general and linguistic knowledge.
Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure the similarity between occupations and
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
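The experiment above can be sketched in a few lines. This is a minimal illustration, not the speaker's actual setup: the three-dimensional vectors are invented toy values, and the function names (cosine, mean_similarity, gender_association) are hypothetical. With real embeddings you would load pretrained vectors instead of TOY_EMBEDDINGS.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, attribute_words, embeddings):
    """Average cosine similarity between one word and a set of attribute words."""
    sims = [cosine(embeddings[word], embeddings[a]) for a in attribute_words]
    return sum(sims) / len(sims)


def gender_association(word, male_words, female_words, embeddings):
    """Positive: closer to the "male" set. Negative: closer to the "female" set."""
    return (mean_similarity(word, male_words, embeddings)
            - mean_similarity(word, female_words, embeddings))


# Hypothetical toy vectors, invented for illustration only (not measured data).
TOY_EMBEDDINGS = {
    "man": [1.0, 0.1, 0.0],
    "he": [0.9, 0.2, 0.1],
    "woman": [-1.0, 0.1, 0.0],
    "she": [-0.9, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.2],
    "nurse": [-0.5, 0.8, 0.3],
}
MALE = ["man", "he"]
FEMALE = ["woman", "she"]

if __name__ == "__main__":
    for occupation in ["programmer", "nurse"]:
        score = gender_association(occupation, MALE, FEMALE, TOY_EMBEDDINGS)
        print(f"{occupation}: {score:+.3f}")
```

A score near zero would mean the occupation is equally close to both word sets; the sign shows which set it leans towards.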

Pretrained NLP models
Pretrained language models are a recent significant breakthrough in NLP:
● Language models predict masked words.
● They learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
This movie won her an Oscar for best actress.
The keys to the house are on the table.
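To make "predict masked words" concrete, here is a deliberately tiny sketch. It is an assumption-laden toy: real pretrained language models are neural networks trained on very large corpora, whereas this version only counts which word occurred between a given left and right neighbour. The corpus and the names train_context_counts and predict_masked are hypothetical.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"


def train_context_counts(sentences):
    """Count which word appears between each (left neighbour, right neighbour) pair."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts


def predict_masked(sentence, counts):
    """Return candidate fillers for the single [MASK] token, most frequent first."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    i = tokens.index(MASK.lower())
    context = (tokens[i - 1], tokens[i + 1])
    return [word for word, _ in counts[context].most_common()]


# Invented toy corpus, whitespace-tokenized and without punctuation.
CORPUS = [
    "the keys are on the table",
    "the keys are on the shelf",
    "the keys were on the table",
    "the key is on the table",
]
COUNTS = train_context_counts(CORPUS)

if __name__ == "__main__":
    print(predict_masked("The keys [MASK] on the table", COUNTS))
    print(predict_masked("The key [MASK] on the table", COUNTS))
```

Because this toy only looks at the two adjacent words, it cannot handle the slide's example, where the verb must agree with "keys" across "to the house". Capturing that kind of dependency is what the neural models learn.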

Pretrained NLP models
ULMFiT (Howard and Ruder 2018)

Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (he/she):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent (intelligent), slim (smart), knap (handsome, clever), nauwkeurig (precise), nieuwsgierig (curious), etc.
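One possible way to quantify such an association is sketched below. The slides do not describe the exact procedure, so everything here is an assumption: the probe template, the log-ratio score, and the function names are hypothetical. The model is represented by a callable mask_probability(sentence, candidate), and stub_mask_probability is a stand-in that returns made-up, deliberately neutral numbers so the example runs without a model.

```python
import math

# Hypothetical Dutch probe sentence: "[MASK] is <adjective>."
TEMPLATE = "[MASK] is {adjective}."


def log_ratio(adjective, mask_probability):
    """log P(hij) / P(zij) for the masked slot. Positive means "hij" is preferred."""
    sentence = TEMPLATE.format(adjective=adjective)
    p_hij = mask_probability(sentence, "hij")
    p_zij = mask_probability(sentence, "zij")
    return math.log(p_hij / p_zij)


def mean_log_ratio(adjectives, mask_probability):
    """Average the per-adjective log ratios over the whole adjective list."""
    ratios = [log_ratio(a, mask_probability) for a in adjectives]
    return sum(ratios) / len(ratios)


def stub_mask_probability(sentence, candidate):
    """Stand-in for a real masked language model. The numbers are invented."""
    return {"hij": 0.5, "zij": 0.5}[candidate]


if __name__ == "__main__":
    adjectives = ["ambitieus", "intelligent", "nieuwsgierig"]
    print(mean_log_ratio(adjectives, stub_mask_probability))
```

To run this against a real model, replace the stub with a function that queries the model for the probability of each pronoun in the masked position.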

Step 1: Identify bias with explainable AI
Challenge
● First we need to find out whether our models are biased: search for known, but also unexpected, bias.
● An important role for explainable AI
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"

Step 1: Identify bias with explainable AI
● Visualize the classifier features and their weights:
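The idea can be illustrated with a small bag-of-words classifier whose weights are printed per word. This is a sketch under stated assumptions, not the classifier from the talk: the training comments are invented, "groupname" is a placeholder for a term (for example an identity or dialect word) that should not by itself signal toxicity, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bow_classifier(texts, labels, epochs=200, learning_rate=0.5):
    """Fit a bag-of-words logistic regression with plain stochastic gradient descent."""
    weights = defaultdict(float)
    bias = 0.0
    examples = [(Counter(text.lower().split()), label)
                for text, label in zip(texts, labels)]
    for _ in range(epochs):
        for features, label in examples:
            score = bias + sum(weights[w] * c for w, c in features.items())
            error = 1.0 / (1.0 + math.exp(-score)) - label
            bias -= learning_rate * error
            for word, count in features.items():
                weights[word] -= learning_rate * error * count
    return weights


def print_weights(weights, top=3):
    """Print the most "toxic" and most "non-toxic" words with a text bar per weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for word, weight in ranked[:top] + ranked[-top:]:
        print(f"{word:>10} {weight:+.2f} {'#' * round(abs(weight) * 4)}")


# Invented toy data: label 1 = toxic, label 0 = not toxic.
COMMENTS = [
    "you are a stupid idiot",
    "stupid groupname idiot",
    "groupname people are stupid",
    "thanks for the helpful edit",
    "you are helpful thanks",
    "nice edit thanks",
]
LABELS = [1, 1, 1, 0, 0, 0]
WEIGHTS = train_bow_classifier(COMMENTS, LABELS)

if __name__ == "__main__":
    print_weights(WEIGHTS)
```

In this toy data the placeholder "groupname" only occurs in toxic comments, so it receives a positive weight alongside the insults. A weight inspection like this is one way to spot that a model has learned an association it should not rely on.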

Step 2: Fixing and avoiding bias
Training data Training process Model

Training data: Ensure the training data is free of bias.

Bias in annotation
Inform annotators about possible confounding factors, such as dialect.
● Example: if people are informed that a tweet contains African American English dialect, they are less likely to label it as offensive (Sap et al. 2019).
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try mitigating biases through data augmentation, over- and/or undersampling, etc.
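As an illustration of data augmentation on existing data, the sketch below adds a gender-swapped copy of every sentence that contains a gendered word. It is a simplified example, not a method prescribed in the talk: the word pairs come from the "male" and "female" word sets listed earlier, the names swap_gender and augment are hypothetical, and tokenization is plain whitespace splitting. Ambiguous forms such as "her" (him or his) are left out on purpose.

```python
# Unambiguous word pairs only. "him", "his" and "her" need part-of-speech
# information to swap correctly and are skipped in this sketch.
PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("son", "daughter")]
SWAP = {}
for first, second in PAIRS:
    SWAP[first] = second
    SWAP[second] = first


def swap_gender(sentence):
    """Replace each gendered word with its counterpart, keeping capitalization."""
    swapped = []
    for token in sentence.split():
        replacement = SWAP.get(token.lower())
        if replacement is None:
            swapped.append(token)
        elif token[0].isupper():
            swapped.append(replacement.capitalize())
        else:
            swapped.append(replacement)
    return " ".join(swapped)


def augment(corpus):
    """Return the corpus plus a gender-swapped copy of every sentence that changes."""
    augmented = list(corpus)
    for sentence in corpus:
        counterfactual = swap_gender(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


if __name__ == "__main__":
    for line in augment(["He is a programmer", "the table is red"]):
        print(line)
```

Sentences without a gendered word are kept once, so the augmented corpus is balanced only with respect to the listed pairs.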

Training process: Pick a training procedure that makes the system blind to bias.

Adversarial training
Train your model to shine at your task, but to fail at predicting “protected variables”, such as gender or race.
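One common way to write this down is as a min-max objective. The slides do not give a formula, so the notation below is an assumption: θ_f are the parameters of the shared encoder, θ_y those of the task predictor, θ_a those of an adversary that tries to predict the protected variable from the encoder's representation, and λ is a hypothetical trade-off weight.

```latex
\min_{\theta_f,\,\theta_y}\;\max_{\theta_a}\;
  \mathcal{L}_{\text{task}}(\theta_f,\theta_y)
  \;-\;\lambda\,\mathcal{L}_{\text{adv}}(\theta_f,\theta_a)
```

The adversary minimizes its own loss (it tries to recover gender or race), while the encoder is rewarded when the adversary's loss stays high, so the representation ends up useful for the task but uninformative about the protected variable.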

Model: Change the weights of the model so that the bias is reduced.

Word embeddings
Transform the embeddings so that bias is removed.
Pre-trained models
Fine-tune on non-biased data, so that the models “forget” their bias.
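For word embeddings, one widely used transformation is to estimate a bias direction from word pairs and project it out of every other vector. The sketch below shows that projection under stated assumptions: the vectors are invented toy values and the names bias_direction and remove_bias_component are hypothetical. It is one possible technique, not necessarily the one the speaker has in mind.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]


def bias_direction(embeddings, pairs):
    """Average the difference vectors of pairs such as (man, woman), then normalize."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    for first, second in pairs:
        for i in range(dims):
            total[i] += embeddings[first][i] - embeddings[second][i]
    return normalize(total)


def remove_bias_component(vector, direction):
    """Subtract the projection of vector onto the unit-length bias direction."""
    scale = dot(vector, direction)
    return [a - scale * d for a, d in zip(vector, direction)]


# Hypothetical toy vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "man": [1.0, 0.1, 0.0],
    "woman": [-1.0, 0.1, 0.0],
    "he": [0.9, 0.2, 0.1],
    "she": [-0.9, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.2],
}
DIRECTION = bias_direction(TOY_EMBEDDINGS, [("man", "woman"), ("he", "she")])
DEBIASED = remove_bias_component(TOY_EMBEDDINGS["programmer"], DIRECTION)

if __name__ == "__main__":
    print(DEBIASED)
```

After the projection the occupation vector has no component along the estimated direction, which removes the association measured along that single direction only.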

None of these methods is foolproof:
● You need to be aware of the bias before you can remove it.
● Often only “superficial” bias is removed, but deeper bias remains (Gonen and Goldberg 2019).
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized.
● Effective feedback loops
● Human-in-the-loop AI

Thanks! Questions?
http://www.nlp.town
yves@nlp.town

