Best practices for Project Management & Collaboration in Data Labeling
Webinar, June 21st
Today’s speakers
Edouard, Co-founder & CTO
Nicolas, CMO
New York Paris
Training Data Platform
Agenda
1 - The importance of quality data labeling
2 - Best practices for project management and collaboration
3 - Key take-aways
The importance of quality data labeling
Why AI projects fail
- Poor data quality
- Silos between tech and business teams
- Lack of suitable talent to handle data
Access to excellent data is critical to the success of an AI project
A 10% reduction in label accuracy leads to a 2%-5% decrease in model accuracy, and the amount of labeled data needs to be doubled in order to not impact model performance.
Data labeling in an ML project
Challenges of data labeling
Many stakeholders with different backgrounds
- Data scientist
- Project manager
- ML Ops engineer
- Subject matter experts
- Data quality engineers
- Outsourced labeling workforce
1. Create a common understanding
2. Set the right communication tools
3. Provide the right training
Challenges of data labeling
Complex project management
- Business objectives
- Build the team
- Project requirements
- Task distribution
- Manual labeling
- Automated labeling
- Quality management
- Performance review
- Export training dataset
- Monitor performance
Best practices for project management & collaboration
Gradual approach
- Business objectives
- Build the team
- Project requirements
- Manual labeling
Build the team
- Machine
- Labeler
- Reviewer
- ML Engineer
Labeling: in-house vs. outsourcing

In-house
Strengths:
✓ Collaboration
✓ Sensitive data management
✓ Subject matter expertise
Weaknesses:
✗ Distraction from other tasks
✗ Expensive
✗ Weak expertise on labeling
✗ Need for training

Outsourced workforce
Strengths:
✓ Cost
✓ Labeling expertise
✓ Flexibility
Weaknesses:
✗ Collaboration
✗ Lack of context for quality labeling
✗ Data sensitivity
Project requirements
❏ Metrics to measure progress
❏ Dataset that needs to be labeled
❏ Quality requirements
❏ Acceptable estimated margin of error
❏ Workflow (level of consensus, w/o review, …)
❏ Business knowledge reusable for the pre-labeling / quality check
❏ Dictionaries
❏ Features of objects to label
❏ Ontology
❏ Create different categories of labels
❏ Tools (bbox vs semantic, vs …)
❏ Instructions
❏ Level of detail with which the data should be labeled
❏ Edge cases
❏ Split
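One way to make the "acceptable estimated margin of error" item concrete is to decide how many labeled assets to audit before trusting a measured error rate. The sketch below uses the standard normal-approximation formula for a proportion, n = z² · p · (1 − p) / e². The function name, the 95% default confidence level and the worst-case default error rate of 0.5 are assumptions made for illustration; the webinar does not prescribe a formula.

```python
import math
from statistics import NormalDist


def audit_sample_size(margin_of_error: float,
                      confidence: float = 0.95,
                      expected_error_rate: float = 0.5) -> int:
    """Number of labels to review so the measured label error rate is
    within +/- margin_of_error at the given confidence level.

    Uses the normal approximation n = z^2 * p * (1 - p) / e^2 and ignores
    the finite-population correction (assumption: the dataset is much
    larger than the audited sample).
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = expected_error_rate
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)


if __name__ == "__main__":
    # Worst-case assumption (p = 0.5), 95% confidence, +/- 5 points.
    print(audit_sample_size(0.05))
```

Tightening the margin is expensive: halving it roughly quadruples the number of labels to review, which is one reason to agree on this number when the requirements are written rather than during quality review.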
Task distribution
Assign assets to specific skills
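A minimal sketch of what "assign assets to specific skills" could look like in code. Everything here is hypothetical: the `Labeler` class, the `assign_assets` function and the idea that each asset carries exactly one required skill are illustration choices, not the API of any labeling platform.

```python
from dataclasses import dataclass, field


@dataclass
class Labeler:
    name: str
    skills: set[str] = field(default_factory=set)


def assign_assets(assets: dict[str, str],
                  labelers: list[Labeler]) -> dict[str, list[str]]:
    """Assign each asset to a labeler who has the required skill.

    `assets` maps asset id -> required skill. Among qualified labelers,
    the one with the fewest assets so far is chosen to keep the load
    balanced. Assets nobody is qualified for go under "unassigned".
    """
    queues: dict[str, list[str]] = {}
    for asset_id, skill in assets.items():
        qualified = [lab for lab in labelers if skill in lab.skills]
        if not qualified:
            queues.setdefault("unassigned", []).append(asset_id)
            continue
        chosen = min(qualified, key=lambda lab: len(queues.get(lab.name, [])))
        queues.setdefault(chosen.name, []).append(asset_id)
    return queues


if __name__ == "__main__":
    team = [Labeler("ana", {"radiology"}),
            Labeler("ben", {"radiology", "french"})]
    work = {"a1": "radiology", "a2": "radiology",
            "a3": "french", "a4": "legal"}
    print(assign_assets(work, team))
```

Keeping an explicit "unassigned" queue makes skill gaps visible early, which ties back to the "build the team" stage: an asset that nobody can label is a staffing or training issue, not a labeling one.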
Automated labeling vs. manual labeling
- Rules-based automation: 20%
- Manual labeling: 60%
- Model pre-labeling: 20%
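The slide splits labeling work between rules-based automation, manual labeling and model pre-labeling, but does not say how an asset ends up in one bucket or another. The sketch below shows one possible routing, assumed for illustration: try deterministic rules first, fall back to a model pre-label when the model is confident enough, and send everything else to a human. The function names, the toy rule, the toy model and the 0.9 confidence threshold are all hypothetical.

```python
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]
Model = Callable[[str], tuple[str, float]]


def route_asset(text: str, rules: list[Rule], model: Model,
                min_confidence: float = 0.9) -> tuple[str, Optional[str]]:
    """Return (source, label), where source is 'rule', 'model' or 'manual'."""
    # 1. Rules-based automation: the first rule that fires wins.
    for rule in rules:
        label = rule(text)
        if label is not None:
            return "rule", label
    # 2. Model pre-labeling: accept only confident predictions.
    label, confidence = model(text)
    if confidence >= min_confidence:
        return "model", label
    # 3. Everything else is labeled manually from scratch.
    return "manual", None


def invoice_rule(text: str) -> Optional[str]:
    """Toy rule: any text mentioning 'invoice' is an invoice."""
    return "invoice" if "invoice" in text.lower() else None


def toy_model(text: str) -> tuple[str, float]:
    """Stand-in for a trained classifier returning (label, confidence)."""
    if "total" in text.lower():
        return "receipt", 0.95
    return "other", 0.40


if __name__ == "__main__":
    for doc in ["Invoice #12", "Total: 9.99", "hello"]:
        print(doc, "->", route_asset(doc, [invoice_rule], toy_model))
```

Counting how many assets land in each bucket gives the actual automation split for a project, which can then be compared with the proportions shown on the slide.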
Quality management
- Instructions: should be understood by everyone; use a concentric approach
- Quality needs to be measured (consensus, honeypot, …)
- Quality is a dialogue
- UX should be designed to avoid mistakes
- Rule-based error checking
- A disagreement is a signal
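The two measurements named above, consensus and honeypot, can be sketched in a few lines. The definitions here are deliberately simple and are assumptions for illustration: consensus is the share of annotators who agree with the most common label, and honeypot accuracy is the fraction of assets with a known gold label that a labeler got right. Labeling platforms may compute these differently. The last helper treats low consensus as a signal worth reviewing rather than as noise to discard.

```python
from collections import Counter


def consensus_score(labels: list[str]) -> float:
    """Share of annotators who agree with the most common label (0..1)."""
    if not labels:
        raise ValueError("need at least one label")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def honeypot_accuracy(submitted: dict[str, str],
                      gold: dict[str, str]) -> float:
    """Fraction of honeypot assets (known gold label) labeled correctly.

    Only honeypot assets the labeler actually submitted are scored.
    """
    scored = [asset for asset in gold if asset in submitted]
    if not scored:
        raise ValueError("no honeypot assets were labeled")
    correct = sum(submitted[asset] == gold[asset] for asset in scored)
    return correct / len(scored)


def flag_disagreements(annotations: dict[str, list[str]],
                       threshold: float = 0.75) -> list[str]:
    """Sorted ids of assets whose consensus is below the threshold."""
    return sorted(asset for asset, labels in annotations.items()
                  if consensus_score(labels) < threshold)


if __name__ == "__main__":
    votes = {"a": ["cat", "cat", "cat"], "b": ["cat", "dog"]}
    print(flag_disagreements(votes))
```

Flagged assets are good candidates for the review step and for new edge cases in the instructions.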
Train the model, monitor performance & iterate
- Reach consensus
- Use review
- Update instructions
Key take-aways
- Aim for data excellence, not volume
- Labeling is a trade-off between cost and quality. The ideal trade-off is found with a gradual, step-by-step approach (Data-Centric AI)
- Quality is a dialogue
Thank you
Kili-technology.com

Kili-Technology_Webinar Project Management & Collaboration in Data Labeling_June 2022.pdf
