Kili-Technology_Webinar Project Management & Collaboration in Data Labeling_June 2022.pdf

Best practices for Project Management & Collaboration in Data Labeling
Webinar, June 21st

Today’s speakers
Edouard, Co-founder & CTO
Nicolas, CMO

New York, Paris
Training Data Platform

Agenda
1 - The importance of quality data labeling
2 - Best practices for project management and collaboration
3 - Key take-aways

The importance of quality data labeling

Why AI projects fail
- Poor data quality
- Silos between tech and business teams
- Lack of suitable talent to handle data

Access to excellent data is critical to the success of an AI project
A 10% reduction in label accuracy leads to a 2%-5% decrease in model accuracy, and the amount of labeled data needs to be doubled to avoid an impact on model performance.
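As a rough worked example of the figures above, the sketch below applies the 2%-5% decrease to a model. It assumes the decrease is read as percentage points and uses a hypothetical baseline accuracy of 90%; neither assumption comes from the slides.

```python
def degraded_accuracy_range(baseline, low_drop=0.02, high_drop=0.05):
    """Return (worst, best) model accuracy after a 10% drop in label accuracy.

    Assumption: the 2%-5% decrease is interpreted as percentage points.
    """
    worst = round(baseline - high_drop, 4)
    best = round(baseline - low_drop, 4)
    return worst, best


# Hypothetical baseline: a model that reaches 90% accuracy on clean labels.
worst, best = degraded_accuracy_range(0.90)
print(f"Expected accuracy with noisier labels: {worst:.2f} to {best:.2f}")
```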

Data labeling in an ML project

Challenges of data labeling
Many stakeholders with different backgrounds
- Data scientist
- Project manager
- ML Ops engineer
- Subject matter experts
- Data quality engineers
- Outsourced labeling workforce
1. Create a common understanding
2. Set the right communication tools
3. Provide the right training

Challenges of data labeling
Complex project management
1. Business objectives
2. Build the team
3. Project requirements
4. Task distribution
5. Manual labeling
6. Automated labeling
7. Quality management
8. Performance review
9. Export training dataset
10. Monitor performance

Best practices for project management & collaboration

Gradual approach
Full pipeline: Business objectives → Build the team → Project requirements → Task distribution → Manual labeling → Automated labeling → Quality management → Performance review → Export training dataset → Monitor performance
Starting point: Business objectives → Build the team → Project requirements → Manual labeling

Build the team
- Machine
- Labeler
- Reviewer
- ML Engineer

Labeling: in-house vs. outsourcing
In-house, strengths:
✓ Collaboration
✓ Sensitive data management
✓ Subject matter expertise
In-house, weaknesses:
✗ Distraction from other tasks
✗ Expensive
✗ Weak expertise on labeling
✗ Need for training
Outsourced workforce, strengths:
✓ Cost
✓ Labeling expertise
✓ Flexibility
Outsourced workforce, weaknesses:
✗ Collaboration
✗ Lack of context for quality labeling
✗ Data sensitivity

Project requirements
❏ Metrics to measure progress
❏ Dataset that needs to be labeled
❏ Quality requirements
❏ Acceptable estimated margin of error
❏ Workflow (level of consensus, w/o review, …)
❏ Business knowledge reusable for the pre-labeling / quality check
❏ Dictionaries
❏ Features of objects to label
❏ Ontology
❏ Create different categories of labels
❏ Tools (bbox vs semantic, vs …)
❏ Instructions
❏ Level of detail with which the data should be labeled
❏ Edge cases
❏ Split
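One way to keep this checklist actionable is to record it as a small structured object and check which items are still empty before labeling starts. The sketch below is only an illustration: the class name, field names, and example values are hypothetical and cover just part of the checklist.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class ProjectRequirements:
    # Hypothetical field names mirroring some items of the checklist above.
    progress_metrics: List[str] = field(default_factory=list)
    dataset: str = ""
    max_error_rate: Optional[float] = None   # acceptable estimated margin of error
    consensus_level: Optional[int] = None    # labelers per asset in the workflow
    ontology: Dict[str, str] = field(default_factory=dict)  # category -> tool
    instructions: str = ""
    edge_cases: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of checklist items that have not been filled in."""
        empty_values = (None, "", [], {})
        return [f.name for f in fields(self) if getattr(self, f.name) in empty_values]


# Example with made-up values: three items are filled in, four are still open.
requirements = ProjectRequirements(
    progress_metrics=["assets labeled per day"],
    dataset="invoices",
    max_error_rate=0.05,
)
print(requirements.missing_items())
```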

Task distribution
Assign assets to specific skills
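A minimal sketch of skill-based task distribution, assuming each asset is tagged with the skill it requires and each labeler declares a set of skills. The function name, the round-robin policy, and the example data are illustrative assumptions, not a description of any particular platform.

```python
from collections import defaultdict
from itertools import cycle


def distribute(assets, labelers):
    """Assign each asset to a labeler who has the required skill.

    assets:   list of (asset_id, required_skill) tuples
    labelers: dict mapping labeler name -> set of skills
    Returns (assignment dict, list of asset ids nobody can label).
    """
    labelers_by_skill = defaultdict(list)
    for name, skills in labelers.items():
        for skill in skills:
            labelers_by_skill[skill].append(name)

    # Round-robin within each skill so the load is spread evenly.
    rotation = {skill: cycle(names) for skill, names in labelers_by_skill.items()}

    assignment, unassigned = {}, []
    for asset_id, skill in assets:
        if skill in rotation:
            assignment[asset_id] = next(rotation[skill])
        else:
            unassigned.append(asset_id)
    return assignment, unassigned


# Made-up example data.
labelers = {"alice": {"radiology"}, "bob": {"radiology", "legal"}}
assets = [("a1", "radiology"), ("a2", "radiology"), ("a3", "legal"), ("a4", "finance")]
print(distribute(assets, labelers))
```

Assets with no matching skill are returned separately so that a project manager can decide whether to train someone or bring in a new expert.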

Automated labeling vs. manual labeling
- Rules-based automation: 20%
- Manual labeling: 60%
- Model pre-labeling: 20%
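The three labeling routes can be combined in a simple cascade: try a rule first, fall back to a model pre-label when the model is confident, and send everything else to manual labeling. The sketch below assumes a text classification task; the date rule, the 0.9 confidence threshold, and the stub model are all hypothetical.

```python
import re

# Hypothetical rule: any text containing an ISO date gets a fixed label.
DATE_RULE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def rule_label(text):
    """Return a label if a rule matches, otherwise None."""
    return "contains_date" if DATE_RULE.search(text) else None


def route(text, model_predict, threshold=0.9):
    """Decide which route labels this asset.

    model_predict is any callable returning (label, confidence).
    Returns (route_name, proposed_label).
    """
    label = rule_label(text)
    if label is not None:
        return "rules", label
    label, confidence = model_predict(text)
    if confidence >= threshold:
        # Pre-labels are proposals; a human can still correct them in review.
        return "model_pre_label", label
    return "manual", None


def stub_model(text):
    """Stand-in for a trained model, used only to make the example runnable."""
    return ("invoice", 0.95) if "invoice" in text else ("other", 0.40)


for sample in ["Due on 2022-06-21", "invoice attached", "hello there"]:
    print(sample, "->", route(sample, stub_model))
```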

Quality management
- Instructions: should be understood by everyone + concentric approach
- Needs to be measured (consensus, Honeypot…)
- Quality is a dialogue
- UX should be designed to avoid mistakes
- Rule-based error checking
- A disagreement is a signal

Train the model, monitor performance & iterate
- Reach consensus
- Use review
- Update instructions

Take-aways
- Aim for data excellence, not volume
- Labeling is a trade-off between cost and quality. The ideal trade-off is found with a gradual, step-by-step approach (Data-Centric AI)
- Quality is a dialogue

Thank you
Kili-technology.com
