6. Why AI projects fail
POOR DATA QUALITY
SILOS BETWEEN TECH AND BUSINESS TEAMS
LACK OF SUITABLE TALENT TO HANDLE DATA
7. Access to excellent data is critical to the success of an AI project
A 10% reduction in label accuracy leads to a 2%–5% decrease in model accuracy, and the amount of labeled data must be doubled to avoid impacting model performance.
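The effect of label noise can be illustrated with a small, self-contained simulation (the setup below is a toy sketch, not the study behind the figures above): a deliberately noise-sensitive 1-nearest-neighbour classifier is trained on synthetic 1-D data whose labels are flipped with a given probability, then evaluated on clean labels.

```python
import random

def make_data(n, noise_rate, seed):
    """Two 1-D Gaussian clusters (class 0 near 0.0, class 1 near 1.0),
    with each label flipped with probability `noise_rate`."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        xs.append(y + rng.gauss(0, 0.3))
        ys.append(1 - y if rng.random() < noise_rate else y)
    return xs, ys

def knn1(train_x, train_y, x):
    # 1-nearest-neighbour: each mislabeled training point
    # poisons its whole neighbourhood, so it exposes label noise.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def accuracy(noise_rate, n=600):
    train_x, train_y = make_data(n, noise_rate, seed=0)
    test_x, test_y = make_data(n, 0.0, seed=1)  # clean test labels
    hits = sum(knn1(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / n

for rate in (0.0, 0.1, 0.2):
    print(f"label noise {rate:.0%}: test accuracy {accuracy(rate):.2f}")
```

The exact numbers depend on the data and model, but the direction is the same as on the slide: noisier labels, lower test accuracy.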
9. Challenges of data labeling
Many stakeholders with different backgrounds:
- Data scientist
- Project manager
- ML Ops engineer
- Subject matter experts
- Data quality engineers
- Outsourced labeling workforce

1. Create a common understanding
2. Set the right communication tools
3. Provide the right training
10. Challenges of data labeling
Complex project management

Labeling workflow: Business objectives → Build the team → Project requirements → Task distribution → Manual labeling → Automated labeling → Quality management → Performance review → Export training dataset → Monitor performance
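The workflow steps above can be sketched as an ordered enumeration with a helper that advances a project through them. The feedback loop from monitoring back to quality management is an assumption about how such projects iterate, not something stated on the slide.

```python
from enum import Enum, auto

class Stage(Enum):
    # The ten workflow steps from the diagram, in execution order.
    BUSINESS_OBJECTIVES = auto()
    BUILD_THE_TEAM = auto()
    PROJECT_REQUIREMENTS = auto()
    TASK_DISTRIBUTION = auto()
    MANUAL_LABELING = auto()
    AUTOMATED_LABELING = auto()
    QUALITY_MANAGEMENT = auto()
    PERFORMANCE_REVIEW = auto()
    EXPORT_TRAINING_DATASET = auto()
    MONITOR_PERFORMANCE = auto()

def next_stage(stage: Stage) -> Stage:
    """Advance to the next workflow stage. Monitoring loops back to
    quality management (hypothetical feedback loop, an assumption)."""
    if stage is Stage.MONITOR_PERFORMANCE:
        return Stage.QUALITY_MANAGEMENT
    order = list(Stage)
    return order[order.index(stage) + 1]
```

Modeling the stages explicitly makes it easy to attach owners, deadlines, or metrics to each step in project-tracking code.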
13. Build the team
Roles: Machine, Labeler, Reviewer, ML Engineer
14. Labeling: In-house vs. Outsourcing

In-house
✓ Collaboration
✓ Sensitive data management
✓ Subject-matter expertise
✗ Distraction from other tasks
✗ Expensive
✗ Weak expertise on labeling
✗ Need for training

Outsourced workforce
✓ Cost
✓ Labeling expertise
✓ Flexibility
✗ Collaboration
✗ Lack of context for quality labeling
✗ Data sensitivity
15. Project requirements
❏ Metrics to measure progress
❏ Dataset that needs to be labeled
❏ Quality requirements
❏ Acceptable estimated margin of error
❏ Workflow (level of consensus, w/o review, …)
❏ Business knowledge reusable for the pre-labeling / quality check
❏ Dictionaries
❏ Features of objects to label
❏ Ontology
❏ Create different categories of labels
❏ Tools (bbox vs semantic, vs …)
❏ Instructions
❏ Level of detail with which the data should be labeled
❏ Edge cases
❏ Split
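The checklist above can be captured as a single project specification object. The sketch below is illustrative only: all field names, defaults, and the validation rules are assumptions, not the schema of any real labeling tool.

```python
from dataclasses import dataclass, field

@dataclass
class LabelingProjectSpec:
    """Hypothetical container for the project-requirements checklist."""
    progress_metric: str = "labeled_assets_per_day"   # metric to measure progress
    target_label_accuracy: float = 0.95               # quality requirement
    margin_of_error: float = 0.02                     # acceptable estimated error
    consensus_level: int = 2                          # labelers per asset (workflow)
    review_required: bool = True                      # with/without review step
    ontology: dict = field(default_factory=dict)      # category -> sub-labels
    tools: tuple = ("bbox",)                          # e.g. bbox vs semantic
    train_val_test_split: tuple = (0.8, 0.1, 0.1)     # dataset split

    def validate(self) -> bool:
        # Basic sanity checks before the project starts.
        assert abs(sum(self.train_val_test_split) - 1.0) < 1e-9, \
            "split fractions must sum to 1"
        assert 0.0 < self.target_label_accuracy <= 1.0
        assert self.consensus_level >= 1
        return True
```

Writing the requirements down as one validated object forces the stakeholders listed earlier to agree on them explicitly before labeling starts.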
16. Task distribution
Assign assets to specific skills
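A minimal sketch of skill-based task distribution, under the assumption that each asset declares one required skill and each labeler a set of skills (the greedy least-loaded strategy is an illustrative choice, not a prescribed one):

```python
def distribute_tasks(assets, labelers):
    """Assign each asset to the least-loaded labeler whose skills
    cover it. `assets` is a list of (asset_id, required_skill) pairs;
    `labelers` maps labeler name -> set of skills."""
    load = {name: 0 for name in labelers}
    assignment = {}
    for asset_id, required_skill in assets:
        candidates = [name for name, skills in labelers.items()
                      if required_skill in skills]
        if not candidates:
            assignment[asset_id] = None  # no match: escalate or train
            continue
        pick = min(candidates, key=lambda name: load[name])
        assignment[asset_id] = pick
        load[pick] += 1
    return assignment

# Example (hypothetical team and assets):
labelers = {"ana": {"medical"}, "ben": {"medical", "general"}, "cho": {"general"}}
assets = [("img1", "medical"), ("img2", "general"), ("img3", "medical"), ("img4", "lidar")]
print(distribute_tasks(assets, labelers))
```

The `None` branch is where a real project would surface the "lack of suitable talent" failure mode called out at the start of the deck.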
17. Automated labeling vs. manual labeling
Split of labeling effort (from the slide's chart):
- Rules-based automation: 20%
- Manual labeling: 60%
- Model pre-labeling: 20%
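One way to combine the three approaches is a per-asset router: try cheap deterministic rules first, then accept a model's pre-label only when it is confident, and fall back to a human otherwise. The branches, threshold, and rule format below are illustrative assumptions, not a specific product's logic.

```python
def route_asset(asset, rules, model_predict, confidence_threshold=0.9):
    """Decide how an asset gets labeled.

    rules          -- list of (substring_pattern, label) pairs (hypothetical format)
    model_predict  -- callable returning (label, confidence)
    Returns (method, label) where method is "rules", "model", or "manual".
    """
    # 1. Rules-based automation: deterministic patterns, no human needed.
    for pattern, label in rules:
        if pattern in asset:
            return ("rules", label)
    # 2. Model pre-labeling: accept only confident predictions.
    label, confidence = model_predict(asset)
    if confidence >= confidence_threshold:
        return ("model", label)
    # 3. Manual labeling: everything the automation is unsure about.
    return ("manual", None)
```

Tuning `confidence_threshold` is exactly the cost/quality trade-off discussed next: lowering it automates more assets but lets more model mistakes into the dataset.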
21. Take-aways
- Aim for data excellence, not volume
- Labeling is a trade-off between cost and quality; the ideal balance is found through a gradual, step-by-step approach (Data-Centric AI)
- Quality is a dialogue