Harnessing Human Semantics at Scale (updated)
1. Harnessing Human Semantics at Scale
Measurable, Reproducible, Engaging, Sustainable
Crowdsourcing & Nichesourcing
Lora Aroyo - http://lora-aroyo.org @laroyo
Join to participate in the CATS4ML Data Challenge
cats4ml.humancomputation.com
17. in the (very near) future
most visitors will be digital-born
not bound by time or location
native to new forms of co-makership
native to new media
Siebe Weide, Max Meijer and Marieke Krabshuis (2012).
Agenda 2026: Study on the Future of the Dutch Museum Sector
25. the expertise of Rijksmuseum professionals
lies in annotating their collection
with art-historical information, e.g. when works
were created, by whom, etc.
29. Engage with Games
training the general crowd to be a niche:
a game in which players can carry out expert
annotation tasks with some assistance
39. Challenges
● Crowdsourcing typically undertaken in isolation
● Low reproducibility rates
● Difficult to estimate & control the time to complete
● Difficult to assess & compare quality
● Demands continuous promotional effort
● Active learning (human-in-the-loop) needs different expertise
● Difficult to incorporate results into existing content infrastructure
46. this all results in
simplification of context
48. Assess Impact of Results
● Identify crowdsourcing goals through user log analysis (see the sketch after this list)
○ # queries, # unique queries, # queries of a specific type
○ ranked by popularity
○ ranked by popularity and with error, e.g.
■ # queries entered over 50 times with 0 results
■ # queries of a specific type with 0 results
○ which will have the biggest impact
○ which is the most urgent
● … or through other user analysis
○ museum visits, external channels
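A minimal sketch of such a query-log analysis, assuming a toy log of (query, query_type, result_count) records; the field names, the sample queries, and the threshold of 50 are illustrative assumptions, not the actual pipeline used:

```python
from collections import Counter

# Illustrative query-log records: (query, query_type, result_count).
log = [
    ("rembrandt nachtwacht", "artwork", 120),
    ("brieflezend meisje", "artwork", 0),
    ("brieflezend meisje", "artwork", 0),
    # ... one tuple per logged search
]

popularity = Counter(q for q, _, _ in log)              # queries ranked by popularity
zero_results = Counter(q for q, _, n in log if n == 0)  # failing queries, by frequency
zero_by_type = Counter(t for _, t, n in log if n == 0)  # failing queries per query type

# Popular queries that never return results point at annotation gaps:
# high-impact, high-urgency candidates for a crowdsourcing campaign.
candidates = [q for q, c in zero_results.most_common() if c > 50]

print(len(log), "queries,", len(popularity), "unique")
print("crowdsourcing candidates:", candidates)
```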
50. for example, in video search:
people search for fragments
experts annotate full videos
35% of search queries return no results
52. Measure Quality
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011
time-based annotation (sketched below), e.g. the tag “bernhard”
88% of the tags useful for specific genres
tags describe short segments
often not very specific
do not describe the program as a whole
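A sketch of why time-based tags matter for fragment search; the record layout and the `find_fragments` helper are invented for illustration, not taken from the actual tagging-game implementation:

```python
from dataclasses import dataclass

@dataclass
class TimedTag:
    video_id: str
    seconds: float  # offset into the video where the tag was entered
    tag: str

# Toy examples of player-entered, time-anchored tags.
tags = [
    TimedTag("ep-1042", 312.0, "bernhard"),
    TimedTag("ep-1042", 315.5, "toespraak"),
]

def find_fragments(tags, query, window=60.0):
    """Return (video_id, start_offset) hits so a match lands on the
    tagged fragment rather than on the whole programme."""
    return [(t.video_id, max(0.0, t.seconds - window / 2))
            for t in tags if t.tag == query.lower()]

print(find_fragments(tags, "Bernhard"))  # [('ep-1042', 282.0)]
```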
53. for example, in video search:
video annotation is time-consuming,
about 5 times the video duration
experts use a specific vocabulary
that is unknown to general audiences
54. Measure Quality
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011
user vocabulary (coverage check sketched below):
● 8% in professional vocabulary
● 23% in Dutch lexicon
● 89% found on Google
tag categories: objects (57%), persons (31%), locations (7%), e.g. “engeland”
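Percentages like these come down to checking each tag against reference resources. A minimal sketch with tiny stand-in sets; a real run would load the full professional thesaurus and a Dutch word list, and query a web search API for the “found on Google” figure:

```python
# Tiny illustrative stand-ins for the reference vocabularies.
professional_vocab = {"toespraak", "staatsbezoek"}
dutch_lexicon = {"engeland", "toespraak", "auto"}

tags = ["engeland", "bernhard", "toespraak", "auto"]

def coverage(tags, vocab):
    """Percentage of tags found in a reference vocabulary."""
    return 100 * sum(t in vocab for t in tags) / len(tags)

print(f"{coverage(tags, professional_vocab):.0f}% in professional vocabulary")
print(f"{coverage(tags, dutch_lexicon):.0f}% in Dutch lexicon")
```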
55. human subjectivity, ambiguity & uncertainty of expression
are a natural part of human semantics
56. measure quality
● quality is not just about spam
● quality is typically multi-dimensional
● understand the diversity in crowd answers
● do not ignore the multitude of interpretations
● understand the variety of contexts
● identify cases with high ambiguity, similarity, …
● experiment with explicit metrics (see the sketch below)
● experiment with different designs
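One explicit metric to experiment with is the entropy of the answer distribution per item, which treats disagreement as signal and flags high-ambiguity cases; this is a generic sketch, not necessarily the metric used in this line of work:

```python
from collections import Counter
from math import log2

def answer_entropy(answers):
    """Shannon entropy of the crowd's answer distribution for one item:
    0.0 means full agreement; higher values flag ambiguous items."""
    counts = Counter(answers)
    total = sum(counts.values())
    return 0.0 - sum(c / total * log2(c / total) for c in counts.values())

# Toy crowd answers per item.
items = {
    "img-17": ["car", "car", "car", "car"],               # clear case
    "img-42": ["lipstick", "pen", "lipstick", "crayon"],  # ambiguous case
}

# Rank items from most to least ambiguous.
for name, answers in sorted(items.items(),
                            key=lambda kv: answer_entropy(kv[1]), reverse=True):
    print(f"{name}: entropy = {answer_entropy(answers):.2f}")
```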
57. Measure Progress
              6 months                          2 years
tags          340,551                           36,981
matches       137,421
items         602                               1,782
users         555 registered players            2,017 users (taggers)
              + thousands of anonymous players
traffic       12,279 visits (3+ min online)
              44,362 pageviews

Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber and Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pages 145-152.
64. Your AI model is only as good as your evaluation data.
… but is your evaluation data missing relevant examples?
… and how can we find such examples, especially if they are AI blindspots (i.e. unknown unknowns)?
The CATS4ML Challenge offers a crowdsourced red team for finding the blindspots of your AI models.
cats4ml.humancomputation.com
65. AI Blindspots
real images with visual patterns that confuse AI models in ways humans might find meaningful
example labels: Lipstick? Airplane? Car? Construction worker? Thanksgiving?
https://opensource.google/projects/open-images-dataset
66. Inspired by Bug Bounty
this is a data challenge to find the blindspots in our AI models
the challenge is running until mid-January 2021
join, start hunting for AI Blindspots, and spread the word to other teams that might be interested!
cats4ml.humancomputation.com