12. massive amount of digital content to explore …
http://lora-aroyo.org ! http://slideshare.net/laroyo ! @laroyo
13. but at some point it all looks the same …
14. Massive Scale:
A lifetime of video content is uploaded to YouTube every day.
Granularity Mismatch:
Searching for the relevant video fragments is still not possible.
Passive Engagement:
Video is still primarily a linear net-time viewing activity
15. … people search & browse
with some implicit relevance in mind
18. there is a huge semantic & cultural GAP
19. software systems are ever more intelligent
but they don’t actually understand people
20. focus on human knowledge in machine-readable form
but there are types of human knowledge that can’t be captured by machines
21. classical AI relies on human experts to manually provide training knowledge for machines
human expert-based ground truth does not scale to the current demand for machines to deal with wide ranges of real-world tasks and contexts
22. we need to be able to ….
support of multiple perspectives
26. humans accurately perform interpretation tasks
can their effort be adequately harnessed in a scientifically reliable manner that scales across tasks, contexts & data modalities?
27. Quantity is the new Quality
Human Computation adopts human intelligence at scale to improve purely machine-based systems
28. diversity of opinion
independent
decentralized
aggregated
James Surowiecki, “the wise crowd”
29. a novel approach to gather diversity of perspectives & opinions from the crowd, expand expert vocabularies with these and gather a new type of gold standard for machines
L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.
L. Aroyo, C. Welty. The Three Sides of CrowdTruth, Journal of Human Computation, 2014
http://CrowdTruth.org
http://data.CrowdTruth.org/
http://game.crowdtruth.org
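The idea of treating crowd disagreement as signal can be sketched in a few lines. This is a minimal illustration, not the CrowdTruth implementation: it assumes each worker's answer for one sentence is a binary vector over candidate relations, sums those into a sentence vector, and scores each relation by the cosine between the sentence vector and that relation's unit vector. The relation names and worker answers are made up.

```python
import math

# Hypothetical candidate relations for one sentence (illustrative only).
RELATIONS = ["treats", "causes", "none"]

def sentence_vector(worker_vectors):
    """Sum the binary worker vectors element-wise."""
    return [sum(col) for col in zip(*worker_vectors)]

def relation_scores(worker_vectors):
    """Cosine between the sentence vector and each relation's unit vector."""
    vec = sentence_vector(worker_vectors)
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return {rel: 0.0 for rel in RELATIONS}
    return {rel: v / norm for rel, v in zip(RELATIONS, vec)}

# Five made-up workers; each row marks the relations that worker selected.
answers = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]

scores = relation_scores(answers)
clarity = max(scores.values())  # a low value suggests an ambiguous sentence
print(scores, clarity)
```

Under these assumptions a sentence on which workers split evenly gets a low maximum score, so the disagreement is kept as a measure of ambiguity instead of being voted away.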
30. Visual Content Domination
• 90% of information transmitted to the brain is visual (processed 60,000X faster in the brain than text)
• Videos increase average page conversion rates by 86%
• Visuals are social-media-ready/friendly - easily sharable
• Posts with visuals receive 94% more page visits
• Visuals are becoming easier and easier to create as photo / video editing tools become more accessible
31. any piece of media can be the starting point to
a world of compelling visual experiences.
turning “mute” images into content-aware images.
32. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
METLIFE BUILDING
SUNSET
EAST RIVER
NEW YORK CITY
SKYSCRAPER
UPPER EAST SIDE
turning “mute” images into content-aware images.
any piece of media can be the starting point to
a world of compelling visual experiences.
33. combining machine processing with
crowdsourcing for enriching, curating &
gathering metadata
quickly & cheaply — at scale.
NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
METLIFE BUILDING
SUNSET
EAST RIVER
NEW YORK CITY
SKYSCRAPER
UPPER EAST SIDE
34. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
NEW YORK CITY
SKYSCRAPER
METLIFE BUILDING
UPPER EAST SIDE
EAST RIVER
MIDTOWN
MANHATTAN
PAN-AM BUILDING
PAN-AM AIRLINES HELICOPTER CRASH
AIR TRAVEL
ARCHITECTURE
turning “context-free” images into
relationship-aware images
35. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
NEW YORK CITY
SKYSCRAPER
METLIFE BUILDING
UPPER EAST SIDE
EAST RIVER
MIDTOWN
MANHATTAN
PAN-AM BUILDING
PAN-AM AIRLINES HELICOPTER CRASH
AIR TRAVEL
ARCHITECTURE
… not only images, but also for videos
YOUTUBE: NYC FROM THE
EMPIRE STATE BUILDING
allowing viewers to explore relationships across themes,
locations, characters, etc. — within a video.
41. BRIDGING THE GAP BETWEEN
PEOPLE & THE OVERWHELMING
AMOUNT OF ONLINE MULTIMEDIA CONTENT
42. HyperVideos:
Link video fragments in non-linear paths
Binging Engagement:
Construct continuous and interactive experiences
Video Snacks:
Break video down into snackable moments
SOLUTIONS
43. • Decomposing & granular description of images & videos.
• Constructing mediaGraph with rich media semantics.
• Continuously enriching & consolidating machine, expert, & user content descriptions.
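A graph of this kind could be sketched as follows. This is a minimal illustration under stated assumptions: `MediaGraph`, the `source` labels and the example tags are hypothetical names, and video fragments are addressed with a temporal `#t=start,end` suffix in the style of W3C Media Fragments URIs. The slides do not describe the actual mediaGraph data model.

```python
from collections import defaultdict

class MediaGraph:
    """Toy graph linking media fragments to descriptive tags (hypothetical)."""

    def __init__(self):
        self.tags_by_fragment = defaultdict(dict)  # fragment -> {tag: sources}
        self.fragments_by_tag = defaultdict(set)   # tag -> fragments

    def describe(self, fragment, tag, source):
        """Attach a tag to a fragment and record who supplied it."""
        self.tags_by_fragment[fragment].setdefault(tag, set()).add(source)
        self.fragments_by_tag[tag].add(fragment)

    def related(self, fragment):
        """Return fragments that share at least one tag with `fragment`."""
        found = set()
        for tag in self.tags_by_fragment[fragment]:
            found |= self.fragments_by_tag[tag]
        found.discard(fragment)
        return found

graph = MediaGraph()
graph.describe("video.mp4#t=10,25", "CENTRAL PARK", "machine")
graph.describe("video.mp4#t=10,25", "SKYSCRAPER", "user")
graph.describe("video.mp4#t=40,55", "SKYSCRAPER", "expert")
graph.describe("photo.jpg", "CENTRAL PARK", "user")
print(graph.related("video.mp4#t=10,25"))
```

Keeping the source of each tag (machine, expert or user) on the edge is one way to let the three kinds of description be consolidated later without losing provenance.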
60. – The first 6 months:
• 44,362 pageviews
• 12,279 visits (3+ min online)
• 555 registered players (thousands of anonymous players!)
– 340,551 tags added to 602 items
– 137,421 matches
Results of First Pilot
61. 11 Participating Museums
1,782 Works of Art in the Research
36,981 Tags collected
2,017 Users who tagged
First two years (2006-2008)
[Chart: Q: Why did you tag? Answers compared for Public and MMA on a 0% to 100% scale: don't remember; to connect with others; so that I could find works again later; other (please specify); to learn about art; to improve search for other users; for fun; to help museums document art work]
62. Tags by Documentalists
• Tags describe mainly short segments
• Tags are often not very specific
• Tags do not describe programmes as a whole
• User tags were useful & specific --> domain dependent
63. user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%)
engeland
persons (31%)
objects (57%)
On the Role of User-Generated Metadata in A/V Collections. Riste Gligorov et al. KCAP Int. Conference on Knowledge Capture 2011
Crowd vs. Professionals
64. System MAP
All user tags 0.219
Consensus user tags only 0.143
NCRV tags 0.138
NCRV catalog 0.077
Captions 0.157
Captions + User tags 0.247
Captions + NCRV catalog 0.183
Captions + NCRV tags 0.201
NCRV tags + User tags 0.263
NCRV tags + NCRV catalog 0.150
All – User tags 0.208
All 0.276
All tags better than consensus only
• Improvement of 53%
• Consensus tags have
• higher precision: 0.59 vs. 0.49
• but lower recall: 0.28 vs. 0.42
WAISDA? Tags vs. Rest
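The precision/recall trade-off between consensus tags and all tags can be illustrated with a small sketch. It assumes that a consensus tag is one entered by at least two different players for the same item, and it scores tag sets against a hand-made relevant set. The tags, players and threshold are invented for illustration and are not the study's data or procedure.

```python
from collections import defaultdict

def consensus_tags(entries, min_players=2):
    """Keep tags entered by at least `min_players` distinct players."""
    players = defaultdict(set)
    for player, tag in entries:
        players[tag].add(player)
    return {tag for tag, who in players.items() if len(who) >= min_players}

def precision_recall(tags, relevant):
    """Set-based precision and recall of `tags` against `relevant`."""
    hits = len(tags & relevant)
    precision = hits / len(tags) if tags else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up (player, tag) entries for a single item.
entries = [
    ("p1", "bird"), ("p2", "bird"), ("p3", "tree"),
    ("p1", "heron"), ("p2", "nice"), ("p3", "river"), ("p1", "river"),
]
relevant = {"bird", "heron", "river", "tree"}  # hypothetical ground truth

all_tags = {tag for _, tag in entries}
print(precision_recall(all_tags, relevant))                 # (0.8, 1.0)
print(precision_recall(consensus_tags(entries), relevant))  # (1.0, 0.5)
```

In this toy case the consensus filter removes the one noisy tag but also drops two relevant tags that only a single player entered, which mirrors the direction of the trade-off reported on the slide.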
65. System MAP
All user tags 0.219
Consensus user tags only 0.143
NCRV tags 0.138
NCRV catalog 0.077
Captions 0.157
Captions + User tags 0.247
Captions + NCRV catalog 0.183
Captions + NCRV tags 0.201
NCRV tags + User tags 0.263
NCRV tags + NCRV catalog 0.150
All – User tags 0.208
All 0.276
All tags better than rest
• Individually
• beat NCRV tags by 69%
• beat captions by 39%
WAISDA? Tags vs. Rest
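MAP in the table presumably stands for mean average precision, the usual retrieval measure. As a reminder of how that measure is computed, here is a generic sketch. The queries, rankings and relevance judgements are invented, and this is the textbook definition, not the evaluation code used in the study.

```python
def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    total = 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two made-up queries: a ranked result list and the set of relevant items.
runs = [
    (["v1", "v2", "v3", "v4"], {"v1", "v3"}),  # AP = (1/1 + 2/3) / 2
    (["v5", "v6", "v7"], {"v6"}),              # AP = 1/2
]
print(mean_average_precision(runs))
```

A system that ranks relevant video fragments higher gets a larger value, which is why richer descriptions (such as added user tags) can raise MAP.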
66. System MAP
All user tags 0.219
Consensus user tags only 0.143
NCRV tags 0.138
NCRV catalog 0.077
Captions 0.157
Captions + User tags 0.247
Captions + NCRV catalog 0.183
Captions + NCRV tags 0.201
NCRV tags + User tags 0.263
NCRV tags + NCRV catalog 0.150
All – User tags 0.208
All 0.276
All tags better than rest
• Individually
• beat NCRV tags by 69%
• beat captions by 39%
• Combined
• Improvement of 5%
WAISDA? Tags vs. Rest
67. System MAP
All user tags 0.219
Consensus user tags only 0.143
NCRV tags 0.138
NCRV catalog 0.077
Captions 0.157
Captions + User tags 0.247
Captions + NCRV catalog 0.183
Captions + NCRV tags 0.201
NCRV tags + User tags 0.263
NCRV tags + NCRV catalog 0.150
All – User tags 0.208
All 0.276
All data performs best
• largely due to contribution of
user tags – 33%
WAISDA? Tags vs. Rest
68. System MAP
All user tags 0.219
Consensus user tags only 0.143
NCRV tags 0.138
NCRV catalog 0.077
Captions 0.157
Captions + User tags 0.247
Captions + NCRV catalog 0.183
Captions + NCRV tags 0.201
NCRV tags + User tags 0.263
NCRV tags + NCRV catalog 0.150
All – User tags 0.208
All 0.276
All tags better than consensus only
• Improvement of 53%
• Consensus tags have
• higher precision: 0.59 vs. 0.49
• but lower recall: 0.28 vs. 0.42
All tags better than rest
• Individually
• beat NCRV tags by 69%
• beat captions by 39%
All data performs best
• largely due to contribution of
user tags – 33%
• Combined
• Improvement of 5%
WAISDA? Tags vs. Rest
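The percentages on this slide read as relative MAP gains. A small sketch, assuming gain = new / old - 1 and using only values from the table above, recomputes them. Pairing "Combined" with "NCRV tags + User tags" is an assumption. With the table values as transcribed, the gain over NCRV tags comes out near 59%, not the 69% shown on the slide, while the other figures match.

```python
# MAP values copied from the table on this slide.
MAP = {
    "All user tags": 0.219,
    "Consensus user tags only": 0.143,
    "NCRV tags": 0.138,
    "Captions": 0.157,
    "NCRV tags + User tags": 0.263,
    "All - User tags": 0.208,
    "All": 0.276,
}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

# (better system, baseline) pairs; the pairing itself is an assumption.
comparisons = [
    ("All user tags", "Consensus user tags only"),
    ("All user tags", "NCRV tags"),
    ("All user tags", "Captions"),
    ("All", "NCRV tags + User tags"),
    ("All", "All - User tags"),
]

for better, baseline in comparisons:
    gain = relative_gain(MAP[better], MAP[baseline])
    print(f"{better} vs. {baseline}: {gain:.0f}%")
```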
72. only a small fraction of about 8000 items
are currently on display
73. … online collection grows
125,000 artworks already available
another 40,000 are added every year
74. expertise of museum professionals is in
describing & annotating collection with art-
historical information, e.g. when they were
created, by whom, etc.
75. detailed information about depicted objects, e.g.
which species the animal or plant belongs to,
is in most cases not available
76. annotated only with “bird with blue head near
branch with red leaf”
species of the bird and the plant are missing
77. use crowdsourcing to get more annotations
use nichesourcing, i.e. niches of people with the
right expertise, to add more specific information
78. use sources like Twitter to find experts or
groups of experts on certain areas, e.g. bird
lovers, ornithologists or people who enjoy bird-
watching in their spare time
79. platform where users enter tags:
(1) structured vocabulary terms or (2) free text
http://annotate.accurator.nl
80. for tasks that are too difficult:
game in which players can carry out an expert
annotation task with some assistance
81. BIRDWATCHING RIJKSMUSEUM
Sunday October 4, 10:00 am - 2:00 pm
Cuypers Library Rijksmuseum
On World Animal Day, the Rijksmuseum will host a
birdwatching day in collaboration with Naturalis
Biodiversity Center, Wikimedia Netherlands and the
COMMIT/ SEALINCMedia project.
We are looking for bird watchers to join an expedition
through the digital collections and help the
museums identify bird species in works of art.
82. dive.beeldengeluid.nl
In Digital Hermeneutics
Event-centric Exploration
@Sound & Vision and Royal Library
3rd Prize at the Semantic Web Challenge 2014
83. OPENIMAGES.EU
• 3000 videos
• NL Institute for Sound & Vision
• mostly news broadcasts
DELPHER.NL
• 1.5 Million Scans of Radio bulletins (hand annotated)
• 1937 – 1984
84. Simple Event Model (SEM)
OpenAnnotation (OA) & SKOS
DIVE:MEDIA OBJECT
SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
• LINKS TO EUROPEANA (MULTILINGUAL)
• LINKS TO DBPEDIA
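How these vocabularies fit together can be sketched as a handful of triples. The example below uses plain Python tuples instead of an RDF library. The `sem:`, `oa:` and `skos:` prefixes refer to the published SEM, Open Annotation and SKOS vocabularies, while the `dive:` namespace URI, all `ex:` resource names and the choice of which properties connect which nodes are assumptions made for illustration, not the DIVE data model.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sem": "http://semanticweb.cs.vu.nl/2009/11/sem/",
    "oa": "http://www.w3.org/ns/oa#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dive": "http://example.org/dive/",  # hypothetical namespace
    "ex": "http://example.org/data/",    # hypothetical instance data
}

def expand(curie):
    """Turn a prefixed name such as 'sem:Event' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

# A media object, an event with actor/place/time, and an annotation
# that links the event to the media object (all instances are made up).
triples = [
    ("ex:clip1", "rdf:type", "dive:MediaObject"),
    ("ex:event1", "rdf:type", "sem:Event"),
    ("ex:event1", "sem:hasActor", "ex:actor1"),
    ("ex:event1", "sem:hasPlace", "ex:place1"),
    ("ex:event1", "sem:hasTime", "ex:time1"),
    ("ex:concept1", "rdf:type", "skos:Concept"),
    ("ex:anno1", "rdf:type", "oa:Annotation"),
    ("ex:anno1", "oa:hasTarget", "ex:clip1"),
    ("ex:anno1", "oa:hasBody", "ex:event1"),
]

def objects_of(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(expand("sem:Event"))
print(objects_of("ex:anno1", "oa:hasBody"))
```

Routing the link through an annotation node leaves room to record who made the link (machine, expert or crowd), which is one reason an annotation vocabulary is useful alongside the event model.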
85. Digital Submarine UI
Infinity of Exploration
Events Linking Objects
Crowd Bringing the Human Perspectives
Linked (Open) Data
86. Entity & Event Extraction with CrowdTruth.org
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO
CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND
CONCEPTS TO KEYFRAMES
87. Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G.
Automatic Heritage Metadata Enrichment with Historic Events
Museums and the Web 2011
http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
89. “Digital Hermeneutics: Agora and the online understanding of cultural heritage”
In proc. of Web Science Conference, (ACM: New York, 2011)
Interpretation Support for Online Collections
92. Links from the slides
On the Web
• http://waida.nl
• http://prestoprime.org
• http://agora.cs.vu.nl
• http://sealincmedia.wordpress.com
• http://dive.beeldengeluid.nl
• http://diveplu.beeldengeluid.nl
• http://annotate.accurator.nl
• http://accurator.nl
• http://crowdtruth.org
• http://data.crowdtruth.org
• http://game.crowdtruth.org
• http://www.adweek.com/socialtimes/millennials-love-video-on-mobile-social-channels-infographic/622313
• http://www.blogherald.com/2010/10/27/history-of-online-video/
• http://wm.cs.vu.nl
On Twitter
@waisda
@agora-project
@sealincmedia
@prestocenter
@vistatv
#CrowdTruth
#Accurator
93. Lecture Reading Material
http://www.aaai.org/ojs/index.php/aimagazine/article/view/2564
Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation
https://www.wired.com/2006/06/crowds/
THE RISE OF CROWDSOURCING
https://www.microsoft.com/en-us/research/project/algorithmic-crowdsourcing/
http://cci.mit.edu/publications/CCIwp2011-04.pdf
Programming the Global Brain
http://www.orchid.ac.uk/eprints/248/1/main.pdf
The ACTIVECROWDTOOLKIT: An Open-Source Tool for Benchmarking Active Learning Algorithms for Crowdsourcing Research