The digital era has presented big challenges, but also great opportunities for the museum world. One of these opportunities is the way museums can open up their collections to the public. Many museums are now actively exploring possibilities to present their collections online for visitors who cannot come to the museum, or to show objects for which they do not have space in the exhibition halls. Often they will put together themed Web sites for online exhibitions in which objects are presented in a certain context. However, these themed Web sites usually cover only a small part of their collection. For the majority of the objects, the context is not made explicit. In the Agora project, we aim to make this context explicit in an automatic way in order to help users understand and interpret museum objects. We do this by linking museum objects to historical events and explicitly presenting these links in an event-driven browsing environment.
In the first part of my talk, I will explain the theoretical framework we have developed in the Agora project to represent historical contexts as well as the general challenges to the project. In the second part of my talk, I will focus on the particular challenges in information extraction for building the event thesaurus and linking museum objects.
These slides are from a presentation given at the Eurecom seminar on July 20, 2012.
Agora: putting museum objects into their art-historic context
1. Agora: putting museum objects
into their art-historic context
Marieke van Erp
marieke@cs.vu.nl
EURECOM July 2012
2. Introduction
• BA, MA & PhD
Computational Linguistics/
Information Extraction
@Tilburg University
• Since 2009: SemWeb group
@VU University Amsterdam
3. Overview
• The Agora Project
• Digital Hermeneutics
• Building an Event Thesaurus
for Dutch
• Experiments & Results
• Outlook
Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
4. The Agora Project
• Collaboration VU CS &
History departments,
Netherlands Institute for
Sound and Vision and
Rijksmuseum Amsterdam
• Facilitate and investigate
digitally mediated public
history
5. Digitising Heritage
• Galleries, libraries, archives and
museums (GLAMs) are digitising
their data and presenting it online
• This changes the role of GLAMs
from information interpreters to
information providers
• In the online setting, objects can
easily start to lead their own lives
Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
6.
7. Digital Hermeneutics
• An object on its own has no
meaning; event descriptions
provide historical context
• A single event only gives part
of the historical context;
chains of events (narratives)
provide a more complete
overview
Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
8. Event Dimension
[Diagram: RDF graph connecting a museum object to two events]
• Object: the painting "Three Fighter Aircraft in the Sky"
(rma:maker Mohammed Toha, rma:creationPlace Yogyakarta,
rma:creationDate 19/12/1948)
• agora:depictsEvent: "The Attack on Yogyakarta" (rdf:type sem:Event)
• agora:createsEvent: "Mohammed Toha Paints 'Three Fighter
Aircraft in the Sky'" (rdf:type sem:Event)
• Event properties shown: sem:hasActor, sem:hasPlace,
sem:hasBeginTimeStamp
• Other nodes: Netherlands and Mohammed Toha (rdf:type sem:Actor),
Yogyakarta (rdf:type sem:Place)
9. Narratives
[Diagram: three events chained by agora:hasBiographicalRelation]
• The Attack on Yogyakarta
sem:hasTimeStamp 1945 - 1946; sem:eventType Armed Conflict;
sem:hasPlace Indonesia; sem:hasActor KNIL
• Operation Crow
sem:hasTimeStamp 19/12/1948 - 31/12/1948; sem:eventType Armed Conflict;
sem:hasPlace Sumatra; sem:hasActor KNIL
• The Attack on Yogyakarta
sem:hasTimeStamp 01/03/1949; sem:eventType Attack;
sem:hasPlace Yogyakarta; sem:hasActor KNIL
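The chain of events on this slide can be sketched in a few lines of Python. This is only an illustration, not the Agora implementation: the `Event` class, the `narrative` and `chain_links` helpers, and the choice to map a year-only timestamp to 1 January are all assumptions made here. It also assumes that a biographical relation links events sharing an actor, which the slide suggests but does not state.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    name: str
    begin: date       # start of sem:hasTimeStamp
    event_type: str   # sem:eventType
    place: str        # sem:hasPlace
    actor: str        # sem:hasActor


# Values copied from the slide. The year-only timestamp "1945 - 1946" is
# mapped to 1 January 1945, an assumption made purely for sorting.
EVENTS = [
    Event("The Attack on Yogyakarta", date(1949, 3, 1), "Attack", "Yogyakarta", "KNIL"),
    Event("Operation Crow", date(1948, 12, 19), "Armed Conflict", "Sumatra", "KNIL"),
    Event("The Attack on Yogyakarta", date(1945, 1, 1), "Armed Conflict", "Indonesia", "KNIL"),
]


def narrative(events, actor):
    """Return the events involving `actor`, ordered by begin date."""
    return sorted((e for e in events if e.actor == actor), key=lambda e: e.begin)


def chain_links(ordered):
    """Pair each event with its successor as (earlier, later) tuples."""
    return list(zip(ordered, ordered[1:]))


for earlier, later in chain_links(narrative(EVENTS, "KNIL")):
    print(f"{earlier.name} ({earlier.begin.year}) -> {later.name} ({later.begin.year})")
```

Sorting by begin date is the simplest way to turn a set of related events into a narrative; a fuller model would also need to handle overlapping and open-ended intervals.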
13. Building an Event Thesaurus
• There are no extensive structured
event descriptions
• Rijksmuseum Amsterdam has a
flat list of 1,693 ‘events’: only
names and very much focused on
17th century Holland
• Our goal:
• create a list of historically
relevant events
• provide actors, locations,
times & types for each event
Image src: http://www.collinsdictionary.com/static/graphics/default.png
14. First Attempt
• Pattern-based event-name
extraction
• In Dutch Wikipedia we
found 2,444 event
candidates
• 1,209 (56.3%) correct
• 169 (13.9%) partially
correct
• Off-the-shelf named entity
recognition (P/R/F1)
• Person 77/77/77
• Location 75/58/66
• Organisation 32/37/34
Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
15. First Attempt
• Co-occurrence-based event-
relation finder
• Actor, location and/
or date found for only 392
events
• Actor correct: 49.6%
• Location correct: 41.1%
• Date correct: 51.5%
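A co-occurrence-based relation finder of this kind can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the `min_count` threshold, the entity type labels and the toy sentences are all assumptions. Entities are assumed to come from a separate named entity recognition step.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, event_name):
    """Count entities that occur in the same sentence as `event_name`.

    `sentences` is a list of (text, entities) pairs, where `entities` is a
    list of (surface_form, entity_type) tuples produced by an NER step.
    """
    counts = defaultdict(Counter)
    for text, entities in sentences:
        if event_name in text:
            for surface, etype in entities:
                counts[etype][surface] += 1
    return counts


def best_fillers(counts, min_count=2):
    """Keep the most frequent entity per type if seen at least `min_count` times."""
    best = {}
    for etype, counter in counts.items():
        surface, n = counter.most_common(1)[0]
        if n >= min_count:
            best[etype] = surface
    return best


# Toy input, invented for this example.
SENTENCES = [
    ("The Battle of Stalingrad began in 1942.", [("1942", "DATE")]),
    ("The Red Army won the Battle of Stalingrad in 1943.",
     [("Red Army", "ACTOR"), ("1943", "DATE")]),
    ("During the Battle of Stalingrad the Red Army held the city.",
     [("Red Army", "ACTOR")]),
    ("The Red Army later reached Berlin.",
     [("Red Army", "ACTOR"), ("Berlin", "LOCATION")]),
]

print(best_fillers(cooccurrence_counts(SENTENCES, "Battle of Stalingrad")))
```

With the threshold at 2, the toy data yields an actor but no date, because each date is seen only once. A method like this needs the same fact to be repeated several times before it can be trusted.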
16. First Attempt
• Problems with event element recognition:
• Shallow grammatical
processing (post-war rebuilding
and during the North Sea flood
recognised as 1 event)
• Missing locations (Battle of
LOC pattern fails)
• No distinction between
entities and action nouns
(German Occupation vs German
Occupants look the same to
the approach)
• Named Entity Recogniser not
suited to the domain
17. First Attempt
• Problems with the event relation
finder:
• Relies on redundancy in
the data, only works for
‘popular’ events
• Too coarse-grained (who
were the actors/locations
in WWII?)
• Evaluation is hard!
18. Back to the drawing board...
• Analysis of event names
• Combinations of sortal nouns with
a PP and a named entity e.g., Battle
of Stalingrad, Death of John Lennon
• Combinations of nominalized verbs
with a PP and a named entity e.g.,
Excavation of Troy, Election of
Obama.
• Combinations of a referential
adjective with an event type and
named entity e.g., the American
invasion of Iraq.
• Transparent proper names: Great
War
• Opaque proper names: Event
names that cannot be decomposed
on morphological grounds e.g.,
Holocaust, Spanish Fury
Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
19. Back to the drawing board...
• Improve Named Entity
Recognition
• Add gazetteers for
historical names
• Post-processing for titles
and improved NE
boundaries
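The two improvements on this slide could look roughly like the sketch below: a longest-match gazetteer lookup, followed by a post-processing step that widens person spans over a preceding title. The gazetteer entries, the `TITLES` set and both function names are hypothetical. The slide does not say whether titles should be included in or excluded from the entity, so absorbing them is an assumption.

```python
# Hypothetical title list; a real one would be built for the historical domain.
TITLES = {"Sir", "Admiral", "King", "Queen", "Prince"}


def gazetteer_spans(tokens, gazetteer):
    """Longest-match lookup. Returns (start, end, label) spans, end exclusive."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "Michiel de Ruyter" beats "de Ruyter".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans


def extend_over_titles(tokens, spans, titles=TITLES):
    """Move the start of PER spans left over any directly preceding titles."""
    fixed = []
    for start, end, label in spans:
        while label == "PER" and start > 0 and tokens[start - 1] in titles:
            start -= 1
        fixed.append((start, end, label))
    return fixed


# Toy gazetteer keyed by token tuples, invented for this example.
GAZETTEER = {
    ("Michiel", "de", "Ruyter"): "PER",
    ("de", "Ruyter"): "PER",
    ("Texel",): "LOC",
}

tokens = "Admiral Michiel de Ruyter sailed from Texel".split()
spans = gazetteer_spans(tokens, GAZETTEER)
print(extend_over_titles(tokens, spans))
```

Keying the gazetteer by token tuples keeps multi-word names intact and makes the longest-match check a plain dictionary lookup.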
20. Back to the drawing board...
• Finding Event Relations
• Use the structure of
Wikipedia/DBpedia
• Shallow parsing
• Hierarchies of actors &
locations
21. Current Work
               Spotlight (P/R/F1)    Stanford (P/R/F1)    Freire (P/R/F1)
Person         54.05/7.52/13.20      58.60/34.46/43.40    79.17/71.16/74.95
Location       64.52/30.77/41.67     67.19/66.15/66.67    80.00/61.54/69.57
Organisation   0/0/0                 9.78/25.71/14.17     89.66/74.29/81.25
• Still some work to be done, but
Freire et al. (2012) shows that smart
features can work with small amounts
of training data
• Combine classifiers
• Add post-processing
• MISC Class remains to be done...
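For reference, the F1 column is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values in the table above; the function name `f1` and the `ROWS` layout are choices made for this example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the table on this slide.
ROWS = {
    ("Spotlight", "Person"): (54.05, 7.52, 13.20),
    ("Spotlight", "Location"): (64.52, 30.77, 41.67),
    ("Spotlight", "Organisation"): (0.0, 0.0, 0.0),
    ("Stanford", "Person"): (58.60, 34.46, 43.40),
    ("Stanford", "Location"): (67.19, 66.15, 66.67),
    ("Stanford", "Organisation"): (9.78, 25.71, 14.17),
    ("Freire", "Person"): (79.17, 71.16, 74.95),
    ("Freire", "Location"): (80.00, 61.54, 69.57),
    ("Freire", "Organisation"): (89.66, 74.29, 81.25),
}

for (system, label), (p, r, reported) in ROWS.items():
    print(f"{system:10s} {label:13s} computed={f1(p, r):6.2f} reported={reported:6.2f}")
```

Because the harmonic mean is pulled towards the lower of the two values, a system with high precision but very low recall, such as Spotlight on persons, still ends up with a low F1.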
22. Current Work
Word       POS   CHUNK   NER
U.N.       NNP   I-NP    I-ORG
official   NN    I-NP    O
Ekeus      NNP   I-NP    I-PER
heads      VBZ   I-VP    O
for        IN    I-PP    O
Baghdad    NNP   I-NP    I-LOC
.          .     O       O
[CoNLL2003]
focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
"is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
"painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
"dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
"grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
"William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
"Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
"made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
"many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
"telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"
[Freire et al. 2012]
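The first six columns of the feature rows above are the focus token and a window of three tokens to the left and two to the right. The sketch below rebuilds those columns, plus `cap` and `length`, for the same sentence. It is an approximation for illustration only: the padding symbol and the exact definition of `cap` are assumptions, and the frequency and part-of-speech features of Freire et al. (2012) are left out.

```python
PAD = "_"  # hypothetical padding symbol for positions outside the text


def window_features(tokens, i):
    """Feature dict for tokens[i], using column names from the header row."""
    def at(j):
        return tokens[j] if 0 <= j < len(tokens) else PAD

    focus = tokens[i]
    return {
        "focus": focus,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(focus[:1].isupper()),  # assumed: 1 if the first letter is upper case
        "length": len(focus),
    }


TOKENS = ["wood", ")", "and", "is", "painted", "dark", "grey", ".",
          "William", "Herschel", "made", "many", "telescopes", "of", "this"]

print(window_features(TOKENS, TOKENS.index("William")))
```

For the focus token "William" this reproduces the corresponding columns of the "B-PER" row: context "dark", "grey", ".", "Herschel", "made", with cap 1 and length 7.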
23. Current Work
• Build smarter extractors for
event names
• First focus on ‘regular’
event names (e.g., Battle
of LOC, War of YEAR)
• Use knowledge about
action nouns vs static
nouns (WordNet)
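An extractor for such 'regular' event names could start from a pattern like the one sketched below. It is a simplified stand-in: the `EVENT_NOUNS` list is hypothetical and very short, capitalisation is used in place of a real LOC tag from a named entity recogniser, and the examples are English even though the project targets Dutch.

```python
import re

# Hypothetical, deliberately small list of event-denoting head nouns.
EVENT_NOUNS = ("Battle", "Siege", "War", "Death", "Election", "Excavation")

NAME = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"  # one or more capitalised tokens
YEAR = r"\d{4}"

EVENT_PATTERN = re.compile(
    r"\b(?P<type>" + "|".join(EVENT_NOUNS) + r") of (?P<arg>" + YEAR + "|" + NAME + ")"
)


def find_event_names(text):
    """Return (full name, event type, argument) for each pattern match."""
    return [
        (m.group(0), m.group("type"), m.group("arg"))
        for m in EVENT_PATTERN.finditer(text)
    ]


text = ("He wrote about the Battle of Stalingrad, "
        "the Death of John Lennon and the War of 1812.")
for name, event_type, arg in find_event_names(text):
    print(f"{name!r}: type={event_type}, argument={arg}")
```

Restricting the head noun to a curated list is what lets a later step separate action nouns from static nouns; the pattern itself cannot tell them apart.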
24. The Story So Far
• It takes time to learn to
communicate in an
interdisciplinary project
• Don’t try to solve too much
in one go
• Cycles of error analysis
• Domain adaptation is difficult:
optimise for precision
25. Outlook
• Redesign of Agora demo (new
version autumn/winter)
• Include different perspectives
(together with Semantics of
History)
• Ship model use case
• Historical Named Entity
Recognition for English & Dutch
• 2nd round user studies (spring
2013)