Agora: putting museum objects
into their art-historic context

          Marieke van Erp
          marieke@cs.vu.nl



             EURECOM July 2012
Introduction

• BA, MA & PhD
  Computational Linguistics/
  Information Extraction
  @Tilburg University

• Since 2009: SemWeb group
  @VU University Amsterdam
Overview

• The Agora Project
• Digital Hermeneutics
• Building an Event Thesaurus
  for Dutch
  • Experiments & Results
  • Outlook

                                Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
The Agora Project

• Collaboration VU CS &
  History departments,
  Netherlands Institute for
  Sound and Vision and
  Rijksmuseum Amsterdam

• Facilitate and investigate
  digitally mediated public
  history
Digitising Heritage


•   Galleries, libraries, archives and
    museums (GLAMs) are digitising
    their data and presenting it online
•   This changes the role of GLAMs
    from information interpreters to
    information providers
•   In the online setting, objects can
    easily start to lead their own lives


                                           Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
Digital Hermeneutics


• An object on its own has no
    meaning; event descriptions
    provide historical context
•   A single event only gives part
    of the historical context;
    chains of events (narratives)
    provide a more complete
    overview
                                     Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
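Event Dimension

[Figure: RDF graph around the painting "Three Fighter Aircraft in the Sky"]

• Painting: Three Fighter Aircraft in the Sky
  • rma:creationDate 19/12/1948
  • rma:creationPlace Yogyakarta
  • rma:maker Mohammed Toha
  • agora:depictsEvent The Attack on Yogyakarta
  • agora:createsEvent Mohammed Toha Paints "Three Fighter Aircraft in the Sky"
• The Attack on Yogyakarta (rdf:type sem:Event)
  • sem:hasActor Netherlands
  • sem:hasPlace Yogyakarta
  • sem:hasBeginTimeStamp 19/12/1948
• Mohammed Toha Paints "Three Fighter Aircraft in the Sky" (rdf:type sem:Event)
  • sem:hasActor Mohammed Toha
  • sem:hasPlace Yogyakarta
  • sem:hasBeginTimeStamp 19/12/1948
• Netherlands, Mohammed Toha: rdf:type sem:Actor
• Yogyakarta: rdf:type sem:Place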
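The graph on this slide can be written down as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The edge directions are an assumption read off the slide layout, and the helpers `objects_of` and `events_for` are hypothetical names used only for illustration.

```python
# Triples read off the Event Dimension slide. Edge directions are an
# assumption based on the slide layout, not a dump of the Agora data.
PAINTING = "Painting: Three Fighter Aircraft in the Sky"
ATTACK = "The Attack on Yogyakarta"
PAINTS = 'Mohammed Toha Paints "Three Fighter Aircraft in the Sky"'

TRIPLES = [
    (PAINTING, "rma:creationDate", "19/12/1948"),
    (PAINTING, "rma:creationPlace", "Yogyakarta"),
    (PAINTING, "rma:maker", "Mohammed Toha"),
    (PAINTING, "agora:depictsEvent", ATTACK),
    (PAINTING, "agora:createsEvent", PAINTS),
    (ATTACK, "rdf:type", "sem:Event"),
    (ATTACK, "sem:hasActor", "Netherlands"),
    (ATTACK, "sem:hasPlace", "Yogyakarta"),
    (ATTACK, "sem:hasBeginTimeStamp", "19/12/1948"),
    (PAINTS, "rdf:type", "sem:Event"),
    (PAINTS, "sem:hasActor", "Mohammed Toha"),
    (PAINTS, "sem:hasPlace", "Yogyakarta"),
    (PAINTS, "sem:hasBeginTimeStamp", "19/12/1948"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """All objects linked to `subject` through `predicate` (hypothetical helper)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def events_for(museum_object, triples=TRIPLES):
    """Events an object points to through any agora:...Event property."""
    return [
        o for s, p, o in triples
        if s == museum_object and p.startswith("agora:") and p.endswith("Event")
    ]
```

Starting from the painting, `events_for(PAINTING)` yields the two events, and `objects_of` then retrieves the actor, place and time of each, which is the historical context the object lacks on its own.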
Narratives

[Figure: three events chained by agora:hasBiographicalRelation]

• The Attack on Yogyakarta
  • sem:eventType Armed Conflict
  • sem:hasTimeStamp 1945 - 1946
  • sem:hasPlace Indonesia
  • sem:hasActor KNIL
    agora:hasBiographicalRelation
• Operation Crow
  • sem:eventType Armed Conflict
  • sem:hasTimeStamp 19/12/1948 - 31/12/1948
  • sem:hasPlace Sumatra
  • sem:hasActor KNIL
    agora:hasBiographicalRelation
• The Attack on Yogyakarta
  • sem:eventType Attack
  • sem:hasTimeStamp 01/03/1949
  • sem:hasPlace Yogyakarta
  • sem:hasActor KNIL
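The three events on this slide share the actor KNIL and are linked in time order. A minimal sketch of how such a chain could be derived, assuming events are linked when they share an actor and are adjacent in time. The `Event` class and `chain_by_actor` are hypothetical, and the day and month of the first event are placeholders because the slide gives only years.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    name: str
    event_type: str
    start: date      # first date of the sem:hasTimeStamp value
    place: str
    actors: tuple


EVENTS = [
    Event("The Attack on Yogyakarta", "Attack", date(1949, 3, 1), "Yogyakarta", ("KNIL",)),
    Event("Operation Crow", "Armed Conflict", date(1948, 12, 19), "Sumatra", ("KNIL",)),
    # The slide gives "1945 - 1946"; 1 January is a placeholder.
    Event("The Attack on Yogyakarta", "Armed Conflict", date(1945, 1, 1), "Indonesia", ("KNIL",)),
]


def chain_by_actor(events, actor):
    """Order the events of one actor by start date and pair up neighbours.

    Each returned pair plays the role of one agora:hasBiographicalRelation
    link on the slide (an assumption about how the links are formed).
    """
    selected = sorted((e for e in events if actor in e.actors), key=lambda e: e.start)
    return list(zip(selected, selected[1:]))
```

Calling `chain_by_actor(EVENTS, "KNIL")` returns two links, from the 1945 event to Operation Crow and from Operation Crow to the 1949 attack.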
Event-driven Browsing
Building an Event Thesaurus

•   There are no extensive structured
    event descriptions
•   Rijksmuseum Amsterdam has a
    flat list of 1,693 ‘events’: only
    names and very much focused on
    17th century Holland
•   Our goal:
       • create a list of historically
           relevant events
       • provide actors, locations,
           times & types for each event
                                          Image src: http://www.collinsdictionary.com/static/graphics/default.png
First Attempt
•   Pattern-based event-name
    extraction
       • In Dutch Wikipedia we
         found 2,444 event
         candidates
       • 1,209 (56.3%) correct
       • 169 (13.9%) partially
         correct
•   Off-the-shelf named entity
    recognition (P/R/F1)
       • Person 77/77/77
       • Location 75/58/66
       • Organisation 32/37/34
                                 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
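Pattern-based event-name extraction can be sketched with a regular expression. This is an illustration only: the head nouns below are hypothetical, the examples are English for readability (the experiment ran on Dutch Wikipedia), and the patterns actually used in the project are not shown on the slide.

```python
import re

# Hypothetical head nouns; not the project's actual pattern set.
HEADS = ("Battle", "Siege", "Treaty", "War")

# Head noun + "of" + optional "the" + one or more capitalised words.
EVENT_NAME = re.compile(
    r"\b(?:%s) of (?:the )?[A-Z][\w-]*(?: [A-Z][\w-]*)*" % "|".join(HEADS)
)


def extract_event_candidates(text):
    """Return every substring of `text` that matches the event-name pattern."""
    return EVENT_NAME.findall(text)


sample = (
    "After the Battle of Stalingrad the front moved west. "
    "The Treaty of Versailles was signed in 1919."
)
candidates = extract_event_candidates(sample)
```

A pattern like this only finds names that follow the template and relies on capitalisation to decide where a name ends.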
First Attempt
• Co-occurrence-based event-relation
  finder
     • an actor, location and/or
        date was found for only 392
        events
     • actor correct in 49.6% of cases
     • location correct in 41.1% of cases
     • date correct in 51.5% of cases
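A co-occurrence-based relation finder can be sketched as follows, assuming sentences that already carry entity annotations. The function name, the input format and the toy sentences are hypothetical and were written for this illustration.

```python
from collections import Counter


def most_cooccurring(event_name, sentences, entity_type):
    """Entity of `entity_type` that most often shares a sentence with the event.

    `sentences` is a list of (text, [(entity, type), ...]) pairs.
    Returns None when the event never co-occurs with such an entity.
    """
    counts = Counter()
    for text, entities in sentences:
        if event_name in text:
            counts.update(name for name, etype in entities if etype == entity_type)
    return counts.most_common(1)[0][0] if counts else None


# Toy input, not taken from the project's data.
SENTENCES = [
    ("The Battle of Waterloo was fought near Waterloo in 1815.",
     [("Waterloo", "LOC"), ("1815", "DATE")]),
    ("Napoleon lost the Battle of Waterloo in 1815.",
     [("Napoleon", "PER"), ("1815", "DATE")]),
    ("Wellington visited Brussels before the Battle of Waterloo.",
     [("Wellington", "PER"), ("Brussels", "LOC")]),
    ("Napoleon was exiled after the Battle of Waterloo.",
     [("Napoleon", "PER")]),
]

actor = most_cooccurring("Battle of Waterloo", SENTENCES, "PER")
when = most_cooccurring("Battle of Waterloo", SENTENCES, "DATE")
```

The result depends entirely on how often an event is mentioned, so an event that appears in only one or two sentences gives the counter little to work with.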



First Attempt
• Problems with event element recognition:
     • Shallow grammatical
         processing ("post-war rebuilding
         and during the North Sea flood"
         recognised as 1 event)
     •   Missing locations (Battle of
         LOC pattern fails)
     •   No distinction between
         entities and action nouns
         (German Occupation vs German
         Occupiers look the same to
         the approach)
     •   Named Entity Recogniser not
         suited to the domain
First Attempt
• Problems with the event relation
  finder:
    • Relies on redundancy in
      the data, only works for
      ‘popular’ events
    • Too coarse-grained (who
      were the actors/locations
      in WWII)
    • Evaluation is hard!

Back to the drawing board...
• Analysis of event names
      •   Combinations of sortal nouns with
          a PP and a named entity, e.g., Battle
          of Stalingrad, Death of John Lennon
      •   Combinations of nominalized verbs
          with a PP and a named entity, e.g.,
          Excavation of Troy, Election of
          Obama
      •   Combinations of a referential
          adjective with an event type and
          named entity, e.g., the American
          invasion of Iraq
      •   Transparent proper names: Great
          War
      •   Opaque proper names: event
          names that cannot be decomposed
          on morphological grounds, e.g.,
          Holocaust, Spanish Fury
                                                 Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
Back to the drawing board...
• Improve Named Entity
  Recognition
    • Add gazetteers for
      historical names
    • Post-processing for titles
      and improved NE
      boundaries
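A minimal sketch of the two ideas on this slide: a longest-match gazetteer lookup, followed by a post-processing step that extends person spans over a preceding title. The gazetteer entries, the title list and both function names are hypothetical.

```python
# Toy resources for illustration; not the project's gazetteers.
TITLES = {"King", "Queen", "Prince", "Admiral", "General"}
GAZETTEER = {
    ("Michiel", "de", "Ruyter"): "PER",
    ("Vlissingen",): "LOC",
}


def tag_with_gazetteer(tokens, gazetteer=GAZETTEER):
    """Longest-match lookup; returns (start, end, label) spans, end exclusive."""
    spans, i = [], 0
    max_len = max((len(key) for key in gazetteer), default=0)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return spans


def extend_over_titles(tokens, spans, titles=TITLES):
    """Move the left boundary of PER spans over a directly preceding title."""
    fixed = []
    for start, end, label in spans:
        if label == "PER" and start > 0 and tokens[start - 1] in titles:
            start -= 1
        fixed.append((start, end, label))
    return fixed


tokens = "Admiral Michiel de Ruyter was born in Vlissingen".split()
spans = extend_over_titles(tokens, tag_with_gazetteer(tokens))
```

Here the person span grows from "Michiel de Ruyter" to "Admiral Michiel de Ruyter", while the location span is left unchanged.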




Back to the drawing board...
• Finding Event Relations
     • Use the structure of
        Wikipedia/DBpedia
    •   Shallow parsing
    •   Hierarchies of actors &
        locations




Current Work
               Spotlight (P/R/F1)   Stanford (P/R/F1)   Freire (P/R/F1)
Person         54.05/7.52/13.20     58.60/34.46/43.40   79.17/71.16/74.95
Location       64.52/30.77/41.67    67.19/66.15/66.67   80.00/61.54/69.57
Organisation   0/0/0                9.78/25.71/14.17    89.66/74.29/81.25



      • Still some work to be done, but
      Freire et al. (2012) show that smart
      features can work with small amounts
      of training data
      • Combine classifiers
      • Add post-processing
      • MISC class remains to be done...
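The F1 column in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported P and R values. The function name `f1` and the dictionary layout are choices made for this sketch.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per system and entity class, from the table.
RESULTS = {
    ("Spotlight", "Person"): (54.05, 7.52, 13.20),
    ("Spotlight", "Location"): (64.52, 30.77, 41.67),
    ("Spotlight", "Organisation"): (0.0, 0.0, 0.0),
    ("Stanford", "Person"): (58.60, 34.46, 43.40),
    ("Stanford", "Location"): (67.19, 66.15, 66.67),
    ("Stanford", "Organisation"): (9.78, 25.71, 14.17),
    ("Freire", "Person"): (79.17, 71.16, 74.95),
    ("Freire", "Location"): (80.00, 61.54, 69.57),
    ("Freire", "Organisation"): (89.66, 74.29, 81.25),
}

# Recompute each F1 and compare with the reported value at two decimals.
recomputed = {key: round(f1(p, r), 2) for key, (p, r, _) in RESULTS.items()}
```

Because F1 is a harmonic mean, the very low recall of Spotlight on persons (7.52) pulls its F1 down to 13.20 despite a precision of 54.05.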
Current Work
                                      Word                      POS       CHUNK       NER
                                      U.N.                      NNP       I-NP        I-ORG
                                      official                  NN        I-NP        O
                                      Ekeus                     NNP       I-NP        I-PER
                                      heads                     VBZ       I-VP        O
                                      for                       IN        I-PP        O
                                      Baghdad                   NNP       I-NP        I-LOC
                                      .                         .         O           O     [CoNLL2003]
focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
"is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
"painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
"dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
"grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
"William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
"Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
"made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
"many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
"telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"


                                                                                                   [Freire et al. 2012]
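The feature rows above use a window of three tokens to the left and two to the right of the focus token. A sketch of how the window columns, `cap` and `length` could be derived is given below. The frequency and part-of-speech columns need external resources and are left out, and both the padding symbol and the function name are assumptions.

```python
PAD = "_"  # hypothetical padding symbol; the slide shows no sentence-edge rows


def window_features(tokens, i):
    """Window, capitalisation and length features for tokens[i].

    Column names follow the header on the slide; only a subset is computed.
    """
    def at(j):
        return tokens[j] if 0 <= j < len(tokens) else PAD

    focus = tokens[i]
    return {
        "focus": focus,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(focus[:1].isupper()),  # 1 if the token starts with a capital
        "length": len(focus),
    }


tokens = ["is", "painted", "dark", "grey", ".", "William",
          "Herschel", "made", "many", "telescopes"]
william = window_features(tokens, 5)
```

For the token "William" this reproduces the window ("dark", "grey", ".", "Herschel", "made"), the `cap` value of 1 and the `length` of 7 shown in the corresponding row.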
Current Work

• Build smarter extractors for
  event names
    • First focus on ‘regular’
       event names (e.g., Battle
       of LOC, War of YEAR)
    • Use knowledge about
       action nouns vs static
       nouns (WordNet)
The Story So Far

• It takes time to learn to
    communicate in an
    interdisciplinary project
•   Don’t try to solve too much
    in one go
•   Cycles of error analysis
•   Domain adaptation is difficult:
    optimise for precision
Outlook

• Redesign of Agora demo (new
    version autumn/winter)
•   Include different perspectives
    (together with Semantics of
    History)
•   Ship model use case
•   Historical Named Entity
    Recognition for English & Dutch
•   2nd round user studies (spring
    2013)
Questions?

marieke@cs.vu.nl
http://www.cs.vu.nl/~marieke

Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824
Image src: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883

 

Agora: putting museum objects into their art-historic context

  • 1. Agora: putting museum objects into their art-historic context Marieke van Erp marieke@cs.vu.nl EURECOM July 2012
  • 2. Introduction • BA, MA & PhD Computational Linguistics/ Information Extraction @Tilburg University • Since 2009: SemWeb group @VU University Amsterdam
  • 3. Overview • The Agora Project • Digital Hermeneutics • Building an Event Thesaurus for Dutch • Experiments & Results • Outlook Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
  • 4. The Agora Project • Collaboration VU CS & History departments, Netherlands Institute for Sound and Vision and Rijksmuseum Amsterdam • Facilitate and investigate digitally mediated public history
  • 5. Digitising Heritage • Galleries, libraries, archives and museums (GLAMS) are digitising their data and presenting it online • This changes the role of GLAMS from information interpreters to information providers • In the online setting, objects can easily start to lead their own lives Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
  • 6.
  • 7. Digital Hermeneutics • An object on its own has no meaning; event descriptions provide historical context • A single event only gives part of the historical context; chains of events (narratives) provide a more complete overview Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
  • 8. Event Dimension [Diagram: RDF graph around the painting "Three Fighter Aircraft in the Sky". Nodes: the events (sem:Event) "The Attack on Yogyakarta" and "Mohammed Toha Paints 'Three Fighter Aircraft in the Sky'"; the actors (sem:Actor) Netherlands and Mohammed Toha; the place (sem:Place) Yogyakarta; the date 19/12/1948. Properties shown: rma:creationDate, rma:maker, rma:creationPlace, agora:depictsEvent, agora:createsEvent, sem:hasActor, sem:hasPlace, sem:hasBeginTimeStamp, rdf:type]
  • 9. Narratives [Diagram: three events chained by agora:hasBiographicalRelation] • The Attack on Yogyakarta: sem:eventType Armed Conflict; sem:hasTimeStamp 1945 - 1946; sem:hasPlace Indonesia; sem:hasActor KNIL • Operation Crow: sem:eventType Armed Conflict; sem:hasTimeStamp 19/12/1948 - 31/12/1948; sem:hasPlace Sumatra; sem:hasActor KNIL • The Attack on Yogyakarta: sem:eventType Attack; sem:hasTimeStamp 01/03/1949; sem:hasPlace Yogyakarta; sem:hasActor KNIL
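The event chain on the Narratives slide can be approximated as plain subject-predicate-object triples. The following is a minimal Python sketch, not the Agora implementation: it uses tuples instead of an RDF library, the `ex:` identifiers are hypothetical (the slides do not show the real URIs), and the direction of the `agora:hasBiographicalRelation` link is an assumption. Property names and values are taken from the slide.

```python
# Triples transcribed from the "Narratives" slide. The ex: identifiers are
# hypothetical, and the direction of agora:hasBiographicalRelation is assumed.
TRIPLES = [
    ("ex:operation_crow", "sem:eventType", "Armed Conflict"),
    ("ex:operation_crow", "sem:hasTimeStamp", "19/12/1948 - 31/12/1948"),
    ("ex:operation_crow", "sem:hasPlace", "Sumatra"),
    ("ex:operation_crow", "sem:hasActor", "KNIL"),
    ("ex:operation_crow", "agora:hasBiographicalRelation", "ex:attack_on_yogyakarta"),
    ("ex:attack_on_yogyakarta", "sem:eventType", "Attack"),
    ("ex:attack_on_yogyakarta", "sem:hasTimeStamp", "01/03/1949"),
    ("ex:attack_on_yogyakarta", "sem:hasPlace", "Yogyakarta"),
    ("ex:attack_on_yogyakarta", "sem:hasActor", "KNIL"),
]


def describe(event):
    """Collect all properties of one event into a dict."""
    return {p: o for s, p, o in TRIPLES if s == event}


def narrative(start, link="agora:hasBiographicalRelation"):
    """Follow `link` edges from `start` and return the chain of events."""
    chain, seen = [], set()
    current = start
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        # Next event in the chain, or None when the chain ends.
        current = next(
            (o for s, p, o in TRIPLES if s == current and p == link), None
        )
    return chain


for event in narrative("ex:operation_crow"):
    print(event, describe(event)["sem:hasPlace"])
```

Walking the chain rather than looking at a single event mirrors the point of slide 7: one event gives only part of the context, the chain gives the narrative.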
  • 13. Building an Event Thesaurus • There are no extensive structured event descriptions • Rijksmuseum Amsterdam has a flat list of 1,693 ‘events’: only names and very much focused on 17th century Holland • Our goal: • create a list of historically relevant events • provide actors, locations, times & types for each event Image src: http://www.collinsdictionary.com/static/graphics/default.png
  • 14. First Attempt • Pattern-based event-name extraction • In Dutch Wikipedia we found 2,444 event candidates • 1,209 (56.3%) correct • 169 (13.9%) partially correct • Off-the-shelf named entity recognition (P/R/F1) • Person 77/77/77 • Location 75/58/66 • Organisation 32/37/34 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 15. First Attempt • Co-occurrence based event-relation finder • only actor, location and/or date found for 392 events • 49.6% actor is correct • 41.1% location is correct • 51.5% date is correct Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 16. First Attempt • Problems with event element recognition: • Shallow grammatical processing (post-war rebuilding and during the North Sea flood recognised as 1 event) • Missing locations (Battle of LOC pattern fails) • No distinction between entities and action nouns (German Occupancy vs German Occupants look the same for the approach) • Named Entity Recogniser not suited for domain Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 17. First Attempt • Problems with the event relation finder: • Relies on redundancy in the data, only works for ‘popular’ events • Too coarse-grained (who were the actors/locations in WWII) • Evaluation is hard! Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
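The slides do not spell out how the co-occurrence based relation finder works. One plausible reading, sketched below in Python under that assumption, is to count how often each candidate entity appears in the same sentence as the event name and keep the most frequent one. The function name, the substring matching and the toy sentences are all hypothetical illustrations.

```python
from collections import Counter


def most_cooccurring(event, sentences, candidates):
    """Return the candidate that shares the most sentences with `event`.

    Hypothetical sketch: plain substring matching stands in for real
    entity recognition, and sentence-level co-occurrence is an assumption.
    """
    counts = Counter()
    for sentence in sentences:
        if event not in sentence:
            continue
        for candidate in candidates:
            if candidate in sentence:
                counts[candidate] += 1
    # No co-occurrence at all means no relation can be proposed.
    return counts.most_common(1)[0][0] if counts else None


# Invented toy sentences, for illustration only.
sentences = [
    "Operation Crow started on Sumatra.",
    "Troops on Sumatra took part in Operation Crow.",
    "Operation Crow also reached Java.",
    "Java is an island.",
]
print(most_cooccurring("Operation Crow", sentences, ["Sumatra", "Java"]))
```

A counter like this only produces an answer when an event is mentioned often enough, which is consistent with the slide-17 remark that the approach relies on redundancy and only works for 'popular' events.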
  • 18. Back to the drawing board... • Analysis of event names • Combinations of sortal nouns with a PP and a named entity, e.g., Battle of Stalingrad, Death of John Lennon • Combinations of nominalized verbs with a PP and a named entity, e.g., Excavation of Troy, Election of Obama • Combinations of a referential adjective with an event type and named entity, e.g., the American invasion of Iraq • Transparent proper names: Great War • Opaque proper names: event names that cannot be decomposed on morphological grounds, e.g., Holocaust, Spanish Fury Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 19. Back to the drawing board... • Improve Named Entity Recognition • Add gazetteers for historical names • Post-processing for titles and improved NE boundaries Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 20. Back to the drawing board... • Finding Event Relations • Use the structure of Wikipedia/DBpedia • Shallow parsing • Hierarchies of actors & locations Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
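The first two event-name classes (a sortal noun or nominalized verb, followed by a PP and a named entity) are regular enough to approximate with a pattern. The Python sketch below is an illustration only: the `EVENT_NOUNS` list is hypothetical, a run of capitalised words stands in for real named entity recognition, and the examples are the English ones from the slide rather than Dutch.

```python
import re

# Hypothetical seed list of event-type nouns; a real list would be larger
# and, for this project, Dutch.
EVENT_NOUNS = ["Battle", "Death", "Excavation", "Election"]

# <event noun> of <one or more capitalised words>
PATTERN = re.compile(
    r"\b(?P<type>" + "|".join(EVENT_NOUNS) + r") of "
    r"(?P<entity>[A-Z][\w-]*(?: [A-Z][\w-]*)*)"
)


def find_event_names(text):
    """Return (full name, event type, entity) for every pattern match."""
    return [
        (m.group(0), m.group("type"), m.group("entity"))
        for m in PATTERN.finditer(text)
    ]


text = "Books on the Battle of Stalingrad and the Death of John Lennon."
for name, event_type, entity in find_event_names(text):
    print(name, "|", event_type, "|", entity)
```

Transparent and opaque proper names (Great War, Holocaust) have no such surface structure, so a pattern of this kind cannot find them; they would need a lexicon or a trained recogniser.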
  • 21. Current Work
                    Spotlight (P/R/F1)   Stanford (P/R/F1)   Freire (P/R/F1)
    Person          54.05/7.52/13.20     58.60/34.46/43.40   79.17/71.16/74.95
    Location        64.52/30.77/41.67    67.19/66.15/66.67   80.00/61.54/69.57
    Organisation    0/0/0                9.78/25.71/14.17    89.66/74.29/81.25
    • Still some work to be done, but Freire et al. (2012) shows that smart features can work with small amounts of training data
    • Combine classifiers
    • Add post-processing
    • MISC Class remains to be done...
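The P/R/F1 triples in the table are related by the standard definition of F1 as the harmonic mean of precision and recall. A small Python check, with `f1` as an illustrative helper name and the input values copied from the table:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # convention for the 0/0/0 row
    return 2 * precision * recall / (precision + recall)


# (system, class, precision, recall, reported F1) taken from the table.
rows = [
    ("Spotlight", "Person", 54.05, 7.52, 13.20),
    ("Stanford", "Person", 58.60, 34.46, 43.40),
    ("Freire", "Organisation", 89.66, 74.29, 81.25),
    ("Spotlight", "Organisation", 0.0, 0.0, 0.0),
]
for system, cls, p, r, reported in rows:
    print(f"{system:10s} {cls:13s} computed={f1(p, r):.2f} reported={reported:.2f}")
```

The Spotlight Person row shows why F1 is reported alongside precision: a precision of 54.05 combined with a recall of 7.52 yields an F1 of only 13.20.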
  • 22. Current Work
    Word      POS  CHUNK  NER
    U.N.      NNP  I-NP   I-ORG
    official  NN   I-NP   O
    Ekeus     NNP  I-NP   I-PER
    heads     VBZ  I-VP   O
    for       IN   I-PP   O
    Baghdad   NNP  I-NP   I-LOC
    .         .    O      O
    [CoNLL2003]

    focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
    "is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
    "painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
    "dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
    "grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
    ".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
    "William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
    "Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
    "made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
    "many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
    "telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"
    [Freire et al. 2012]
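The first six columns of the Freire et al. feature rows are a token window: the focus token, the three tokens before it and the two after it. The Python sketch below rebuilds that window plus three surface features. It is an illustration, not the authors' code: reading `cap`, `allcaps` and `length` as "starts with a capital", "is all upper case" and "character count" is inferred from the sample rows, and the empty-string padding at sentence edges is an assumption.

```python
def token_features(tokens, i, pad=""):
    """Window and surface features for tokens[i].

    Column names follow the slide; their interpretation is inferred from
    the sample rows, and `pad` (used beyond the token list) is an assumption.
    """

    def at(j):
        return tokens[j] if 0 <= j < len(tokens) else pad

    focus = tokens[i]
    return {
        "focus": focus,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(focus[:1].isupper()),
        "allcaps": int(focus.isupper()),
        "length": len(focus),
    }


tokens = ["wood", ")", "and", "is", "painted", "dark", "grey", ".",
          "William", "Herschel", "made", "many", "telescopes", "of", "this"]
print(token_features(tokens, 8))  # the row for "William"
```

The frequency columns (`fnfreq`, `lnfreq`, `orgfreq` and so on) depend on external name lists that the slide does not show, so they are left out of the sketch.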
  • 23. Current Work • Build smarter extractors for event names • First focus on ‘regular’ event names (e.g., Battle of LOC, War of YEAR) • Use knowledge about action nouns vs static nouns (WordNet)
  • 24. The Story So Far • It takes time to learn to communicate in an interdisciplinary project • Don’t try to solve too much in one go • Cycles of error analysis • Domain adaptation is difficult: optimise for precision
  • 25. Outlook • Redesign of Agora demo (new version autumn/winter) • Include different perspectives (together with Semantics of History) • Ship model use case • Historical Named Entity Recognition for English & Dutch • 2nd round user studies (spring 2013)
  • 26. Questions? marieke@cs.vu.nl http://www.cs.vu.nl/~marieke Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824 Image source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883