Agora: putting museum objects
into their art-historic context

          Marieke van Erp
          marieke@cs.vu.nl



             EURECOM July 2012
Introduction

• BA, MA & PhD
  Computational Linguistics/
  Information Extraction
  @Tilburg University

• Since 2009: SemWeb group
  @VU University Amsterdam
Overview

• The Agora Project
• Digital Hermeneutics
• Building an Event Thesaurus
  for Dutch
  • Experiments & Results
  • Outlook

                                Image src: http://www.artrage.com.au/dreamgirl/filesend/223/
                                EarthFromAbove_EXPOTVDC212_prog.jpg
The Agora Project

• Collaboration VU CS &
  History departments,
  Netherlands Institute for
  Sound and Vision and
  Rijksmuseum Amsterdam

• Facilitate and investigate
  digitally mediated public
  history
Digitising Heritage


•   Galleries, libraries, archives and
    museums (GLAMS) are digitising
    their data and presenting it online
•   This changes the role of GLAMS
    from information interpreters to
    information providers
•   In the online setting, objects can
    easily start to lead their own lives


                                           Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
Digital Hermeneutics


• An object on its own has no
    meaning; event descriptions
    provide historical context
•   A single event only gives part
    of the historical context;
    chains of events (narratives)
    provide a more complete
    overview
                                     Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/
                                     IoPReKrojkY/s1600/42st.jpg
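Event Dimension

[Figure: RDF graph linking a museum object to events]

Object: Painting "Three Fighter Aircraft in the Sky"
  rma:maker            Mohammed Toha
  rma:creationPlace    Yogyakarta
  rma:creationDate     19/12/1948
  agora:depictsEvent   The Attack on Yogyakarta
  agora:createsEvent   Mohammed Toha Paints "Three Fighter Aircraft in the Sky"

Event: The Attack on Yogyakarta (rdf:type sem:Event)
  sem:hasActor            Netherlands (rdf:type sem:Actor)
  sem:hasPlace            Yogyakarta (rdf:type sem:Place)
  sem:hasBeginTimeStamp   19/12/1948

Event: Mohammed Toha Paints "Three Fighter Aircraft in the Sky" (rdf:type sem:Event)
  sem:hasActor            Mohammed Toha (rdf:type sem:Actor)
  sem:hasPlace            Yogyakarta (rdf:type sem:Place)
  sem:hasBeginTimeStamp   19/12/1948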
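
The graph on this slide can be sketched as subject-predicate-object triples in plain Python. The prefixes (sem:, rma:, agora:) come from the slide; the ex: identifiers, the helper names and the edge directions are assumptions made for illustration, not the project's actual data or API.

```python
# Hypothetical ex: identifiers stand in for the real resource URIs.
TRIPLES = [
    ("ex:painting", "rma:maker", "ex:MohammedToha"),
    ("ex:painting", "rma:creationPlace", "ex:Yogyakarta"),
    ("ex:painting", "rma:creationDate", "19/12/1948"),
    ("ex:painting", "agora:depictsEvent", "ex:AttackOnYogyakarta"),
    ("ex:AttackOnYogyakarta", "rdf:type", "sem:Event"),
    ("ex:AttackOnYogyakarta", "sem:hasPlace", "ex:Yogyakarta"),
    ("ex:AttackOnYogyakarta", "sem:hasActor", "ex:Netherlands"),
    ("ex:AttackOnYogyakarta", "sem:hasBeginTimeStamp", "19/12/1948"),
    ("ex:Netherlands", "rdf:type", "sem:Actor"),
    ("ex:Yogyakarta", "rdf:type", "sem:Place"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects o for which (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def event_context(painting, triples=TRIPLES):
    """Collect place, actor and begin time of every event the object depicts."""
    context = {}
    for event in objects(painting, "agora:depictsEvent", triples):
        context[event] = {
            "place": objects(event, "sem:hasPlace", triples),
            "actor": objects(event, "sem:hasActor", triples),
            "begin": objects(event, "sem:hasBeginTimeStamp", triples),
        }
    return context
```

Calling event_context("ex:painting") follows the agora:depictsEvent edge and returns the historical context attached to the depicted event.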
Narratives

[Figure: three events chained by agora:hasBiographicalRelation]

Event                      sem:eventType    sem:hasTimeStamp          sem:hasPlace   sem:hasActor
The Attack on Yogyakarta   Armed Conflict   1945 - 1946               Indonesia      KNIL
Operation Crow             Armed Conflict   19/12/1948 - 31/12/1948   Sumatra        KNIL
The Attack on Yogyakarta   Attack           01/03/1949                Yogyakarta     KNIL
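
One way to picture a narrative is as the events of one actor placed in chronological order. The sketch below uses the three events from this slide. The slide links them with agora:hasBiographicalRelation; using a shared actor as the linking criterion, and mapping the range "1945 - 1946" to 1 January 1945, are assumptions made here.

```python
from datetime import date

# Events as shown on the slide; begin dates of ranges are an assumption.
EVENTS = [
    {"name": "The Attack on Yogyakarta", "type": "Attack",
     "begin": date(1949, 3, 1), "place": "Yogyakarta", "actors": {"KNIL"}},
    {"name": "Operation Crow", "type": "Armed Conflict",
     "begin": date(1948, 12, 19), "place": "Sumatra", "actors": {"KNIL"}},
    {"name": "The Attack on Yogyakarta", "type": "Armed Conflict",
     "begin": date(1945, 1, 1), "place": "Indonesia", "actors": {"KNIL"}},
]


def narrative(actor, events=EVENTS):
    """Return (begin date, name) of the events involving `actor`, oldest first."""
    chain = sorted(
        (e for e in events if actor in e["actors"]),
        key=lambda e: e["begin"],
    )
    return [(e["begin"].isoformat(), e["name"]) for e in chain]
```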
Event-driven Browsing
Building an Event Thesaurus

•   There are no extensive structured
    event descriptions
•   Rijksmuseum Amsterdam has a
    flat list of 1,693 ‘events’: only
    names and very much focused on
    17th century Holland
•   Our goal:
       • create a list of historically
           relevant events
       • provide actors, locations,
           times & types for each event
                                          Image src: http://www.collinsdictionary.com/static/graphics/default.png
First Attempt
•   Pattern based event-name
    extraction
       • In Dutch Wikipedia we
         found 2,444 event
         candidates
       • 1209 (56.3%) correct
       • 169 (13.9%) partially
         correct
•   Off-the-shelf named entity
    recognition (P/R/F1)
       • Person 77/77/77
       • Location 75/58/66
       • Organisation 32/37/34
                                 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion
                                 %205.jpg
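
A minimal sketch of what pattern-based event-name extraction can look like. The experiment on the slide ran on Dutch Wikipedia; the English head nouns and the regular expression below are illustrative stand-ins, not the patterns used in the project.

```python
import re

# Hypothetical list of head nouns that often start an event name.
HEADS = ("Battle", "Siege", "Death", "Election", "Excavation")

# Head noun + "of" + one or more capitalised words.
PATTERN = re.compile(
    r"\b(?:%s) of (?:[A-Z][\w'-]*)(?: [A-Z][\w'-]*)*" % "|".join(HEADS)
)


def event_candidates(text):
    """Return every substring of `text` that matches the event-name pattern."""
    return PATTERN.findall(text)
```

A pattern like this only yields candidates; as the figures above suggest, a fair share of them still need manual or automatic filtering.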
First Attempt
• Co-occurrence based event-
  relation finder
     • only actor, location and/
        or date found for 392
        events
     • 49.6% actor is correct
     • 41.1% location is correct
     • 51.5% date is correct
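
The co-occurrence idea can be sketched as follows: for each sentence that mentions the event name, count the entities found in it and keep the most frequent one per type. The input format, the type labels and the toy sentences are assumptions for illustration only.

```python
from collections import Counter

# Toy input: (sentence, [(entity, type), ...]); entity tagging is assumed done.
SAMPLE = [
    ("The Battle of Stalingrad began in 1942.", [("1942", "DATE")]),
    ("The Red Army won the Battle of Stalingrad in 1943.",
     [("Red Army", "ACTOR"), ("1943", "DATE")]),
    ("The Battle of Stalingrad ended in 1943 at Stalingrad.",
     [("1943", "DATE"), ("Stalingrad", "LOCATION")]),
    ("Berlin fell in 1945.", [("Berlin", "LOCATION"), ("1945", "DATE")]),
]


def most_cooccurring(event, sentences):
    """Return, per entity type, the entity seen most often alongside `event`."""
    counts = {}
    for text, entities in sentences:
        if event not in text:
            continue  # only sentences that mention the event count
        for entity, etype in entities:
            counts.setdefault(etype, Counter())[entity] += 1
    return {etype: c.most_common(1)[0][0] for etype, c in counts.items()}
```

Because the choice rests on counts, an event mentioned in only one or two sentences gives the method very little to work with.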



First Attempt
• Problems with event element recognition:
     • Shallow grammatical
         processing (post-war rebuilding
         and during the North Sea flood
         recognised as 1 event)
     •   Missing locations (Battle of
         LOC pattern fails)
     •   No distinction between
         entities and action nouns
         (German Occupation vs German
         Occupiers look the same to
         the approach)
     •   Named Entity Recogniser not
         suited to the domain
First Attempt
• Problems with the event relation
  finder:
    • Relies on redundancy in
      the data, only works for
      ‘popular’ events
    • Too coarse-grained (who
      were the actors/locations
      in WWII)
    • Evaluation is hard!

Back to the drawing board...
• Analysis of event names
     • Combinations of sortal nouns with
          a PP and a named entity e.g., Battle
          of Stalingrad, Death of John Lennon
      •   Combinations of nominalized verbs
          with a PP and a named entity e.g.,
          Excavation of Troy, Election of
          Obama.
      •   Combinations of a referential
          adjective with an event type and
          named entity e.g., the American
          invasion of Iraq.
      •   Transparent proper names: Great
          War
      •   Opaque proper names: Event
          names that cannot be decomposed
          on morphological grounds e.g.,
          Holocaust, Spanish Fury
                                                 Image src: http://www.northescambia.com/wp-content/uploads/2010/01/
                                                 molinotrashfire10.jpg
Back to the drawing board...
• Improve Named Entity
  Recognition
    • Add gazetteers for
      historical names
    • Post-processing for titles
      and improved NE
      boundaries
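
A rough sketch of the two steps named on this slide: a greedy longest-match gazetteer lookup, followed by a post-processing pass that widens a person span to include a directly preceding title. The gazetteer entries, the title list and the function names are hypothetical.

```python
TITLES = {"Prince", "King", "Admiral", "Sir"}  # hypothetical title list
GAZETTEER = {  # hypothetical historical-name gazetteer
    ("William", "of", "Orange"): "PER",
    ("Delft",): "LOC",
}


def gazetteer_spans(tokens, gazetteer=GAZETTEER):
    """Greedy longest-match lookup; returns (start, end, label), end exclusive."""
    longest = max(len(key) for key in gazetteer)
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return spans


def include_titles(tokens, spans, titles=TITLES):
    """Move the start of a PER span left over a directly preceding title."""
    fixed = []
    for start, end, label in spans:
        if label == "PER" and start > 0 and tokens[start - 1] in titles:
            start -= 1
        fixed.append((start, end, label))
    return fixed
```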




Back to the drawing board...
• Finding Event Relations
     • Use the structure of Wikipedia/
        DBpedia
    •   Shallow parsing
    •   Hierarchies of actors &
        locations
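
A location hierarchy makes it possible to compare places given at different levels of detail. The sketch below uses a small child-to-parent map; the map itself and the helper names are assumptions for illustration, and a real hierarchy would come from a resource such as DBpedia.

```python
# Hypothetical child -> parent map for a few places mentioned in the talk.
PARENT = {"Yogyakarta": "Java", "Java": "Indonesia", "Sumatra": "Indonesia"}


def ancestors(place, parent=PARENT):
    """Return the regions containing `place`, most specific first."""
    chain = []
    while place in parent:
        place = parent[place]
        chain.append(place)
    return chain


def common_region(a, b, parent=PARENT):
    """Return the most specific region containing both places, or None."""
    regions_of_b = set([b] + ancestors(b, parent))
    for region in [a] + ancestors(a, parent):
        if region in regions_of_b:
            return region
    return None
```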




Current Work
                Spotlight (P/R/F1)   Stanford (P/R/F1)    Freire (P/R/F1)
Person          54.05/7.52/13.20     58.60/34.46/43.40    79.17/71.16/74.95
Location        64.52/30.77/41.67    67.19/66.15/66.67    80.00/61.54/69.57
Organisation    0/0/0                9.78/25.71/14.17     89.66/74.29/81.25



      • Still some work to be done, but
      Freire et al. (2012) shows that smart
      features can work with small amounts
      of training data
      • Combine classifiers
      • Add post-processing
      • MISC Class remains to be done...
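
For reference, the F1 column in the table is the harmonic mean of precision and recall. A small helper (the function name is an assumption) reproduces the reported values from the P and R columns:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, f1(79.17, 71.16) gives approximately 74.95, matching the Freire score for Person.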
Current Work
                                      Word                      POS       CHUNK       NER
                                      U.N.                      NNP       I-NP        I-ORG
                                      official                  NN        I-NP        O
                                      Ekeus                     NNP       I-NP        I-PER
                                      heads                     VBZ       I-VP        O
                                      for                       IN        I-PP        O
                                      Baghdad                   NNP       I-NP        I-LOC
                                      .                         .         O           O     [CoNLL2003]
focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
"is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
"painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
"dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
"grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
"William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
"Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
"made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
"many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
"telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"


                                                                                                   [Freire et al. 2012]
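
The first six columns of the feature rows above look like a context window around the focus word: three tokens before and two after. The sketch below rebuilds those columns plus cap and length under that reading. The meaning assigned to each column and the padding symbol are assumptions; the frequency and part-of-speech columns need external resources and are left out.

```python
def window_features(tokens, i, pad="_"):
    """Build context-window features for the token at position i."""

    def at(j):
        # Pad positions that fall outside the token list.
        return tokens[j] if 0 <= j < len(tokens) else pad

    word = tokens[i]
    return {
        "focus": word,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(word[:1].isupper()),  # 1 if the first character is uppercase
        "length": len(word),
    }
```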
Current Work

• Build smarter extractors for
  event names
    • First focus on ‘regular’
       event names (e.g., Battle
       of LOC, War of YEAR)
    • Use knowledge about
       action nouns vs static
       nouns (WordNet)
The Story So Far

• It takes time to learn to
    communicate in an
    interdisciplinary project
•   Don’t try to solve too much
    in one go
•   Cycles of error analysis
•   Domain adaptation is difficult:
    optimise for precision
Outlook

• Redesign of Agora demo (new
    version autumn/winter)
•   Include different perspectives
    (together with Semantics of
    History)
•   Ship model use case
•   Historical Named Entity
    Recognition for English & Dutch
•   2nd round user studies (spring
    2013)
Questions?

marieke@cs.vu.nl
http://www.cs.vu.nl/~marieke

Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824
Image source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883

QCon London: Mastering long-running processes in modern architectures
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

Agora: putting museum objects into their art-historic context

  • 1. Agora: putting museum objects into their art-historic context Marieke van Erp marieke@cs.vu.nl EURECOM July 2012
  • 2. Introduction • BA, MA & PhD Computational Linguistics/ Information Extraction @Tilburg University • Since 2009: SemWeb group @VU University Amsterdam
  • 3. Overview • The Agora Project • Digital Hermeneutics • Building an Event Thesaurus for Dutch • Experiments & Results • Outlook Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
  • 4. The Agora Project • Collaboration VU CS & History departments, Netherlands Institute for Sound and Vision and Rijksmuseum Amsterdam • Facilitate and investigate digitally mediated public history
  • 5. Digitising Heritage • Galleries, libraries, archives and museums (GLAMS) are digitising their data and presenting it online • This changes the role of GLAMS from information interpreters to information providers • In the online setting, objects can easily start to lead their own lives Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
  • 6.
  • 7. Digital Hermeneutics • An object on its own has no meaning; event descriptions provide historical context • A single event only gives part of the historical context; chains of events (narratives) provide a more complete overview Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
  • 8. Event Dimension [Diagram: RDF graph around the painting "Three Fighter Aircraft in the Sky". Node labels: Mohammed Toha (sem:Actor), Netherlands (sem:Actor), Yogyakarta (sem:Place), 19/12/1948, and two sem:Event nodes, "The Attack on Yogyakarta" and "Mohammed Toha Paints 'Three Fighter Aircraft in the Sky'". Properties shown: rma:maker, rma:creationDate, rma:creationPlace, agora:depictsEvent, agora:createsEvent, sem:hasActor, sem:hasPlace, sem:hasBeginTimeStamp, rdf:type.]
  • 9. Narratives [Diagram: three events chained by agora:hasBiographicalRelation, each with sem:hasTimeStamp, sem:eventType, sem:hasPlace and sem:hasActor. (1) The Attack on Yogyakarta: 1945 - 1946, Armed Conflict, Indonesia, KNIL. (2) Operation Crow: 19/12/1948 - 31/12/1948, Armed Conflict, Sumatra, KNIL. (3) The Attack on Yogyakarta: 01/03/1949, Attack, Yogyakarta, KNIL.]
  • 13. Building an Event Thesaurus • There are no extensive structured event descriptions • Rijksmuseum Amsterdam has a flat list of 1,693 ‘events’: only names and very much focused on 17th century Holland • Our goal: • create a list of historically relevant events • provide actors, locations, times & types for each event Image src: http://www.collinsdictionary.com/static/graphics/default.png
  • 14. First Attempt • Pattern-based event-name extraction • In Dutch Wikipedia we found 2,444 event candidates • 1,209 (56.3%) correct • 169 (13.9%) partially correct • Off-the-shelf named entity recognition (P/R/F1) • Person 77/77/77 • Location 75/58/66 • Organisation 32/37/34 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 15. First Attempt • Co-occurrence-based event-relation finder • only actor, location and/or date found for 392 events • 49.6% actor is correct • 41.1% location is correct • 51.5% date is correct Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
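The slides do not show how the co-occurrence-based relation finder was implemented. As a minimal sketch of the general idea only, the following counts how often each known entity shares a sentence with an event name. The function name `cooccurrence_counts`, the plain substring matching and the toy sentences are all assumptions made for illustration, not the Agora implementation.

```python
from collections import Counter


def cooccurrence_counts(sentences, event_name, entities):
    """Count, per entity, the sentences that mention both the entity and the event."""
    counts = Counter()
    for sentence in sentences:
        if event_name not in sentence:
            continue
        for entity in entities:
            if entity in sentence:
                counts[entity] += 1
    return counts


# Invented toy sentences, for illustration only.
sentences = [
    "Operation Crow started on Java and Sumatra.",
    "The KNIL took part in Operation Crow.",
    "The KNIL advanced on Sumatra during Operation Crow.",
    "Yogyakarta is a city on Java.",
]
counts = cooccurrence_counts(
    sentences, "Operation Crow", ["KNIL", "Sumatra", "Java", "Yogyakarta"]
)
```

A count-based approach like this only has evidence for events that are mentioned many times, which is in line with the redundancy problem the talk reports for this first attempt.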
  • 16. First Attempt • Problems event element recognition: • Shallow grammatical processing (post-war rebuilding and during the North Sea flood recognised as 1 event) • Missing locations (Battle of LOC pattern fails) • No distinction between entities and action nouns (German Occupancy vs German Occupants look the same for the approach) • Named Entity Recogniser not suited for domain Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 17. First Attempt • Problems event relation finder: • Relies on redundancy in the data, only works for ‘popular’ events • Too coarse-grained (who were the actors/locations in WWII) • Evaluation is hard! Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 18. Back to the drawing board... • Analysis of event names • Combinations of sortal nouns with a PP and a named entity, e.g., Battle of Stalingrad, Death of John Lennon • Combinations of nominalized verbs with a PP and a named entity, e.g., Excavation of Troy, Election of Obama • Combinations of a referential adjective with an event type and named entity, e.g., the American invasion of Iraq • Transparent proper names: Great War • Opaque proper names: event names that cannot be decomposed on morphological grounds, e.g., Holocaust, Spanish Fury Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
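The first two name types on this slide (a sortal noun or nominalized verb, a PP, and a named entity) are regular enough to illustrate with a regular expression. This is a minimal sketch under stated assumptions: the trigger-word list `EVENT_NOUNS` is hand-picked and hypothetical, capitalisation stands in for real named entity recognition, and the English examples are taken from the slide, whereas a Dutch thesaurus would need Dutch trigger words and prepositions.

```python
import re

# Hypothetical, hand-picked trigger words; a real list would come from a lexicon.
EVENT_NOUNS = ["Battle", "Death", "Excavation", "Election", "Siege"]

# One or more capitalised tokens, e.g. "Stalingrad" or "John Lennon".
# Capitalisation is only a crude stand-in for named entity recognition.
NAME = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"

EVENT_PATTERN = re.compile(
    r"\b(?P<type>" + "|".join(EVENT_NOUNS) + r") of (?P<entity>" + NAME + r")"
)


def find_event_names(text):
    """Return (full name, event type, entity) for every '<noun> of <Name>' match."""
    return [
        (m.group(0), m.group("type"), m.group("entity"))
        for m in EVENT_PATTERN.finditer(text)
    ]


found = find_event_names(
    "He wrote about the Battle of Stalingrad and the Death of John Lennon."
)
```

Opaque proper names such as Holocaust or Spanish Fury have no such internal structure, so a pattern like this cannot find them; they would have to come from a list.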
  • 19. Back to the drawing board... • Improve Named Entity Recognition • Add gazetteers for historical names • Post-processing for titles and improved NE boundaries Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
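The slide names gazetteers and title post-processing without giving details. One common way to combine the two is a greedy longest-match lookup that then extends person spans over a preceding title. The sketch below assumes that approach; the gazetteer entries, the `TITLES` set and the function name are hypothetical.

```python
# Hypothetical gazetteer entries and titles, for illustration only.
GAZETTEER = {
    ("William", "Herschel"): "PER",
    ("Yogyakarta",): "LOC",
    ("Rijksmuseum", "Amsterdam"): "ORG",
    ("Amsterdam",): "LOC",
}
TITLES = {"Sir", "Dr.", "Admiral"}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tag_with_gazetteer(tokens):
    """Return (start, end, label) spans found by greedy longest-match lookup."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "Rijksmuseum Amsterdam" wins over "Amsterdam".
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                start = i
                # Post-processing: pull a directly preceding title into a person span.
                if label == "PER" and i > 0 and tokens[i - 1] in TITLES:
                    start = i - 1
                spans.append((start, i + n, label))
                i += n
                break
        else:
            # No gazetteer entry starts at this token.
            i += 1
    return spans


spans = tag_with_gazetteer(["Sir", "William", "Herschel", "and", "Rijksmuseum", "Amsterdam"])
```

Here `end` is exclusive, so `tokens[start:end]` gives the tagged tokens.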
  • 20. Back to the drawing board... • Finding Event Relations • Use the structure of Wikipedia/DBpedia • Shallow parsing • Hierarchies of actors & locations Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 21. Current Work
                     Spotlight (P/R/F1)   Stanford (P/R/F1)   Freire (P/R/F1)
      Person         54.05/7.52/13.20     58.60/34.46/43.40   79.17/71.16/74.95
      Location       64.52/30.77/41.67    67.19/66.15/66.67   80.00/61.54/69.57
      Organisation   0/0/0                9.78/25.71/14.17    89.66/74.29/81.25
    • Still some work to be done, but Freire et al. (2012) shows that smart features can work with small amounts of training data
    • Combine classifiers
    • Add post-processing
    • MISC class remains to be done...
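The third number in each P/R/F1 triple on slide 21 is the harmonic mean of the first two. The sketch below recomputes it for a few cells; the function name `f1_score` and the choice to return 0.0 when precision and recall are both zero (as in the Spotlight Organisation cell) are assumptions made for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, copied from the Person row of slide 21.
person = {
    "Spotlight": (54.05, 7.52),
    "Stanford": (58.60, 34.46),
    "Freire": (79.17, 71.16),
}
person_f1 = {system: round(f1_score(p, r), 2) for system, (p, r) in person.items()}
```

Because F1 is a harmonic mean, Spotlight's low recall on persons pulls its score far below its precision.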
  • 22. Current Work
      Word      POS  CHUNK  NER
      U.N.      NNP  I-NP   I-ORG
      official  NN   I-NP   O
      Ekeus     NNP  I-NP   I-PER
      heads     VBZ  I-VP   O
      for       IN   I-PP   O
      Baghdad   NNP  I-NP   I-LOC
      .         .    O      O
      [CoNLL2003]

      focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
      "is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
      "painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
      "dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
      "grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
      ".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
      "William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
      "Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
      "made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
      "many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
      "telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"
      [Freire et al. 2012]
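Reading the header and rows on slide 22, the first six columns appear to be the focus token followed by its three left and two right neighbours, and `cap` and `length` appear to be a capitalisation flag and the token's character count. The sketch below rebuilds just those columns under that reading. The padding symbol `"_"` and the function name are assumptions, and the frequency and part-of-speech columns are left out because the slide does not say how they are computed.

```python
FEATURE_NAMES = ["focus", "minthree", "mintwo", "minone", "plusone", "plustwo"]
OFFSETS = [0, -3, -2, -1, 1, 2]


def window_features(tokens, i, pad="_"):
    """Build the context-window, cap and length features for tokens[i]."""
    row = {}
    for name, offset in zip(FEATURE_NAMES, OFFSETS):
        j = i + offset
        # Positions outside the token list get the (assumed) padding symbol.
        row[name] = tokens[j] if 0 <= j < len(tokens) else pad
    row["cap"] = int(tokens[i][:1].isupper())
    row["length"] = len(tokens[i])
    return row


# Token sequence read off the rows shown on the slide.
tokens = ["wood", ")", "and", "is", "painted", "dark", "grey", ".",
          "William", "Herschel", "made", "many", "telescopes", "of", "this"]
rows = [window_features(tokens, i) for i in range(len(tokens))]
```

With this token list, `rows[8]` corresponds to the "William" row on the slide and `rows[3]` to the "is" row.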
  • 23. Current Work • Build smarter extractors for event names • First focus on ‘regular’ event names (e.g., Battle of LOC, War of YEAR) • Use knowledge about action nouns vs static nouns (WordNet)
  • 24. The Story So Far • It takes time to learn to communicate in an interdisciplinary project • Don’t try to solve too much in one go • Cycles of error analysis • Domain adaptation is difficult: optimise for precision
  • 25. Outlook • Redesign of Agora demo (new version autumn/winter) • Include different perspectives (together with Semantics of History) • Ship model use case • Historical Named Entity Recognition for English & Dutch • 2nd round user studies (spring 2013)
  • 26. Questions? marieke@cs.vu.nl http://www.cs.vu.nl/~marieke Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824 Image source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883