SlideShare uma empresa Scribd logo
1 de 30
Baixar para ler offline
IRSAW – Towards Semantic
 Annotation of Documents for Question
               Answering

                       Johannes Leveling


              CNI Spring 2007 Task Force Meeting,
                Apr. 16–17, 2007, Phoenix, AZ

IRSAW: Intelligent Information Retrieval on the Basis of a
Semantically Annotated Web
funded by the DFG (Deutsche Forschungsgemeinschaft)
Introduction
                            IRSAW
                            Results



Outline

  1    Introduction

  2    IRSAW
         Basic Architecture
         The MultiNet paradigm
         Processing Phases
         Modules

  3    Results



  Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   2 / 30
Introduction
                                   IRSAW
                                   Results



Motivation

 More information becomes available on the internet
 and/but precise answers are difficult to find
  → Develop a semantically based question answering
     (QA) framework in the IRSAW project
     General idea:
                 Retrieve documents from the internet
                 Semantically analyze and annotate the question and
                 documents, and
                 Apply deep linguistic methods for question answering
                 on document content


  Johannes Leveling          IRSAW – Towards Semantic Annotation of Documents for QA   3 / 30
Introduction
                             IRSAW
                             Results



Background and Embedding in a General
Strategy
        Semantically based natural language processing
        Knowledge representation MultiNet (concept
        oriented)
        Support by large semantically oriented computational
        lexicon → word sense disambiguation
        Homogeneous representation of lexical knowledge,
        general background knowledge (world knowledge),
        dialogue context, and meaning of sentences and
        texts
        Emphasis on semantic relations in all applications
        (not just concepts or descriptors)
 Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   4 / 30
Introduction
                           IRSAW
                           Results



NLI-Z39.50: Beyond Descriptor Search
 Predecessor: Natural language interface for the Z39.50
 protocol
     Natural language interface to libraries and
     information providers on the internet
     Transformation of semantic structures into
     expressions of formal retrieval languages
     Includes features such as de-duplication of results,
     phonetic search, decomposition of compounds, query
     expansion with additional concepts
     Example query: Where do I find books by Peter
     Jackson which were published in the last ten years
     with Springer and Addison-Wesley?
 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   5 / 30
Basic Architecture
                        Introduction
                                         The MultiNet paradigm
                             IRSAW
                                         Processing Phases
                             Results
                                         Modules


IRSAW



        Multi staged approach
        Questions are processed in three phases, accessing
        web search engines, local databases, and a semantic
        network knowledge base
        Web documents → semantic annotation




 Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   7 / 30
Basic Architecture
                                     Introduction
                                                         The MultiNet paradigm
                                          IRSAW
                                                         Processing Phases
                                          Results
                                                         Modules


Architecture of IRSAW

   IRSAW framework
                                                                         Merge answer streams
                                                                         Validate answer candidates
                                                                         Score and rank answer candidates
                                               InSicht
                                                                         Select answer candidate
           Document preprocessing
                                               QAP                               MAVE
   Natural language question
                                                                                                   Answer
                                               MIRA


           Question processing         Find answer candidates




  Johannes Leveling                 IRSAW – Towards Semantic Annotation of Documents for QA                 8 / 30
Basic Architecture
                      Introduction
                                       The MultiNet paradigm
                           IRSAW
                                       Processing Phases
                           Results
                                       Modules


IRSAW – Methods and Modules (1/3)


    Apply parser (for German language) to produce a
    semantic network representation of texts (based on
    the knowledge representation paradigm MultiNet)
  → Allows a full semantic interpretation of questions and
    documents on which logical inferences are based
    (state-of-the-art: mostly shallow methods)




 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   9 / 30
Basic Architecture
                      Introduction
                                       The MultiNet paradigm
                           IRSAW
                                       Processing Phases
                           Results
                                       Modules


IRSAW – Methods and Modules (2/3)


    Combine different data streams containing answer
    candidates
  → Use different methods to produce answer streams to
    increase recall and robustness
    Logically validate answers
  → Select validated answers from streams of answer
    candidates to increase precision




 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   10 / 30
Basic Architecture
                      Introduction
                                       The MultiNet paradigm
                           IRSAW
                                       Processing Phases
                           Results
                                       Modules


IRSAW – Methods and Modules (3/3)


    Natural language generation of answers
  → Allows for rephrasing from text and combination of
    answer fragments from different documents
    (state-of-the-art: extracting snippets from the text)
    IRSAW also aims at investigating linguistic
    phenomena in questions and documents (e.g.
    idioms, metonymy, and temporal and spatial aspects)




 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   11 / 30
Basic Architecture
                      Introduction
                                       The MultiNet paradigm
                           IRSAW
                                       Processing Phases
                           Results
                                       Modules


IRSAW – Software



 The IRSAW project will result in two software components
 accessible via internet:
   1 The question answering system IRSAW
   2 A web service for the semantic annotation of (web)
     documents




 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   12 / 30
Basic Architecture
                                            Introduction
                                                                         The MultiNet paradigm
                                                 IRSAW
                                                                         Processing Phases
                                                 Results
                                                                         Modules


MultiNet: Example for Semantic Network
                In which year did Charles de Gaulle die?/
               In welchem Jahr starb Charles de Gaulle?
                                                   c31d   de Gaulle
                c6dn        starb                  SUB Mensch                            c33na
               SUBS sterben           AFF            fact          real
                                                                             ATTR/ SUBgener sp
                                                          
                                                                                       Nachname
                                               /    gener         sp
               TEMP past.0                          quant
                                                    refer
                                                                   one 
                                                                   det 
                                                                                       quant one
                 [gener sp]                         card          1                   card 1
                                                     etype         0                    etype 0
                                                     varia         con
                            TEMP
                                                                  ATTR                         VAL
      c5?wh-questionme∨oa∨ta        Jahr                       
                 SUB Jahr                            c32na
                fact        real                  SUB Vorname                              
                 gener sp                            gener         sp               De_Gaulle.0fe
                quant one                          quant         one
                refer det                          card          1
                 card 1                              etype         0
                 etype 0
                                                                   VAL
                                                           
                                                    Charles.0fe




  Johannes Leveling                        IRSAW – Towards Semantic Annotation of Documents for QA    13 / 30
Basic Architecture
                        Introduction
                                         The MultiNet paradigm
                             IRSAW
                                         Processing Phases
                             Results
                                         Modules


MultiNet: Tools and Resources


         Syntactic-semantic parser:
         WOCADI (Word Class Controlled Disambiguating
         Parser)
         Large semantic computational lexicon:
         HaGenLex (Hagen German Lexicon)
         Workbench for the computer lexicographer:
         LiaPlus (Lexicon in action)




  Johannes Leveling    IRSAW – Towards Semantic Annotation of Documents for QA   14 / 30
Basic Architecture
                       Introduction
                                        The MultiNet paradigm
                            IRSAW
                                        Processing Phases
                            Results
                                        Modules


IRSAW: First Processing Phase


        Transform user question into IR query
        Preselect information resources (→ Broker)
        Send IR query to web search engines and web
        portals (→ lists of URLs)
        Retrieve web documents referenced and convert
        them




 Johannes Leveling    IRSAW – Towards Semantic Annotation of Documents for QA   15 / 30
Basic Architecture
                         Introduction
                                          The MultiNet paradigm
                              IRSAW
                                          Processing Phases
                              Results
                                          Modules


IRSAW: Second Processing Phase



        Segment and index text passages from the web in
        local database
        Access to units of textual information of certain types
        (chapters, paragraphs, sentences, or phrases)




 Johannes Leveling      IRSAW – Towards Semantic Annotation of Documents for QA   16 / 30
Basic Architecture
                             Introduction
                                              The MultiNet paradigm
                                  IRSAW
                                              Processing Phases
                                  Results
                                              Modules


IRSAW: Third Processing Phase


        Employ different modules to produce data streams
        containing answer candidates:
                QAP (Question Answering by Pattern matching),
                MIRA (Modified Information Retrieval Approach), and
                InSicht
        Merge, rank, logically validate answer candidates and
        select best answer (MAVE)




 Johannes Leveling          IRSAW – Towards Semantic Annotation of Documents for QA   17 / 30
Basic Architecture
                       Introduction
                                        The MultiNet paradigm
                            IRSAW
                                        Processing Phases
                            Results
                                        Modules


InSicht

     Analyze text segments (question, texts) with
     WOCADI and return the representation of the
     meaning of a text as a semantic network
     Expand queries with semantically related concepts
   → High recall
     Paraphrase answer node in semantic network
     (generate answer)
     Match semantic networks
   → High precision
   + Co-reference resolution, logical inference
     rules/textual entailments
  Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   18 / 30
Basic Architecture
                           Introduction
                                            The MultiNet paradigm
                                IRSAW
                                            Processing Phases
                                Results
                                            Modules


InSicht Logical Entailment

   ( (rule
                (
                (subs ?n1 “ermorden.1.1”) ;; kill
                (aff ?n1 ?n2)
                -
                (subs ?n3 “sterben.1.1”) ;; die
                (aff ?n3 ?n2)
                ))
         (ktype categ)
         (name “ermorden.1.1_entailment”) )


  Johannes Leveling       IRSAW – Towards Semantic Annotation of Documents for QA   19 / 30
Basic Architecture
                        Introduction
                                         The MultiNet paradigm
                             IRSAW
                                         Processing Phases
                             Results
                                         Modules


InSicht Example
        User question: In which year did Charles de Gaulle
        die?
        In welchem Jahr starb Charles de Gaulle?
        Text passage: France’s chief of state Jacques Chirac
        acknowledged the merits of general and statesman
        Charles de Gaulle, who died 25 years ago.
        Frankreichs Staatschef Jacques Chirac hat die
        Verdienste des vor 25 Jahren gestorbenen Generals
        und Staatsmannes Charles de Gaulle gewürdigt.
        (SDA.951109.0236)
        Answer: 1970 (deictic temporal expression resolved;
        document written in 1995)
 Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   20 / 30
Basic Architecture
                      Introduction
                                       The MultiNet paradigm
                           IRSAW
                                       Processing Phases
                           Results
                                       Modules


QAP - Question Answering by Pattern
Matching

      Create patterns by processing known
      question-answer pairs
      Search for text passages containing keywords from
      question
      Apply pattern matching on answer candidates
      Extract answer string from variable binding
    + Robustness, high precision for a small class of
      questions
    – No logical inferences possible

 Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   21 / 30
Basic Architecture
                         Introduction
                                          The MultiNet paradigm
                              IRSAW
                                          Processing Phases
                              Results
                                          Modules


QAP Example

        User question: In which year was the Russian
        Revolution?
        In welchem Jahr fand die russische Revolution statt?
        Text passage: The satire inspired by the Russian
        revolution 1917 lets the dream of liberty and equality
        fail because of humans.
        Die von der Russischen Revolution 1917 inspirierte
        Satire läßt den Traum von Freiheit und Gleichheit an
        den Menschen scheitern. (FR940612-000533)
        Answer: 1917 (pattern matching subsystem ignores
        metonymy and ellipsis)

 Johannes Leveling      IRSAW – Towards Semantic Annotation of Documents for QA   22 / 30
Basic Architecture
                       Introduction
                                        The MultiNet paradigm
                            IRSAW
                                        Processing Phases
                            Results
                                        Modules


MIRA - Modified Information Retrieval
Approach

       Train tagger on answer classes (LOC, PER, ORG)
       Search for text passages containing keywords from
       question
       Use tagger on answer candidate sentence and select
       most frequent word sequence
     + Highly recall-oriented
     – Very low precision, works only for a small class of
       questions (with answer type LOC, PER, ORG)


  Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA   23 / 30
Basic Architecture
                        Introduction
                                         The MultiNet paradigm
                             IRSAW
                                         Processing Phases
                             Results
                                         Modules


MIRA Example

        User question: Who was the first man on the moon?
        Wer war der erste Mensch auf dem Mond?
        Text passage: Twenty-five years ago Neil Armstrong
        was the first man to step onto the moon, but today
        manned space flight stagnates.
        Vor 25 Jahren betrat Neil Armstrong als erster
        Mensch den Mond, doch heute stagniert die
        bemannte Raumfahrt. (FR940724-001243)
        Answer: Neil Armstrong (PER)


 Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   24 / 30
Basic Architecture
                         Introduction
                                          The MultiNet paradigm
                              IRSAW
                                          Processing Phases
                              Results
                                          Modules


MAVE - MultiNet-based Answer Verification



        Validate answer candidates
        Test logical validity of answer candidate (using
        inferences, entailments)
        Added heuristic quality indicators as fallback strategy
        Select most trusted answer




 Johannes Leveling      IRSAW – Towards Semantic Annotation of Documents for QA   25 / 30
Introduction
                              IRSAW
                              Results



Evaluations
         InSicht evaluation: best performance for monolingual
         German question answering task at Cross Language
         Evaluation Forum 2005 (QA@CLEF 2005)
         IRSAW evaluation at QA@CLEF 2006: combination
         of InSicht and QAP answer stream:
         one of the best results in the monolingual German
         QA track;
         best results for answer validation task with MAVE
         IRSAW evaluation (for RIAO 2007): InSicht, QAP,
         MIRA answer streams, and logical validation with
         MAVE → better results with more answer streams
         and logical answer validation
  Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   26 / 30
Introduction
                            IRSAW
                            Results



Evaluation Results

 Results for answer validation of answer candidates for
 600 questions (InSicht:I, MIRA:M, QAP:Q; c=correct,
 i=inexact, w=wrong)

         QA streams                                  c            i             w
         IRSAW: I               199.4 10.9                              15.7
         IRSAW: I+M+Q           244.4 16.9                             255.7
         IRSAW: I+M+Q (Optimum) 290.0 15.0                             215.0




  Johannes Leveling   IRSAW – Towards Semantic Annotation of Documents for QA       27 / 30
Introduction
                               IRSAW
                               Results



Project Status



         Implementation of modules for QA system IRSAW
         completed
         Evaluations are now based on corpus of newspaper
         articles / Wikipedia (corpora of millions of sentences)
         Semantic annotation ⇒ Semantic Web




  Johannes Leveling      IRSAW – Towards Semantic Annotation of Documents for QA   28 / 30
Introduction
                             IRSAW
                             Results



Future Work



        Next: Evaluation on web pages (even bigger corpus,
        dynamic content)
        Add more robustness
        Add acoustic interface (speech input)
        Create English prototype (?)




 Johannes Leveling     IRSAW – Towards Semantic Annotation of Documents for QA   29 / 30
Introduction
                                 IRSAW
                                 Results



Selected References

        Ingo Glöckner, Sven Hartrumpf, and Johannes Leveling. Logical
        validation, answer merging and witness selection – a case study
        in multi-stream question answering. To appear in: Proceedings
        of RIAO 2007, 2007.
        Sven Hartrumpf and Johannes Leveling. University of Hagen at
        QA@CLEF 2006: Interpretation and normalization of temporal
        expressions. In Alessandro Nardi, Carol Peters, and José Luis
        Vicedo, editors, Results of the CLEF 2006 Cross-Language
        System Evaluation Campaign, Working Notes for the CLEF
        2006 Workshop, Alicante, Spain, 2006.
        Hermann Helbig. Knowledge Representation and the
        Semantics of Natural Language. Springer, Berlin, 2006.


 Johannes Leveling        IRSAW – Towards Semantic Annotation of Documents for QA   30 / 30

Mais conteúdo relacionado

Semelhante a IRSAW - Towards Semantic Annotation for Question Answering

Atul_Mohan_Resume_LinkedIn
Atul_Mohan_Resume_LinkedInAtul_Mohan_Resume_LinkedIn
Atul_Mohan_Resume_LinkedInAtul Mohan
 
ASIH Fishnet2 Presentation
ASIH Fishnet2 PresentationASIH Fishnet2 Presentation
ASIH Fishnet2 PresentationDave Vieglais
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic WebNuxeo
 
Timeliner poster on CSCW 2012 conference
Timeliner poster on CSCW 2012 conferenceTimeliner poster on CSCW 2012 conference
Timeliner poster on CSCW 2012 conferenceVladimir Tomberg
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document RankingBhaskar Mitra
 
Carmichael.kevin
Carmichael.kevinCarmichael.kevin
Carmichael.kevinNASAPMC
 
Keyphrase Extraction And Source Code Similarity Detection- A Survey
Keyphrase Extraction And Source Code Similarity Detection- A Survey Keyphrase Extraction And Source Code Similarity Detection- A Survey
Keyphrase Extraction And Source Code Similarity Detection- A Survey Nakul Sharma
 
Sasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic
 
WSO2 Mashups and BPM
WSO2 Mashups and BPMWSO2 Mashups and BPM
WSO2 Mashups and BPMWSO2
 

Semelhante a IRSAW - Towards Semantic Annotation for Question Answering (20)

Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Atul_Mohan_Resume_LinkedIn
Atul_Mohan_Resume_LinkedInAtul_Mohan_Resume_LinkedIn
Atul_Mohan_Resume_LinkedIn
 
ASIH Fishnet2 Presentation
ASIH Fishnet2 PresentationASIH Fishnet2 Presentation
ASIH Fishnet2 Presentation
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developments
 
Nikhil_Ayyagari_Resume
Nikhil_Ayyagari_ResumeNikhil_Ayyagari_Resume
Nikhil_Ayyagari_Resume
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Duc le CV
Duc le CVDuc le CV
Duc le CV
 
TripathiAkriti_resume
TripathiAkriti_resumeTripathiAkriti_resume
TripathiAkriti_resume
 
Timeliner poster on CSCW 2012 conference
Timeliner poster on CSCW 2012 conferenceTimeliner poster on CSCW 2012 conference
Timeliner poster on CSCW 2012 conference
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Carmichael.kevin
Carmichael.kevinCarmichael.kevin
Carmichael.kevin
 
Unit 01 - Introduction
Unit 01 - IntroductionUnit 01 - Introduction
Unit 01 - Introduction
 
Resume
ResumeResume
Resume
 
Keyphrase Extraction And Source Code Similarity Detection- A Survey
Keyphrase Extraction And Source Code Similarity Detection- A Survey Keyphrase Extraction And Source Code Similarity Detection- A Survey
Keyphrase Extraction And Source Code Similarity Detection- A Survey
 
Resume
ResumeResume
Resume
 
Shaman Project Hemmje
Shaman Project  HemmjeShaman Project  Hemmje
Shaman Project Hemmje
 
Ashfakul_Resume
Ashfakul_ResumeAshfakul_Resume
Ashfakul_Resume
 
Sasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation Defense
 
WSO2 Mashups and BPM
WSO2 Mashups and BPMWSO2 Mashups and BPM
WSO2 Mashups and BPM
 

Último

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Último (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

IRSAW - Towards Semantic Annotation for Question Answering

  • 1. IRSAW – Towards Semantic Annotation of Documents for Question Answering Johannes Leveling CNI Spring 2007 Task Force Meeting, Apr. 16–17, 2007, Phoenix, AZ IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web funded by the DFG (Deutsche Forschungsgemeinschaft)
  • 2. Introduction IRSAW Results Outline 1 Introduction 2 IRSAW Basic Architecture The MultiNet paradigm Processing Phases Modules 3 Results Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 2 / 30
  • 3. Introduction IRSAW Results Motivation More information becomes available on the internet and/but precise answers are difficult to find → Develop a semantically based question answering (QA) framework in the IRSAW project General idea: Retrieve documents from the internet Semantically analyze and annotate the question and documents, and Apply deep linguistic methods for question answering on document content Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 3 / 30
  • 4. Introduction IRSAW Results Background and Embedding in a General Strategy Semantically based natural language processing Knowledge representation MultiNet (concept oriented) Support by large semantically oriented computational lexicon → word sense disambiguation Homogeneous representation of lexical knowledge, general background knowledge (world knowledge), dialogue context, and meaning of sentences and texts Emphasis on semantic relations in all applications (not just concepts or descriptors) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 4 / 30
  • 5. Introduction IRSAW Results NLI-Z39.50: Beyond Descriptor Search Predecessor: Natural language interface for the Z39.50 protocol Natural language interface to libraries and information providers on the internet Transformation of semantic structures into expressions of formal retrieval languages Includes features such as de-duplication of results, phonetic search, decomposition of compounds, query expansion with additional concepts Example query: Where do I find books by Peter Jackson which were published in the last ten years with Springer and Addison-Wesley? Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 5 / 30
  • 6.
  • 7. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW Multi staged approach Questions are processed in three phases, accessing web search engines, local databases, and a semantic network knowledge base Web documents → semantic annotation Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 7 / 30
  • 8. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules Architecture of IRSAW IRSAW framework Merge answer streams Validate answer candidates Score and rank answer candidates InSicht Select answer candidate Document preprocessing QAP MAVE Natural language question Answer MIRA Question processing Find answer candidates Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 8 / 30
  • 9. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW – Methods and Modules (1/3) Apply parser (for German language) to produce a semantic network representation of texts (based on the knowledge representation paradigm MultiNet) → Allows a full semantic interpretation of questions and documents on which logical inferences are based (state-of-the-art: mostly shallow methods) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 9 / 30
  • 10. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW – Methods and Modules (2/3) Combine different data streams containing answer candidates → Use different methods to produce answer streams to increase recall and robustness Logically validate answers → Select validated answers from streams of answer candidates to increase precision Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 10 / 30
  • 11. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW – Methods and Modules (3/3) Natural language generation of answers → Allows for rephrasing from text and combination of answer fragments from different documents (state-of-the-art: extracting snippets from the text) IRSAW also aims at investigating linguistic phenomena in questions and documents (e.g. idioms, metonymy, and temporal and spatial aspects) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 11 / 30
  • 12. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW – Software The IRSAW project will result in two software components accessible via internet: 1 The question answering system IRSAW 2 A web service for the semantic annotation of (web) documents Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 12 / 30
  • 13. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules MultiNet: Example for Semantic Network In which year did Charles de Gaulle die?/ In welchem Jahr starb Charles de Gaulle? c31d de Gaulle c6dn starb SUB Mensch c33na SUBS sterben AFF fact real ATTR/ SUBgener sp   Nachname / gener sp TEMP past.0 quant refer one  det   quant one [gener sp] card 1  card 1 etype 0 etype 0 varia con TEMP ATTR VAL c5?wh-questionme∨oa∨ta Jahr SUB Jahr c32na fact real SUB Vorname gener sp gener sp De_Gaulle.0fe quant one  quant one refer det  card 1 card 1 etype 0 etype 0 VAL Charles.0fe Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 13 / 30
  • 14. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules MultiNet: Tools and Resources Syntactic-semantic parser: WOCADI (Word Class Controlled Disambiguating Parser) Large semantic computational lexicon: HaGenLex (Hagen German Lexicon) Workbench for the computer lexicographer: LiaPlus (Lexicon in action) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 14 / 30
  • 15. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW: First Processing Phase Transform user question into IR query Preselect information resources (→ Broker) Send IR query to web search engines and web portals (→ lists of URLs) Retrieve web documents referenced and convert them Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 15 / 30
  • 16. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW: Second Processing Phase Segment and index text passages from the web in local database Access to units of textual information of certain types (chapters, paragraphs, sentences, or phrases) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 16 / 30
  • 17. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules IRSAW: Third Processing Phase Employ different modules to produce data streams containing answer candidates: QAP (Question Answering by Pattern matching), MIRA (Modified Information Retrieval Approach), and InSicht Merge, rank, logically validate answer candidates and select best answer (MAVE) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 17 / 30
  • 18. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules InSicht Analyze text segments (question, texts) with WOCADI and return the representation of the meaning of a text as a semantic network Expand queries with semantically related concepts → High recall Paraphrase answer node in semantic network (generate answer) Match semantic networks → High precision + Co-reference resolution, logical inference rules/textual entailments Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 18 / 30
  • 19. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules InSicht Logical Entailment ( (rule ( (subs ?n1 “ermorden.1.1”) ;; kill (aff ?n1 ?n2) - (subs ?n3 “sterben.1.1”) ;; die (aff ?n3 ?n2) )) (ktype categ) (name “ermorden.1.1_entailment”) ) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 19 / 30
  • 20. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules InSicht Example User question: In which year did Charles de Gaulle die? In welchem Jahr starb Charles de Gaulle? Text passage: France’s chief of state Jacques Chirac acknowledged the merits of general and statesman Charles de Gaulle, who died 25 years ago. Frankreichs Staatschef Jacques Chirac hat die Verdienste des vor 25 Jahren gestorbenen Generals und Staatsmannes Charles de Gaulle gewürdigt. (SDA.951109.0236) Answer: 1970 (deictic temporal expression resolved; document written in 1995) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 20 / 30
  • 21. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules QAP - Question Answering by Pattern Matching Create patterns by processing known question-answer pairs Search for text passages containing keywords from question Apply pattern matching on answer candidates Extract answer string from variable binding + Robustness, high precision for a small class of questions – No logical inferences possible Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 21 / 30
  • 22. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules QAP Example User question: In which year was the Russian Revolution? In welchem Jahr fand die russische Revolution statt? Text passage: The satire inspired by the Russian revolution 1917 lets the dream of liberty and equality fail because of humans. Die von der Russischen Revolution 1917 inspirierte Satire läßt den Traum von Freiheit und Gleichheit an den Menschen scheitern. (FR940612-000533) Answer: 1917 (pattern matching subsystem ignores metonymy and ellipsis) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 22 / 30
  • 23. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules MIRA - Modified Information Retrieval Approach Train tagger on answer classes (LOC, PER, ORG) Search for text passages containing keywords from question Use tagger on answer candidate sentence and select most frequent word sequence + Highly recall-oriented – Very low precision, works only for a small class of questions (with answer type LOC, PER, ORG) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 23 / 30
  • 24. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules MIRA Example User question: Who was the first man on the moon? Wer war der erste Mensch auf dem Mond? Text passage: Twenty-five years ago Neil Armstrong was the first man to step onto the moon, but today manned space flight stagnates. Vor 25 Jahren betrat Neil Armstrong als erster Mensch den Mond, doch heute stagniert die bemannte Raumfahrt. (FR940724-001243) Answer: Neil Armstrong (PER) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 24 / 30
  • 25. Basic Architecture Introduction The MultiNet paradigm IRSAW Processing Phases Results Modules MAVE - MultiNet-based Answer Verification Validate answer candidates Test logical validity of answer candidate (using inferences, entailments) Added heuristic quality indicators as fallback strategy Select most trusted answer Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 25 / 30
  • 26. Introduction IRSAW Results Evaluations InSicht evaluation: best performance for monolingual German question answering task at Cross Language Evaluation Forum 2005 (QA@CLEF 2005) IRSAW evaluation at QA@CLEF 2006: combination of InSicht and QAP answer stream: one of the best results in the monolingual German QA track; best results for answer validation task with MAVE IRSAW evaluation (for RIAO 2007): InSicht, QAP, MIRA answer streams, and logical validation with MAVE → better results with more answer streams and logical answer validation Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 26 / 30
  • 27. Introduction IRSAW Results Evaluation Results Results for answer validation of answer candidates for 600 questions (InSicht:I, MIRA:M, QAP:Q; c=correct, i=inexact, w=wrong) QA streams c i w IRSAW: I 199.4 10.9 15.7 IRSAW: I+M+Q 244.4 16.9 255.7 IRSAW: I+M+Q (Optimum) 290.0 15.0 215.0 Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 27 / 30
  • 28. Introduction IRSAW Results Project Status Implementation of modules for QA system IRSAW completed Evaluations are now based on corpus of newspaper articles / Wikipedia (corpora of millions of sentences) Semantic annotation ⇒ Semantic Web Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 28 / 30
  • 29. Introduction IRSAW Results Future Work Next: Evaluation on web pages (even bigger corpus, dynamic content) Add more robustness Add acoustic interface (speech input) Create English prototype (?) Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 29 / 30
  • 30. Introduction IRSAW Results Selected References Ingo Glöckner, Sven Hartrumpf, and Johannes Leveling. Logical validation, answer merging and witness selection – a case study in multi-stream question answering. To appear in: Proceedings of RIAO 2007, 2007. Sven Hartrumpf and Johannes Leveling. University of Hagen at QA@CLEF 2006: Interpretation and normalization of temporal expressions. In Alessandro Nardi, Carol Peters, and José Luis Vicedo, editors, Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2006 Workshop, Alicante, Spain, 2006. Hermann Helbig. Knowledge Representation and the Semantics of Natural Language. Springer, Berlin, 2006. Johannes Leveling IRSAW – Towards Semantic Annotation of Documents for QA 30 / 30