TETI: a TimeML Compliant TimEx
         Tagger for Italian


     Tommaso Caselli, Felice dell'Orletta and Irina Prodanof

Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa
                  {firstName.secondName@ilc.cnr.it}




                           IMCSIT 2009 – CL-A09, Mrągowo, October 13
Outline:


    Motivations

    Extracting Temporal Expressions and the TIMEX3 tag

    TETI:
        −   System architecture
        −   Demo

    Evaluation

    Conclusions & Future Work
Motivations

    Recovering temporal relations in text/discourse is essential to
    improve the performance of many NLP systems (Open-Domain Question
    Answering, Text Mining, Summarization, Reasoning)

    Most temporal information in text/discourse is only IMPLICITLY
    stated

    Procedures are needed that make the most of the various
    sources of temporal information

    Temporal expressions are a source of explicit temporal
    knowledge which can be used to:
        −   Locate an eventuality in time, and thus infer temporal
            relations between eventualities
        −   Measure the duration of an eventuality
Extracting Temporal Expressions


    The extraction of timexes can be divided into four
    subtasks:
        −   Recognizing and bracketing the timex
        −   Feature extraction (type of time unit, referential
            status, presence of modifiers)
        −   Computing the interval of reference on the time
            line
        −   Resolving the timex, i.e. normalizing its value to a
            standard output format
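The four subtasks above can be illustrated with a minimal Python sketch. It is not TETI's implementation: the toy lexicon LEXICON, the Timex record and all function names are hypothetical, recognition is a single-token lookup, and only relative day-level expressions (Italian "ieri" = yesterday, "oggi" = today) are anchored to an assumed document creation time and normalized to an ISO 8601 date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical toy lexicon: lemma -> (time unit, referential status, day offset).
# It covers only two Italian adverbs and is NOT TETI's trigger dictionary.
LEXICON = {
    "ieri": ("day", "relative", -1),  # "yesterday"
    "oggi": ("day", "relative", 0),   # "today"
}


@dataclass
class Timex:
    text: str
    unit: str = ""
    status: str = ""
    anchor: Optional[date] = None
    value: str = ""


def recognize(tokens: List[str]) -> List[Timex]:
    """Subtask 1: recognize and bracket timexes (here: single-token lookup)."""
    return [Timex(text=tok) for tok in tokens if tok.lower() in LEXICON]


def extract_features(timex: Timex) -> Timex:
    """Subtask 2: time unit and referential status from the lexicon."""
    timex.unit, timex.status, _ = LEXICON[timex.text.lower()]
    return timex


def compute_interval(timex: Timex, dct: date) -> Timex:
    """Subtask 3: place a relative, day-level timex on the time line
    with respect to the document creation time (dct)."""
    offset = LEXICON[timex.text.lower()][2]
    if timex.status == "relative" and timex.unit == "day":
        timex.anchor = dct + timedelta(days=offset)
    return timex


def normalize(timex: Timex) -> Timex:
    """Subtask 4: resolve the value to a standard format (ISO 8601 date)."""
    if timex.anchor is not None:
        timex.value = timex.anchor.isoformat()
    return timex


def extract(tokens: List[str], dct: date) -> List[Timex]:
    """Run the four subtasks in order on a tokenized sentence."""
    return [normalize(compute_interval(extract_features(t), dct))
            for t in recognize(tokens)]


if __name__ == "__main__":
    for t in extract("Ieri è piovuto a Pisa".split(), date(2009, 10, 13)):
        print(t.text, t.value)  # Ieri 2009-10-12
```

Keeping each subtask in its own function mirrors the decomposition on the slide; a real system would replace the lookup in recognize with chunk-based bracketing.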
Temporal Expressions in TimeML:
    The TIMEX3 tag

    The TIMEX3 tag extends and improves on previous tags for this
    task, namely TIMEX and TIDES TIMEX2

    The TIMEX3 tag is used to mark any temporal expression, both
    absolute and relative: times of day (midnight, ...), dates of
    different granularity (yesterday, last spring, ...), calendar
    dates (01/12/1980, ...), durations (three hours, two years, ...)
    and sets of times (yearly, every day, ...)

    The annotation process is based on:
         −   the constituent structure (NP, AdjP, AdvP, Time/Date
             Pattern)
         −   the granularity of the time units
         −   the relations between the timexes
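For reference, a TIMEX3 annotation wraps the expression in an XML element. The sketch below uses only Python's standard xml.etree.ElementTree to produce the markup for the examples listed above. The type assigned to each example (DATE, TIME, DURATION or SET, the four TIMEX3 types) is our reading of the TimeML guidelines, not TETI output, and the value attribute holding the normalized value is deliberately left out; EXAMPLE_TYPES and timex3 are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of the example expressions to TIMEX3 types.
EXAMPLE_TYPES = {
    "midnight": "TIME",
    "yesterday": "DATE",
    "last spring": "DATE",
    "01/12/1980": "DATE",
    "three hours": "DURATION",
    "two years": "DURATION",
    "yearly": "SET",
    "every day": "SET",
}


def timex3(text: str, tid: str, timex_type: str) -> str:
    """Wrap text in a TIMEX3 element carrying only the tid and type attributes."""
    element = ET.Element("TIMEX3", {"tid": tid, "type": timex_type})
    element.text = text
    # encoding="unicode" makes tostring return str instead of bytes
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    for i, (expr, ttype) in enumerate(EXAMPLE_TYPES.items(), start=1):
        print(timex3(expr, f"t{i}", ttype))
```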
TETI: Temporal Expression Tagger
for Italian

    Rule-based system

    Input: chunked text

    Main components: TIMEX DETECTOR & TIMEX TAGGER

    Two external resources: a TimEx Trigger Dictionary
    and a Modifier Dictionary
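The architecture above can be sketched as two functions applied in sequence to chunked text. Everything here is an assumption for illustration: the Chunk record, the contents of the two toy dictionaries, and the choice to map modifiers onto the TimeML mod attribute (START, END, APPROX). Unlike TETI, this tagger brackets every detected chunk without checking any rule conditions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy resources (not TETI's actual dictionaries).
TRIGGER_DICT: Dict[str, str] = {    # trigger lemma -> default TIMEX3 type
    "ieri": "DATE", "domani": "DATE", "mattina": "TIME", "mezzanotte": "TIME",
}
MODIFIER_DICT: Dict[str, str] = {   # modifier lemma -> TimeML mod value
    "inizio": "START", "fine": "END", "circa": "APPROX",
}


@dataclass
class Chunk:
    ctype: str                          # chunk type, e.g. "N_C", "ADV_C", "ADJ_C"
    text: str                           # surface string of the chunk
    head_lemma: str                     # lemma of the chunk head
    premodifiers: Tuple[str, ...] = ()  # lemmas of premodifiers inside the chunk


Detection = Tuple[int, str, Optional[str]]  # (chunk index, TIMEX3 type, mod)


def timex_detector(chunks: List[Chunk]) -> List[Detection]:
    """Find chunks headed by a trigger lemma and look up any known modifier."""
    found = []
    for i, c in enumerate(chunks):
        if c.head_lemma in TRIGGER_DICT:
            mod = next((MODIFIER_DICT[m] for m in c.premodifiers
                        if m in MODIFIER_DICT), None)
            found.append((i, TRIGGER_DICT[c.head_lemma], mod))
    return found


def timex_tagger(chunks: List[Chunk], detections: List[Detection]) -> str:
    """Bracket every detected chunk with a TIMEX3 tag (no rule conditions here)."""
    by_index = {i: (ttype, mod) for i, ttype, mod in detections}
    out = []
    for i, c in enumerate(chunks):
        if i in by_index:
            ttype, mod = by_index[i]
            mod_attr = f' mod="{mod}"' if mod else ""
            out.append(f'<TIMEX3 type="{ttype}"{mod_attr}>{c.text}</TIMEX3>')
        else:
            out.append(c.text)
    return " ".join(out)


def tag(chunks: List[Chunk]) -> str:
    """Chunked text -> TIMEX DETECTOR -> TIMEX TAGGER."""
    return timex_tagger(chunks, timex_detector(chunks))
```

For example, tag([Chunk("FV_C", "è partito", "partire"), Chunk("ADV_C", "ieri", "ieri")]) returns 'è partito <TIMEX3 type="DATE">ieri</TIMEX3>'; the chunk type FV_C is itself a hypothetical label.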
TETI: Temporal Expression Tagger
for Italian (2)

    The chunker output approximates the extent of the
    TIMEX3 tag

    The extent of a timex corresponds to regular patterns
    of chunk combinations
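One way to match regular patterns of chunk combinations is to encode each chunk as a single character and run an ordinary regular expression over the resulting string, so that match offsets are chunk indices. The one-letter codes, the trigger set and the pattern below are hypothetical and far simpler than TETI's rules: a candidate extent is a trigger-headed N_C, ADV_C or ADJ_C chunk, optionally followed by further trigger-headed N_C chunks.

```python
import re
from typing import List, Tuple

# Hypothetical one-letter codes, so that string offsets equal chunk indices.
CODES = {"N_C": "n", "ADV_C": "a", "ADJ_C": "j"}
# Hypothetical trigger lemmas.
TRIGGERS = {"ieri", "lunedì", "mattina", "mezzanotte"}

# Upper case = chunk headed by a trigger lemma. Illustrative pattern only.
PATTERN = re.compile(r"[NAJ]N*")


def encode(chunks: List[Tuple[str, str]]) -> str:
    """Map (chunk type, head lemma) pairs to one character per chunk."""
    out = []
    for ctype, lemma in chunks:
        code = CODES.get(ctype, "o")  # "o" = any other chunk type
        out.append(code.upper() if lemma in TRIGGERS else code)
    return "".join(out)


def candidate_extents(chunks: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (first, last) chunk indices of each candidate timex extent."""
    return [(m.start(), m.end() - 1) for m in PATTERN.finditer(encode(chunks))]


if __name__ == "__main__":
    sent = [("FV_C", "arrivare"), ("N_C", "lunedì"), ("N_C", "mattina"), ("P_C", "pisa")]
    print(encode(sent), candidate_extents(sent))  # oNNo [(1, 2)]
```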
TETI: Temporal Expression Tagger
for Italian (3)

    Analysis of the chunked text

    Lookup in the TimEx Trigger Dictionary

    Extraction of the features needed for the bracketing
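The detector step could produce, for each chunk headed by a trigger lemma, the features that the tagger's conditions refer to: granularity and default type (as in GET GRAN and GET DEFAULT TYPE in Complex Rule 1) plus the status of the neighboring chunks. A minimal sketch, assuming hypothetical dictionary contents, a hypothetical granularity label "part_of_day", and that "has PREMODIF" means the chunk contains any premodifier at all:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical TimEx Trigger Dictionary: lemma -> (granularity, default TIMEX3 type)
TRIGGER_DICT = {
    "ieri": ("day", "DATE"),
    "lunedì": ("day", "DATE"),
    "mattina": ("part_of_day", "TIME"),
    "anno": ("year", "DATE"),
}
# Hypothetical Modifier Dictionary (modifier trigger lemmas)
MODIFIER_DICT = {"inizio", "fine", "circa"}


@dataclass
class Chunk:
    ctype: str
    head_lemma: str
    premodifiers: Tuple[str, ...] = ()


@dataclass
class PotGov:
    """Features of a potential governor (POTGOV) chunk."""
    index: int
    gran: str                    # granularity of the time unit
    default_type: str            # default TIMEX3 type
    has_premodif: bool           # chunk contains any premodifier
    prev_is_modiftrigger: bool   # head of CHUNK-1 is in the Modifier Dictionary
    next_is_lextrigger: bool     # head of CHUNK+1 is in the Trigger Dictionary
    next_is_modiftrigger: bool   # head of CHUNK+1 is in the Modifier Dictionary


def head(chunks: List[Chunk], i: int) -> Optional[str]:
    """Head lemma of chunk i, or None outside the sentence."""
    return chunks[i].head_lemma if 0 <= i < len(chunks) else None


def detect(chunks: List[Chunk]) -> List[PotGov]:
    """Look up each chunk head and collect the features of potential governors."""
    found = []
    for i, c in enumerate(chunks):
        entry = TRIGGER_DICT.get(c.head_lemma)
        if entry is None:
            continue  # not headed by a trigger lemma
        gran, default_type = entry
        found.append(PotGov(
            index=i,
            gran=gran,
            default_type=default_type,
            has_premodif=len(c.premodifiers) > 0,
            prev_is_modiftrigger=head(chunks, i - 1) in MODIFIER_DICT,
            next_is_lextrigger=head(chunks, i + 1) in TRIGGER_DICT,
            next_is_modiftrigger=head(chunks, i + 1) in MODIFIER_DICT,
        ))
    return found
```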
TETI: Temporal Expression Tagger
for Italian (4)

    Core element of the tagger

    A general condition + a set of local conditions

    If the conditions are true, the tagger activates the
    related rules and brackets the timex with TIMEX3

COND
  (and
    (or (POTGOV_CHUNK equals N_C)
        (POTGOV_CHUNK equals ADV_C)
        (POTGOV_CHUNK equals ADJ_C))
    (not (POTGOV_CHUNK has PREMODIF))
    (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
    (or (not (POTGOV_lemma CHUNK+1 equals lextrigger))
        (not (POTGOV_lemma CHUNK+1 equals modiftrigger)))
  )
then
   CREATE TIMEX3_tag
   (and (BEGIN_AT B_CHUNK)
        (END_AT E_CHUNK))
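A literal Python translation of the simple rule above, offered as a sketch: each condition is assumed to be precomputed as a boolean feature of a hypothetical Candidate record, and B_CHUNK/E_CHUNK are taken to be token offsets. Note that the last clause, (or (not A) (not B)), only fails when CHUNK+1 is both a lexical trigger and a modifier trigger; the translation keeps it exactly as written in the rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TIMEX_CHUNK_TYPES = {"N_C", "ADV_C", "ADJ_C"}


@dataclass
class Candidate:
    ctype: str                  # type of POTGOV_CHUNK
    begin: int                  # B_CHUNK: first token offset of the chunk
    end: int                    # E_CHUNK: last token offset of the chunk
    has_premodif: bool          # POTGOV_CHUNK has PREMODIF
    prev_is_modiftrigger: bool  # lemma of CHUNK-1 equals a modifier trigger
    next_is_lextrigger: bool    # lemma of CHUNK+1 equals a lexical trigger
    next_is_modiftrigger: bool  # lemma of CHUNK+1 equals a modifier trigger


def simple_rule(c: Candidate) -> Optional[Tuple[int, int]]:
    """Return the (begin, end) extent of the TIMEX3 tag if the rule fires, else None."""
    condition = (
        c.ctype in TIMEX_CHUNK_TYPES
        and not c.has_premodif
        and not c.prev_is_modiftrigger
        and (not c.next_is_lextrigger or not c.next_is_modiftrigger)
    )
    # CREATE TIMEX3_tag from B_CHUNK to E_CHUNK
    return (c.begin, c.end) if condition else None
```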
TETI: Temporal Expression Tagger
for Italian (5)

    More complex timexes require a further lookup in the
    TimEx Trigger Dictionary to extract additional features
    (semantic relations) for the correct bracketing
Evaluation

    42 newspaper articles, manually annotated

    367 timexes

    TAG                      TOT   CORR.   MISSING   INCORR.   P (%)   R (%)   F (%)
    TIMEX3                   367   321     35        66        82.95   90.17   86.41
    TIMEX3 (with modifiers)   90   55      12        23        82.09   70.51   75.86
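The P, R and F columns can be recomputed from the counts. The slides do not state the formulas; assuming P = CORR / (CORR + INCORR), R = CORR / (CORR + MISSING) and F as their harmonic mean reproduces the first TIMEX3 row exactly. Applied to the row with modifiers, the same formulas give P = 70.51 and R = 82.09, so the P and R values printed for that row appear to be swapped; F (75.86) is the same either way.

```python
from typing import Tuple


def prf(correct: int, missing: int, incorrect: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 in %, rounded to two decimals (assumed formulas)."""
    precision = correct / (correct + incorrect)  # share of system timexes that are correct
    recall = correct / (correct + missing)       # share of gold timexes that were found
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f1))


print(prf(321, 35, 66))  # (82.95, 90.17, 86.41)
print(prf(55, 12, 23))   # (70.51, 82.09, 75.86)
```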
Conclusion & Future Work

•   Reduction of the number of false positives
•   Implementation of a rule-based normalization phase
•   Rewriting of the rules to be compliant with the
    KAF format (KYOTO Project)
•   Release of the tool as a web service
Acknowledgments

  Thanks to Roberto Bartolini for his help in the
             development of the demo
Thank You!
Complex Rule 1
    COND
        (and
        (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
        ((POTGOV_lemma CHUNK+1 equals lextrigger)
                    then
                  (GET GRAN
                   GET DEFAULT TYPE))
         (COND
         ((PREMODIF_POTGOV_CHUNK equals modiftrigger)
                then
                (GET INFO_NORMALIZATION
                GET TIMEML_MOD_ATTRIBUTE
                GET TIMEML_BEGINPOINT_ATTRIBUTE
                GET TIMEML_ENDPOINT_ATTRIBUTE
                GET TR_RESPECT_TO ANCHOR))
                T)
        (or (POTGOV_CHUNK+1 equals N_C)
            (POTGOV_CHUNK+1 equals ADV_C)
            (POTGOV_lemma CHUNK+1 equals DATE PATTERN))
       (not (POTGOV_CHUNK+1 has PREMODIF))
       (POTGOV_CHUNK equals N_C)
Complex Rule 1b
   (COND
     1 ((and (equals (SEM_RELATION POTGOV_CHUNK)
                     (has_as_part (LEXTRIG_CIBLE POTGOV_CHUNK+1)))
             (equals (DEFAULT_TYPE POTGOV_CHUNK) DATE)
             (or (equals (DEFAULT_TYPE POTGOV_CHUNK+1) DATE)
                 (equals (DEFAULT_TYPE POTGOV_CHUNK+1) TIME)))
        then
        CREATE TIMEX3
        (and (BEGIN_AT B_POTGOV_CHUNK)
             (END_AT E_POTGOV_CHUNK+1)))
     2 ((and (CREATE TIMEX3
               (and (BEGIN_AT B_POTGOV_CHUNK)
                    (END_AT E_POTGOV_CHUNK)))
             (CREATE TIMEX3
               (and (BEGIN_AT B_POTGOV_CHUNK+1)
                    (END_AT E_POTGOV_CHUNK+1))))))
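Complex Rule 1b can be read as follows: if the potential governor has the following trigger as a part (for instance a day and a part of that day) and their default types are DATE and DATE/TIME respectively, one TIMEX3 spans both chunks; otherwise each chunk gets its own tag. The sketch assumes that branch 2 is the fallback, reads LEXTRIG_CIBLE as the lexical trigger of the following chunk, and uses a hypothetical part-of table and hypothetical default types. It returns inclusive (begin, end) token extents rather than tags.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical semantic relations: trigger lemma -> trigger lemmas it "has as part".
HAS_AS_PART: Dict[str, Set[str]] = {
    "lunedì": {"mattina", "pomeriggio", "sera"},
    "ieri": {"mattina", "pomeriggio", "sera"},
}
# Hypothetical default TIMEX3 types from the trigger dictionary.
DEFAULT_TYPE: Dict[str, str] = {
    "lunedì": "DATE", "ieri": "DATE",
    "mattina": "TIME", "pomeriggio": "TIME", "sera": "TIME",
}

Span = Tuple[int, int]  # inclusive (begin, end) token offsets


def rule_1b(potgov: str, nxt: str, potgov_span: Span, next_span: Span) -> List[Span]:
    """Return the TIMEX3 extents for two adjacent trigger-headed chunks."""
    if (nxt in HAS_AS_PART.get(potgov, set())
            and DEFAULT_TYPE.get(potgov) == "DATE"
            and DEFAULT_TYPE.get(nxt) in {"DATE", "TIME"}):
        # Branch 1: a single TIMEX3 from B_POTGOV_CHUNK to E_POTGOV_CHUNK+1
        return [(potgov_span[0], next_span[1])]
    # Branch 2 (assumed fallback): one TIMEX3 per chunk
    return [potgov_span, next_span]
```

With these toy tables, "ieri sera" (yesterday evening) is bracketed as one timex, while two chunks with no part-of relation stay separate.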
