Recognising and Interpreting Named Temporal Expressions

Matteo Brucato
Leon Derczynski
Hector Llorens
Kalina Bontcheva
Christian S. Jensen

Wow, it's super-deterministic!
Credit: Kevin Knight

… sometimes
● TempEval-2 timex recall: 66 – 88 %
● TempEval-2 normalisation: 55 – 85 %
● ~150 rules needed to get to 81% (Angeli &
Uszkoreit '13)
● We can get the structured expressions OK
● But what about the rest?

Unstructured time mentions
– Christmas
– Michaelmas
– Halloween
– Easter
● Can we learn how to recognise these?

Time expression diversity
● Current corpora too small to hold much linguistic variation
● Note characteristic knee in distribution (cf. Montemurro)

Named Temporal Expressions
● New class of timexes
– Doesn't look like a timex
– Doesn't sound like a timex
– … is, in fact, a timex

How can we mine and extract NTEs?
● Expensive to annotate and hope they appear
● Prefer an automated approach
→ Let's mine Wikipedia!
● 432 English NTEs found

NTEs in Wikipedia
● Gives term and text description
● Problem: no good as a gazetteer, some entries
are polysemous (e.g. Carnival)
● Problem: recall limited with gazetteers
● Solution: build statistical tagger

Building statistical NTE tagger
● Use list of NTEs to annotate sentences
– CoNLL format, I/O binary labels
● Only use monosemous expressions
● Visit linked data searching for expressions
● If many entities found, expression is polysemous
– SELECT DISTINCT ?r {?r rdfs:label "carnival"@en}
– Not monosemous
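
The polysemy check above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `build_label_query`, `is_monosemous`, the `count_entities` callback and the one-entity threshold are all assumptions, and `fake_endpoint` is a stand-in with invented counts where a real run would query a linked-data SPARQL endpoint.

```python
from typing import Callable


def build_label_query(expression: str, lang: str = "en") -> str:
    """Build the label lookup from the slide for one candidate expression."""
    # Escape backslashes and quotes so the literal stays valid SPARQL.
    escaped = expression.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT DISTINCT ?r {{ ?r rdfs:label "{escaped}"@{lang} }}'
    )


def is_monosemous(
    expression: str,
    count_entities: Callable[[str], int],
    max_entities: int = 1,  # assumed threshold for "many entities"
) -> bool:
    """Treat an expression as monosemous if few entities carry its label.

    Assumption: an expression with zero hits is also kept as monosemous.
    """
    return count_entities(build_label_query(expression)) <= max_entities


def fake_endpoint(query: str) -> int:
    # Stand-in for a real endpoint; the counts are invented for the demo.
    return 12 if '"carnival"' in query else 1


print(is_monosemous("carnival", fake_endpoint))
print(is_monosemous("groundhog day", fake_endpoint))
```

Passing the endpoint in as a callback keeps the decision logic testable without network access.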

Building statistical NTE tagger
● If a sentence contains a monosemous NTE,
also annotate any polysemous NTEs
● Assume that they will occur in temporal sense
While it might not have the retail significance
of Christmas, Halloween or Secretary's Day,
Groundhog Day remains perhaps the weirdest
American holiday.
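
A rough sketch of this annotation step, under stated assumptions: the function names are hypothetical, matching is case-insensitive on whitespace tokens, a sentence without any monosemous NTE is simply left all-O, and the split of the example sentence into monosemous and polysemous entries is invented for illustration.

```python
def find_spans(tokens, phrases):
    """Return (start, end) token spans where any phrase occurs."""
    lowered = [t.lower() for t in tokens]
    spans = []
    for phrase in phrases:
        words = phrase.lower().split()
        n = len(words)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == words:
                spans.append((i, i + n))
    return spans


def annotate(tokens, monosemous, polysemous):
    """Label tokens I (inside an NTE) or O (outside).

    Polysemous entries are labelled only when the sentence also
    contains at least one monosemous NTE.
    """
    labels = ["O"] * len(tokens)
    mono_spans = find_spans(tokens, monosemous)
    if not mono_spans:
        return labels  # assumption: nothing is annotated in this case
    for start, end in mono_spans + find_spans(tokens, polysemous):
        for i in range(start, end):
            labels[i] = "I"
    return labels


tokens = (
    "While it might not have the retail significance of Christmas , "
    "Halloween or Secretary's Day , Groundhog Day remains perhaps the "
    "weirdest American holiday ."
).split()

# Hypothetical list membership, chosen only for this demo.
labels = annotate(tokens, {"Groundhog Day"}, {"Christmas", "Halloween"})

# CoNLL-style output: one token and its label per line.
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```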

NTE recognition results
● Baseline: gazetteer of timexes in existing
resources
● 2:1 train:eval split, strict matching evaluation
● Also found new NTEs!
– European Cup
– Dayton Peace Agreement

How do we normalise NTEs?
● Target representation: TIMEX3
– January 2nd, 1980 → 1980-01-02
– Summer 2012 → 2012-SU
– now → PRESENT_REF
● Statistical learning won't manage
● Use dedicated tool, TIMEN
– Open normalisation toolkit
– Anyone can contribute
– SotA normalisation performance
– Takes a document with entity boundaries marked

Using NTE descriptions
● We have semi-structured descriptions
– “six weeks after Easter”
– “last Friday in June”
– “end of week 17”
– “tenth day of Tishrei”
● How to convert these to rules?

NTE normalisation rule extraction
● Create simple parser to cover majority of NTEs
– “June 25th”
– “Last Sunday in March”
● Covers 70.3% of NTE descriptions
● Remainder of rules may be added manually
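
A simple parser of this kind might look like the sketch below. It is an illustration only, not TIMEN's rule format: the two patterns, the `resolve` function and its choice to return a calendar date for a given year are assumptions. Descriptions it cannot handle return `None`, which corresponds to the remainder left for manual rules.

```python
import calendar
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
# Monday is 0, matching date.weekday().
WEEKDAYS = {
    name: number
    for number, name in enumerate(
        ["monday", "tuesday", "wednesday", "thursday", "friday",
         "saturday", "sunday"]
    )
}
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "last": -1}

FIXED_DATE = re.compile(r"^(\w+) (\d{1,2})(?:st|nd|rd|th)?$")
NTH_WEEKDAY = re.compile(r"^(\w+) (\w+) (?:in|of) (\w+)$")


def resolve(description, year):
    """Resolve a description to a date in `year`, or None if not covered."""
    text = description.strip().lower()

    match = FIXED_DATE.match(text)  # e.g. "June 25th"
    if match and match.group(1) in MONTHS:
        return date(year, MONTHS[match.group(1)], int(match.group(2)))

    match = NTH_WEEKDAY.match(text)  # e.g. "Last Sunday in March"
    if match:
        ordinal, weekday, month = match.groups()
        if ordinal in ORDINALS and weekday in WEEKDAYS and month in MONTHS:
            candidates = [
                d
                for d in calendar.Calendar().itermonthdates(year, MONTHS[month])
                if d.month == MONTHS[month] and d.weekday() == WEEKDAYS[weekday]
            ]
            return candidates[ORDINALS[ordinal]]

    return None


print(resolve("June 25th", 2013))
print(resolve("Last Sunday in March", 2013))
print(resolve("tenth day of Tishrei", 2013))
```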

Normalisation + NTEs
● Evaluation
● Two corpora:
– SotA (TempEval-3)
– Purpose built to be hard to normalise (TimenEval)
● On TempEval-3 (restricted newswire):
0.7% error reduction
● On TimenEval (varied genre):
4.3% error reduction

Outstanding issues:
Spatial variation
● Labo[u]r Day
– May 1 in much of the world
– first Monday in May in Australia's QLD and NT
● Summer
– Official vs. informal
– North vs. south
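
One way to picture the problem is a rule table keyed by region, sketched below. The table layout, the region codes (written in ISO 3166-2 style) and the fallback to May 1 for unlisted regions are assumptions made for illustration; a real system would likely leave unknown regions unresolved.

```python
from datetime import date, timedelta


def first_monday_in_may(year):
    """Return the first Monday on or after May 1 of the given year."""
    may_first = date(year, 5, 1)
    # weekday() is 0 for Monday; step forward to the next Monday if needed.
    return may_first + timedelta(days=(0 - may_first.weekday()) % 7)


# Hypothetical rule table covering only the cases named on the slide.
LABOUR_DAY_RULES = {
    "default": lambda year: date(year, 5, 1),
    "AU-QLD": first_monday_in_may,
    "AU-NT": first_monday_in_may,
}


def labour_day(year, region="default"):
    """Pick the rule for a region, falling back to May 1 (an assumption)."""
    rule = LABOUR_DAY_RULES.get(region, LABOUR_DAY_RULES["default"])
    return rule(year)


print(labour_day(2013))
print(labour_day(2013, "AU-NT"))
```

The same expression therefore needs document-level location context before it can be normalised.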

Outstanding issues:
Easter
● Commonly used as an
offset
● Non-trivial to determine
● “Computus”
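
For the Gregorian calendar, the date can be computed with the widely published anonymous Gregorian algorithm (also known as Meeus/Jones/Butcher). The sketch below uses it to resolve an Easter-relative description such as "six weeks after Easter"; the function names are hypothetical, and nothing here is claimed to be how TIMEN handles Easter.

```python
from datetime import date, timedelta


def gregorian_easter(year):
    """Date of Western Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19                      # position in the 19-year Metonic cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def weeks_after_easter(year, weeks):
    """Resolve an offset description like 'six weeks after Easter'."""
    return gregorian_easter(year) + timedelta(weeks=weeks)


print(gregorian_easter(2013))       # 2013-03-31
print(weeks_after_easter(2013, 6))  # 2013-05-12
```

Eastern Orthodox Easter follows a different computation and is not covered by this sketch.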

Outstanding issues:
Multiple calendars
● Gregorian (Quite popular)
– Not particularly rational in the first place
● Lunar (China)
● Astrological
● Hebrew
● … and so on

Outstanding issues:
Forms of expression
● Orthographic variation:
– Martin Luther King Day
– MLK Day
● Regional variation:
– autumn
– fall

Resources provided
● Corpus of NTEs
● Rules integrated into TIMEN in next release
– around November 2013

Thank you for your time!
Do you have any questions?
