Paper: http://derczynski.com/sheffield/papers/named_timex.pdf
This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami.
5. … sometimes
● TempEval-2 timex recall: 66 – 88 %
● TempEval-2 normalisation: 55 – 85 %
● ~150 rules needed to get to 81% (Angeli & Uszkoreit '13)
● We can get the structured expressions OK
● But what about the rest?
7. Time expression diversity
● Current corpora too small to hold much linguistic variation
● Note characteristic knee in distribution (cf. Montemurro)
8. Named Temporal Expressions
● New class of timexes
– Doesn't look like a timex
– Doesn't sound like a timex
– … is, in fact, a timex
9. How can we mine and extract NTEs?
● Expensive to annotate and hope they appear
● Prefer an automated approach
– → Let's mine Wikipedia!
● 432 English NTEs found
10. NTEs in Wikipedia
● Gives term and text description
● Problem: no good as a gazetteer, some entries are polysemous (e.g. Carnival)
● Problem: recall limited with gazetteers
● Solution: build statistical tagger
11. Building statistical NTE tagger
● Use list of NTEs to annotate sentences
– CoNLL format, I/O binary labels
● Only use monosemous expressions
● Visit linked data searching for expressions
● If many entities found, expression is polysemous
– SELECT DISTINCT ?r {?r rdfs:label "carnival"@en}
– Not monosemous
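The polysemy check on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_entities` threshold (the slide only says "many"), and the example resource URIs are all assumptions. The query string itself is the one shown on the slide; running it against a real endpoint is left out so the sketch stays self-contained.

```python
from typing import Iterable


def label_query(expression: str, lang: str = "en") -> str:
    """Build the slide's SPARQL query for one candidate expression."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = expression.replace("\\", "\\\\").replace('"', '\\"')
    return 'SELECT DISTINCT ?r {?r rdfs:label "%s"@%s}' % (escaped, lang)


def is_monosemous(resources: Iterable[str], max_entities: int = 1) -> bool:
    """Decide monosemy from the resources returned for a label.

    Assumption: an expression counts as monosemous when at least one and
    at most `max_entities` distinct resources carry its label.
    """
    distinct = set(resources)
    return 0 < len(distinct) <= max_entities


if __name__ == "__main__":
    print(label_query("carnival"))
    # Hypothetical result lists, standing in for real endpoint responses.
    print(is_monosemous(["http://example.org/resource/Michaelmas"]))
    print(is_monosemous(["http://example.org/resource/Carnival",
                         "http://example.org/resource/Carnival_(film)"]))
```

Keeping the threshold as a parameter makes it easy to tune how strict the monosemy filter is.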
12. Building statistical NTE tagger
● If a sentence contains a monosemous NTE, also annotate any polysemous NTEs
● Assume that they will occur in temporal sense
“While it might not have the retail significance of Christmas, Halloween or Secretary's Day, Groundhog Day remains perhaps the weirdest American holiday.”
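The annotation heuristic from the last two slides can be sketched as below: a sentence is labelled only if it contains a monosemous NTE, in which case polysemous NTEs in the same sentence are labelled too. This is a minimal sketch under assumptions: the function names are hypothetical, matching is plain case-insensitive token matching, and the split of expressions into monosemous and polysemous lists is left to the caller.

```python
def find_spans(tokens, phrases):
    """Return (start, end) token spans that match any phrase, ignoring case."""
    lowered = [t.lower() for t in tokens]
    spans = []
    for phrase in phrases:
        words = phrase.lower().split()
        n = len(words)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == words:
                spans.append((i, i + n))
    return spans


def label_sentence(tokens, monosemous, polysemous):
    """Assign binary I/O labels to one tokenised sentence."""
    labels = ["O"] * len(tokens)
    mono_spans = find_spans(tokens, monosemous)
    if not mono_spans:
        # No monosemous anchor: leave the sentence unannotated.
        return labels
    # With an anchor present, polysemous NTEs are assumed to be temporal.
    for start, end in mono_spans + find_spans(tokens, polysemous):
        for i in range(start, end):
            labels[i] = "I"
    return labels


def to_conll(tokens, labels):
    """Render one token and its label per line, tab-separated."""
    return "\n".join(f"{tok}\t{lab}" for tok, lab in zip(tokens, labels))
```

For the example sentence above, calling `label_sentence` with "Groundhog Day" in the monosemous list and "Halloween" in the polysemous list would mark both, while the same sentence without "Groundhog Day" in the monosemous list would stay all `O`.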
13. NTE recognition results
● Baseline: gazetteer of timexes in existing resources
● 2:1 train:eval split, strict matching evaluation
● Also found new NTEs!
– European Cup
– Dayton Peace Agreement
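Strict matching means a predicted expression counts only if its boundaries agree exactly with a gold annotation. The sketch below shows one way to score that; reporting precision, recall and F1 is an assumption here (the slide does not name the metrics), and the span representation `(sentence_id, start, end)` is hypothetical.

```python
def strict_prf(gold, predicted):
    """Precision, recall and F1 under exact-boundary span matching."""
    gold, predicted = set(gold), set(predicted)
    # A prediction is correct only if the identical span is in the gold set.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under this scheme a prediction that overlaps a gold span but misses one token earns no credit, which is what makes the evaluation strict.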
14. How do we normalise NTEs?
● Target representation: TIMEX3
– January 2nd, 1980 → 1980-01-02
– Summer 2012 → 2012-SU
– now → PRESENT_REF
● Statistical learning won't manage
● Use dedicated tool, TIMEN
– Open normalisation toolkit
– Anyone can contribute
– SotA normalisation performance
– Takes a document with entity boundaries marked
15. Using NTE descriptions
● We have semi-structured descriptions
– “six weeks after Easter”
– “last Friday in June”
– “end of week 17”
– “tenth day of Tishrei”
● How to convert these to rules?
16. NTE normalisation rule extraction
● Create simple parser to cover majority of NTEs
– “June 25th”
– “Last Sunday in March”
● Covers 70.3% of NTE descriptions
● Remainder of rules may be added manually
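A simple description parser of the kind this slide mentions might look like the sketch below. It is an illustration, not the authors' parser or TIMEN's rule format: the two patterns, the ordinal vocabulary, and the `resolve` function are assumptions, and only the two description shapes shown on the slide are covered. Descriptions it cannot handle (for example “six weeks after Easter”) return `None`, matching the idea that the remainder is added manually.

```python
import calendar
import datetime
import re

# Month and weekday names from the standard library (English in the default locale).
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}
WEEKDAYS = {name.lower(): i for i, name in enumerate(calendar.day_name)}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "last": -1}

FIXED = re.compile(r"^(\w+) (\d{1,2})(?:st|nd|rd|th)?$")   # "June 25th"
RELATIVE = re.compile(r"^(\w+) (\w+) in (\w+)$")           # "Last Sunday in March"


def resolve(description, year):
    """Map a description to a date in `year`, or None if no pattern applies."""
    text = description.strip().lower()
    m = FIXED.match(text)
    if m and m.group(1) in MONTHS:
        try:
            return datetime.date(year, MONTHS[m.group(1)], int(m.group(2)))
        except ValueError:  # e.g. "June 31"
            return None
    m = RELATIVE.match(text)
    if (m and m.group(1) in ORDINALS and m.group(2) in WEEKDAYS
            and m.group(3) in MONTHS):
        month, weekday = MONTHS[m.group(3)], WEEKDAYS[m.group(2)]
        # All dates in the month that fall on the requested weekday.
        days = [d for d in calendar.Calendar().itermonthdates(year, month)
                if d.month == month and d.weekday() == weekday]
        n = ORDINALS[m.group(1)]
        return days[-1] if n < 0 else days[n - 1]
    return None
```

Calling `.isoformat()` on the result gives a `YYYY-MM-DD` string in the same shape as the TIMEX3 date values shown earlier.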
17. Normalisation + NTEs
● Evaluation
● Two corpora:
– SotA (TempEval-3)
– Purpose built to be hard to normalise (TimenEval)
● On TempEval-3 (restricted newswire): 0.7% error reduction
● On TimenEval (varied genre): 4.3% error reduction
18. Outstanding issues: Spatial variation
● Labo[u]r Day
– May 1 in much of the world
– first Monday in May in Australia's QLD and NT
● Summer
– Official vs. informal
– North vs. south
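One possible way to handle spatial variation is to key normalisation rules on both the expression and a region, falling back to a default rule when the region is unknown. The sketch below is hypothetical throughout: the `RULES` table, the region codes and the `resolve` function are illustrative, and the table encodes only the two Labour Day rules stated on this slide, not an authoritative holiday calendar.

```python
import datetime


def first_monday(year, month):
    """Return the first Monday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday() is 0 for Monday, so this offset is 0..6 days.
    return first + datetime.timedelta(days=(0 - first.weekday()) % 7)


# (expression, region) -> rule; region None is the default rule.
RULES = {
    ("labour day", None): lambda year: datetime.date(year, 5, 1),
    ("labour day", "AU-QLD"): lambda year: first_monday(year, 5),
    ("labour day", "AU-NT"): lambda year: first_monday(year, 5),
}


def resolve(expression, year, region=None):
    """Resolve an expression for a region, falling back to the default rule."""
    # Fold the Labor/Labour spelling variants onto one key.
    key = expression.lower().replace("labor", "labour")
    rule = RULES.get((key, region)) or RULES.get((key, None))
    return rule(year) if rule else None
```

The same lookup shape would extend to "summer", with hemisphere or official-versus-informal convention as the region key.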
20. Outstanding issues: Multiple calendars
● Gregorian (Quite popular)
– Not particularly rational in the first place
● Lunar (China)
● Astrological
● Hebrew
● .. and so on
21. Outstanding issues: Forms of expression
● Orthographic variation:
– Martin Luther King Day
– MLK Day
● Regional variation:
– autumn
– fall