1. Issues in Learning an Ontology from Text
Christopher Brewster, Simon Jupp, Joanne Luciano, David Shotton, Robert Stevens, and Ziqi Zhang
2. The Use Case: Animal Behaviour
• Animal behaviour community recognises the need for an ontology, e.g. for video annotation/retrieval
• The community created an “Animal Behaviour Ontology” - 339 terms
• Can we (semi-)automatically build one from text?
3. Some Questions
• Do we get a “good ontology”?
• If not, is it useful?
• Is it low-effort?
• Should the result be “tidied up” or used as a donor?
4. Methodology: Dataset
• Journal “Animal Behaviour” from Elsevier
• 623 articles from Vol 71 (2006) - Vol 74 (2007)
• 2.2 million words
• Various formats - most usefully xml
5. We Want an Ontology of Green
• An ontology of “animal behaviours”
• Not an ontology of the corpus
We want the green terms in the ontology
6. Processing Steps (1)
1. Text extracted from XML - excluding affiliations, acknowledgements, bibliography except for title etc.
2. Noise removed - person names, animal names, place names
3. Lemmatiser used to reduce data sparsity
4. Term extraction applied
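As an illustration of step 1, the following is a minimal sketch of pulling text out of an article's XML while skipping unwanted sections. The element names (`affiliation`, `acknowledgement`, `bibliography`) and the sample document are assumptions made for the example; the slides do not describe the actual schema, and the "except for title" handling of the bibliography is not modelled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real article schema is not given in the slides.
EXCLUDED_TAGS = {"affiliation", "acknowledgement", "bibliography"}


def extract_text(element):
    """Collect text fragments from an element tree, skipping excluded subtrees."""
    if element.tag in EXCLUDED_TAGS:
        return []
    parts = []
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.extend(extract_text(child))
        # Tail text belongs to the parent, so keep it even if the child was skipped.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


# Invented sample article, used only to exercise the function.
sample = """<article>
  <title>Mate selection in reed buntings</title>
  <affiliation>Some University</affiliation>
  <body><p>Females show grooming behaviour.</p></body>
  <bibliography><ref>Ignored reference</ref></bibliography>
</article>"""

text = " ".join(extract_text(ET.fromstring(sample)))
print(text)
```

Steps 2 to 4 (noise removal, lemmatisation, term extraction) would then run over the extracted text.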
7. Processing Steps (2)
5. Term selection: regular expression used to select terms ending in behaviour, display, construction, inspection, plus generic -ing, -ism, etc.
6a. Build hierarchies using String Inclusion
6b. Top level terms filtered using “Hearst Patterns” to test if X ISA behaviour/activity/etc.
Example terms: Walking, Running, Jumping, Hunting, Pecking, Reed Bunting, Corn Bunting, Herring, Courtship, Studentship, Cannibalism, Dimorphism
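The term-selection step can be sketched as a suffix filter. This is an assumed reconstruction, not the authors' actual expression: it covers only the four head words and the two suffixes named on the slide, and leaves the slide's "etc." unfilled.

```python
import re

# Head words named on the slide, plus the generic suffixes -ing and -ism.
# Other suffixes implied by the slide's "etc." are deliberately not guessed.
TERM_PATTERN = re.compile(
    r"(?:\b(?:behaviour|display|construction|inspection)|\w+(?:ing|ism))$",
    re.IGNORECASE,
)


def select_terms(terms):
    """Keep only the terms whose final word matches the pattern."""
    return [t for t in terms if TERM_PATTERN.search(t)]


candidates = [
    "Walking", "Pecking", "Reed Bunting", "Herring", "Cannibalism",
    "Dimorphism", "Courtship", "grooming behaviour", "nest construction",
    "interaction",
]
selected = select_terms(candidates)
print(selected)
```

A purely surface-level filter like this keeps bird and fish names such as "Reed Bunting" and "Herring" alongside genuine behaviours, and drops "interaction", which is why later filtering and evaluation steps are needed.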
8. Applying String Inclusion / Rules to Terms
C
  AC
  BC
    ABC
Selection
  Natural Selection
  Mate Selection
    Female Mate Selection
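One way to read the string-inclusion rule is as a word-level suffix match: a term's parent is the longest other term that forms the end of it. The sketch below works under that assumption; the function name `build_hierarchy` is invented for the example, and the authors' actual rules may differ.

```python
def build_hierarchy(terms):
    """Map each term to its parent: the longest other term that is a
    proper word-level suffix of it, or None for a top-level term."""
    tokenised = {t: t.lower().split() for t in terms}
    parents = {}
    for term, words in tokenised.items():
        best = None
        for other, other_words in tokenised.items():
            n = len(other_words)
            # A parent must be strictly shorter and match the final n words.
            if 0 < n < len(words) and words[-n:] == other_words:
                if best is None or n > len(tokenised[best]):
                    best = other
        parents[term] = best
    return parents


terms = ["Selection", "Mate Selection", "Natural Selection", "Female Mate Selection"]
parents = build_hierarchy(terms)
for term, parent in parents.items():
    print(f"{term} -> {parent}")
```

Choosing the longest matching suffix places "Female Mate Selection" under "Mate Selection" rather than directly under "Selection", which mirrors the C / BC / ABC pattern.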
9. Lexico-Syntactic Patterns
• X such as P, Q, R; X is a Y
• Grooming is a behaviour
• Copulation is an activity
• Dimorphism is a behaviour
• Calls such as trills, whistles, grunts
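The two patterns above can be approximated with regular expressions. This is a simplified sketch that assumes single-word terms and plain sentences; a working system would more likely match over lemmatised, part-of-speech-tagged noun phrases. The names `IS_A`, `SUCH_AS` and `extract_isa` are invented for the example.

```python
import re

# "X is a Y" / "X is an Y": X is the hyponym, Y the hypernym.
IS_A = re.compile(r"\b(\w+) is an? (\w+)", re.IGNORECASE)
# "X such as P, Q, R": X is the hypernym, P, Q, R the hyponyms.
SUCH_AS = re.compile(r"\b(\w+) such as ([\w ,]+)", re.IGNORECASE)


def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found in one sentence."""
    pairs = []
    for hyponym, hypernym in IS_A.findall(sentence):
        pairs.append((hyponym.lower(), hypernym.lower()))
    for hypernym, items in SUCH_AS.findall(sentence):
        for item in re.split(r",\s*|\s+and\s+", items):
            if item.strip():
                pairs.append((item.strip().lower(), hypernym.lower()))
    return pairs


sentences = [
    "Grooming is a behaviour",
    "Copulation is an activity",
    "Calls such as trills, whistles, grunts",
]
pairs = [pair for s in sentences for pair in extract_isa(s)]

# Keep only terms attested as a kind of behaviour or activity.
behaviours = {hypo for hypo, hyper in pairs if hyper in {"behaviour", "activity"}}
print(pairs)
print(behaviours)
```

Filtering top-level terms then amounts to keeping only those that appear as a hyponym of "behaviour", "activity" or a similar target word somewhere in the corpus.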
10. Results
• 64,000 terms extracted
• The regexp selected 10,335 terms
• Step 6a resulted in an ontology with 17,776 classes and 1,295 top level classes
• Step 6b resulted in an ontology with 13,058 classes and 912 top level classes
12. Results (3)
• Evaluation of terms excluded by regexp:
• 56,000 terms excluded
• Random sample of 3,140 terms evaluated by hand
• 7 verbs and 42 nouns should not have been excluded
• E.g., “interaction”
• A recall of 0.905
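The sample figures above can be turned into a rough estimate of how many relevant terms the regexp missed overall. This sketch uses only the numbers on the slide and assumes the hand-checked sample is representative of all excluded terms. It does not reproduce the 0.905 recall, since the slide does not say how that figure was derived.

```python
excluded_total = 56_000    # terms excluded by the regexp
sample_size = 3_140        # excluded terms checked by hand
wrongly_excluded = 7 + 42  # verbs + nouns that should have been kept

# Share of the sample that was wrongly excluded.
sample_miss_rate = wrongly_excluded / sample_size
# Extrapolation to all excluded terms, assuming a representative sample.
estimated_missed = excluded_total * sample_miss_rate

print(f"{wrongly_excluded} of {sample_size} sampled terms wrongly excluded "
      f"({sample_miss_rate:.2%})")
print(f"Estimated wrongly excluded terms overall: {estimated_missed:.0f}")
```

Under that assumption, about 1.56% of the sample, or roughly 874 of the 56,000 excluded terms, would be relevant terms lost at the selection step.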
14. Other Issues
• More a vocabulary than an ontology
• SKOS-like rather than OWL-like
• Can deal with “selection”, “mate selection” and “natural selection”
• Highly compositional terms, e.g. “Adult male grooming behaviour”
• Cleanish list of top level terms: cannibalism, copulation, eating, foraging, fighting, grooming
15. Discussion: Is it useful?
• Answers to the questions posed earlier: no, yes, yes, donor
• Useful ontological fragments
• Bringing ontology to ontology learning is the research challenge
• Limitations: noise; the problem of focus; only taxonomic relations
• Advantages: speed; ease; a step towards formal ontologies
Editor's Notes
copulation --> grandfather copulation, cannibalism copulation, harassment copulation, inferred copulation, long copulation, palp copulation, elements copulation, behavioural elements copulation, face to copulation