Final Presentation for my MSc Graduation Project.
Abstract:
"Semantic annotation uses human knowledge formalized in ontologies to enrich texts by providing structured and machine-understandable information about their content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database, using the NCI Thesaurus ontology. Several frequency-based keyword extraction algorithms were implemented and evaluated, aiming to extract important concepts and exclude less relevant ones. Furthermore, topic classification algorithms were applied to identify important concepts which do not occur in the text. The algorithms were evaluated by comparison to annotations provided by experts. Semantic networks were generated from these annotations and an ontology-based similarity metric was applied to perform the comparison. Finally, the networks were visualized to provide further insights into the differences between the semantic structures generated by humans and by the algorithms."
More information: http://graus.nu/category/thesis
5. What » How » Why
Example
“Company XYZ announced profits in Q3, planning to
build a $120M plant in Bulgaria.”
6. What » How » Why
It is like tagging
“Company XYZ announced profits in Q3, planning to
build a $120M plant in Bulgaria.”
7. What » How » Why
It is not tagging
“Company XYZ announced profits in Q3, planning to
build a $120M plant in Bulgaria.”
Tags:
Company XYZ
Plant
Bulgaria
9. What » How » Why
It is not tagging
“Company XYZ announced profits in Q3, planning to
build a $120M plant in Bulgaria.”
Meaning:
What is Company XYZ?
What is a Plant?
What is Bulgaria?
How do they relate?
10. What » How » Why
It adds context!
source: ontotext.com
11. What » How » Why
What?
Automatic Semantic Annotation of the Cyttron Database
13. What » How » Why
Cyttron Database
"The volume of the brain evaluated in this
study. The color scale represents the
number of 4-mm voxels with data in at
least 7 subjects along a 3-cm deep line
into the brain. A three-dimensional
rendering of a brain is shown in regions
where insufficient data were obtained. The
most superior regions of the frontal and
parietal lobes and the most inferior
regions of the temporal lobes were not
evaluated. Imaging artifacts may also
compromise the significance of results in
the most inferior portions of the frontal
lobe."
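A frequency-based first pass over a description like the one above could be sketched as follows. This is only an illustration under stated assumptions: the label list is a tiny hypothetical stand-in for NCI Thesaurus labels, the function name is made up for this example, and plain string matching with an optional plural "s" is a simplification, not the thesis implementation.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary standing in for NCI Thesaurus labels (assumption).
LABELS = ["brain", "frontal lobe", "parietal lobe", "temporal lobe", "voxel"]

# The Cyttron description quoted on the slide above.
DESCRIPTION = (
    "The volume of the brain evaluated in this study. The color scale "
    "represents the number of 4-mm voxels with data in at least 7 subjects "
    "along a 3-cm deep line into the brain. A three-dimensional rendering of "
    "a brain is shown in regions where insufficient data were obtained. The "
    "most superior regions of the frontal and parietal lobes and the most "
    "inferior regions of the temporal lobes were not evaluated. Imaging "
    "artifacts may also compromise the significance of results in the most "
    "inferior portions of the frontal lobe."
)


def count_label_mentions(text, labels):
    """Count occurrences of each label (or its simple plural) in the text."""
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for label in labels:
        # Whole-word match, allowing a trailing "s" for simple plurals.
        pattern = r"\b" + re.escape(label) + r"s?\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[label] = hits
    return counts


label_counts = count_label_mentions(DESCRIPTION, LABELS)
for label, hits in label_counts.most_common():
    print(label, hits)
```

Note that "frontal and parietal lobes" does not count as a mention of "frontal lobe" here; literal matching misses coordinated phrases like this one.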
17. What » How » Why
NCI Thesaurus
Definition: An organ composed of grey and white matter
containing billions of neurons that is the center for
intelligence and reasoning. It is protected by the
bony cranium.
18. What » How » Why
NCI Thesaurus
Context:
Brain is a Central Nervous System Part
Brain is a Organ
Brain part of Central Nervous System
Basal Ganglia part of Brain
Base of the Brain part of Brain
Brain Nucleus part of Brain
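The context statements above can be read as (subject, relation, object) triples. A minimal sketch of storing and querying them is shown below; the helper names are hypothetical and the data is limited to the six statements on this slide, so it says nothing about how the NCI Thesaurus itself is stored or accessed.

```python
from collections import defaultdict

# Triples copied from the slide; the tuple layout is an assumption of this sketch.
TRIPLES = [
    ("Brain", "is a", "Central Nervous System Part"),
    ("Brain", "is a", "Organ"),
    ("Brain", "part of", "Central Nervous System"),
    ("Basal Ganglia", "part of", "Brain"),
    ("Base of the Brain", "part of", "Brain"),
    ("Brain Nucleus", "part of", "Brain"),
]


def index_triples(triples):
    """Index triples by subject (outgoing) and by object (incoming)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, relation, obj in triples:
        outgoing[subject].append((relation, obj))
        incoming[obj].append((relation, subject))
    return outgoing, incoming


def parents_of(concept, outgoing):
    """Concepts that the given concept 'is a' kind of."""
    return sorted(obj for rel, obj in outgoing[concept] if rel == "is a")


def parts_of(concept, incoming):
    """Concepts stated to be 'part of' the given concept."""
    return sorted(subj for rel, subj in incoming[concept] if rel == "part of")


outgoing, incoming = index_triples(TRIPLES)
print(parents_of("Brain", outgoing))
print(parts_of("Brain", incoming))
```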
27. What » How » Why
Evaluation
[Figure: evaluation diagram with rows labelled 1, 2 and 3; graphic not recoverable]
30. What » How » Why
Evaluation I
Confusion Matrix
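One way to build such a confusion matrix is to treat an expert's annotations as the reference set and an algorithm's annotations as the predicted set. The sketch below assumes exactly that setup; the concept sets and the small "universe" of candidate concepts are invented for illustration and are not results from the project.

```python
def confusion_counts(predicted, reference, universe):
    """Count true/false positives and negatives over a set of candidate concepts."""
    predicted, reference, universe = set(predicted), set(reference), set(universe)
    return {
        "tp": len(predicted & reference),             # annotated by both
        "fp": len(predicted - reference),             # algorithm only
        "fn": len(reference - predicted),             # expert only
        "tn": len(universe - predicted - reference),  # annotated by neither
    }


def precision_recall(counts):
    """Derive precision and recall, returning 0.0 when a denominator is zero."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative data only (assumption): not taken from the actual evaluation.
universe = {"Brain", "Thalamus", "Corpus Callosum", "White Matter", "Organ", "Human"}
expert = {"Brain", "Thalamus", "White Matter"}
algorithm = {"Brain", "Thalamus", "Organ", "Human"}

counts = confusion_counts(algorithm, expert, universe)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

This only rewards direct matches: a concept that is closely related to an expert's choice but not identical still counts as a miss.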
31. What » How » Why
Evaluation II
Semantic Similarity
32. What » How » Why
Evaluation II
Semantic Similarity
Human                        | Sagittal Plane
Brain                        | Magnetic Resonance Imaging
Magnetic Resonance Imaging   | Cingulate Gyrus
Cingulate Gyrus              | Corpus Callosum
                             | Lateral Ventricle
                             | Thalamus
                             | Mamillary Body
                             | Cerebral Fornix
                             | White Matter
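The slides do not spell out the similarity metric, so the following is only a sketch of one common ontology-based option: score two concepts by the length of the shortest path between them, then compare annotation sets by best match. The edges are hypothetical and not taken from the NCI Thesaurus, and the 1 / (1 + distance) score is an assumption of this example.

```python
from collections import deque

# Hypothetical undirected edges between concepts (assumption, for illustration).
EDGES = [
    ("Brain", "Thalamus"),
    ("Brain", "Cingulate Gyrus"),
    ("Brain", "Corpus Callosum"),
    ("Corpus Callosum", "White Matter"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of concept pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def concept_similarity(graph, a, b):
    """Map path length to a score: 1.0 for identical concepts, 0.0 if unconnected."""
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def set_similarity(graph, set_a, set_b):
    """Average, over concepts in set_a, of the best score against set_b."""
    if not set_a or not set_b:
        return 0.0
    best = [max(concept_similarity(graph, a, b) for b in set_b) for a in set_a]
    return sum(best) / len(best)


graph = build_graph(EDGES)
score = set_similarity(
    graph,
    ["Brain", "Cingulate Gyrus"],
    ["Thalamus", "Cingulate Gyrus", "White Matter"],
)
print(score)
```

Unlike a direct-match count, this gives partial credit when two annotators pick neighbouring concepts. The set score is not symmetric: swapping the two sets can change the result.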
37. What » How » Why
Results
1. No ‘agreement’ between experts
2. Annotation method I was the best approach
3. Both Annotation II & Random had no direct matches
38. What » How » Why
What is it good for? / Future Work
1. Domain independent method
2. Clustering topic identification
3. Subgraph similarity