2. Why start with texts?
Domain knowledge cannot be extracted from texts fully automatically
Texts are nevertheless useful
Texts are available data (≠ experts)
Texts partly reflect the domain conceptualisation (TBox)
Texts may contain pieces of factual knowledge (ABox)
Policy documents express business rules
It is often important to trace knowledge to textual sources
Natural Language Processing in ONTORULE
Acquiring knowledge from written policies
Enriching NLP tools with SBVR-based functionalities (metamodel
and Structured English (SE))
Integrating policy documents into the management system
3. Text-based knowledge acquisition tools
• Terminae
Interactive acquisition of domain ontological
knowledge (conceptual vocabulary including
concepts, concept definitions, roles and some
instances)
• Semex
Combination of information extraction techniques and
manual modelling for the acquisition of rules
expressed in terms of the conceptual vocabulary
7. Building a lexicalized ontology from texts
• Goals
– Building a domain ontology
– Documentation
• Traceability to source documents
– Semantic annotation of source documents
• Query the text
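As a minimal sketch of what semantic annotation for traceability can look like, the snippet below links occurrences of lexicon terms in a source document back to ontology concepts, so the text can later be queried by concept. The lexicon entries and concept names are purely illustrative, not taken from any ONTORULE ontology:

```python
import re

# Hypothetical mini-lexicon mapping surface terms to concept identifiers.
LEXICON = {
    "airline": "Concept:Airline",
    "member": "Concept:Member",
    "flight": "Concept:Flight",
}

def annotate(text):
    """Return (start, end, surface form, concept) spans for each
    lexicon term found, sorted by position in the text."""
    annotations = []
    for term, concept in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            annotations.append((m.start(), m.end(), m.group(0), concept))
    return sorted(annotations)

doc = "A member may book a flight with the airline."
for start, end, surface, concept in annotate(doc):
    print(f"{start:3d}-{end:3d}  {surface!r:12} -> {concept}")
```

Storing the character offsets alongside the concept is what makes traceability work in both directions: from the ontology to the supporting text spans, and from a passage to the concepts it mentions.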
8. Terminae
Extraction step
– Extract from the acquisition corpus the list of candidate terms using term
extraction tools
Normalisation step
– Filter and select relevant meanings of ambiguous terms (clustering terms)
(e.g. member: airline participant / customer)
– Create and structure termino-concepts (relevant and disambiguated
terms of the domain)
Formalisation step
- Create concepts and instances linked to each termino-concept
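The extraction step above can be illustrated with a deliberately crude stand-in for a term extractor: count frequent non-stopword unigrams and bigrams in a toy corpus and keep the recurring ones as candidate terms. Real term extractors rely on POS patterns and statistical measures; the stopword list and corpus here are invented for the example:

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "to", "on", "each", "may", "using", "is"}

def candidate_terms(corpus, min_freq=2):
    """Rough sketch of candidate-term extraction: frequent non-stopword
    unigrams and bigrams. Note: bigrams are formed after stopword removal,
    so they may join non-adjacent words; a real tool would not do this."""
    counts = Counter()
    for sentence in corpus:
        tokens = [t for t in re.findall(r"[a-z_]+", sentence.lower())
                  if t not in STOPWORDS]
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return [term for term, n in counts.most_common() if n >= min_freq]

corpus = [
    "The airline member earns miles on each flight.",
    "A member may upgrade a flight using miles.",
]
print(candidate_terms(corpus))  # recurring terms such as 'member', 'flight'
```

The normalisation and formalisation steps then operate on this candidate list: an analyst filters it, disambiguates the survivors into termino-concepts, and finally attaches formal concepts and instances to each one.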
13. Semex
Rule acquisition
– Rule fragment selection
– Rule transformation
• Revision
– Normalisation of the vocabulary
– Syntax simplification
– Verbalisation of implicit statements
• Decomposition
Rule exploration
- Navigation interface
- SPARQL interface for advanced queries
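The decomposition step of rule transformation can be sketched as splitting a candidate rule with a conjunctive condition into one atomic rule per conjunct. The rule text and the "If ... and ... then ..." pattern are illustrative assumptions, not Semex's actual grammar:

```python
import re

def decompose(rule):
    """Split 'If A and B then C.' into ['If A then C.', 'If B then C.'].
    A toy version of the decomposition step; real candidate rules need
    a much richer grammar (disjunctions, negation, nested conditions)."""
    m = re.match(r"(?i)if (.+) then (.+)", rule.strip().rstrip("."))
    if not m:
        return [rule]  # not a conditional: leave it untouched
    conditions = re.split(r"(?i)\s+and\s+", m.group(1))
    return [f"If {c} then {m.group(2)}." for c in conditions]

rule = ("If the member is active and the flight is eligible "
        "then the member earns miles.")
for atomic in decompose(rule):
    print(atomic)
```

Producing one atomic rule per conjunct is what makes the later navigation and SPARQL-based exploration useful: each candidate rule can then be indexed and queried on its own.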
14. Structure of the candidate rules
Interlinked SBVR-SE statement
The temperature of the micro_slip_test must be greater than 15 C.
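To show how such an interlinked SBVR-SE statement can be turned into a machine-checkable constraint, the toy parser below handles exactly the "The X of the Y must be greater than N U." pattern of the example above; a real SBVR-SE parser covers far more of the grammar, and the field names in the output dictionary are my own choice:

```python
import re

# One hard-coded SBVR-SE pattern, keyed to the example statement.
PATTERN = re.compile(
    r"The (?P<attr>\w+) of the (?P<obj>\w+) must be greater than "
    r"(?P<value>\d+(?:\.\d+)?) ?(?P<unit>\w+)\."
)

def parse_constraint(statement):
    """Map one SBVR-SE statement to a structured constraint record."""
    m = PATTERN.match(statement)
    if not m:
        raise ValueError("unsupported statement pattern")
    return {
        "subject": m.group("obj"),       # e.g. micro_slip_test
        "attribute": m.group("attr"),    # e.g. temperature
        "operator": ">",                 # 'must be greater than'
        "value": float(m.group("value")),
        "unit": m.group("unit"),
    }

print(parse_constraint(
    "The temperature of the micro_slip_test must be greater than 15 C."))
```

The point of the structured form is that the deontic modality ("must be") and the comparison become explicit fields, so the constraint can be checked against ABox data or exported to a rule engine.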