Presentation given by Thomas Scharrenbach at the 6th Uncertainty Reasoning for the Semantic Web Workshop at the 9th International Semantic Web Conference in 2010.
Paper: Default Logics for Plausible Reasoning with Controversial Axioms
Abstract: Using a variant of Lehmann's Default Logics and Probabilistic Description Logics we recently presented a framework that invalidates those unwanted inferences that cause concept unsatisfiability without the need to remove explicitly stated axioms. The solutions of this method were shown to outperform classical ontology repair w.r.t. the number of inferences invalidated. However, conflicts may still exist in the knowledge base and can make reasoning ambiguous. Furthermore, solutions with a minimal number of inferences invalidated do not necessarily minimize the number of conflicts. In this paper we provide an overview of finding solutions that have a minimal number of conflicts while invalidating as few inferences as possible. Specifically, we propose to evaluate solutions w.r.t. the quantity of information they convey by resorting to the notion of entropy and discuss a possible approach towards computing the entropy w.r.t. an ABox.
1. Default Logics for Plausible Reasoning with Controversial Axioms. Thomas Scharrenbach (*), Claudia d'Amato (**), Nicola Fanizzi (**), Rolf Grütter (*), Bettina Waldvogel (*) and Abraham Bernstein (***). (*) Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Zürcherstrasse 111, 8903 Birmensdorf, Switzerland. (**) Università degli Studi di Bari, Department of Computer Science, Via E. Orabona, 4 - 70125 Bari, Italy. (***) University of Zurich, Department of Informatics, Binzmühlestrasse 14, 8050 Zurich, Switzerland. 4th International Workshop on Ontology Dynamics (IWOD 2010)
18. Justifications: minimal sets of axioms that explain an entailment. Root justifications do not depend on other justifications. J1 (C) = {B ⊑ A, C ⊑ B, C ⊑ ¬A}; J2 (D) = {B ⊑ A, C ⊑ B, C ⊑ ¬A, D ⊑ C}; J3 (E) = {B ⊑ A, C ⊑ B, C ⊑ ¬A, D ⊑ C, E ⊑ D}; J4 (D) = {C ⊑ B, D ⊑ C, D ⊑ ¬B}; J4 (E) = {C ⊑ B, D ⊑ C, D ⊑ ¬B, E ⊑ D}. [Figure: example TBox graph over the concepts A to G]
19. Unsat Splitting: splitting justifications provides partitions for Default Logics. Gamma-set: unsat concept in signature. Theta-set: unsat concept not in signature. J1 (C) = {B ⊑ A, C ⊑ B, C ⊑ ¬A}; J4 (D) = {C ⊑ B, D ⊑ C, D ⊑ ¬B}. Root justifications do not depend on other justifications. [Figure: example TBox graph over the concepts A to G]
50. Slide 397: That's it for the moment... ... and thanks for your attention ... any feedback is greatly appreciated
51. Data Sustainability [Figure: data sources (SQL, GIS, WS) described by OWL meta data, each with a domain-specific search, at DNL Birmensdorf, CSCF Neuchatel and BAFU Berne]
Welcome to my talk about "Plausible Reasoning with Controversial Axioms". I am Thomas Scharrenbach and I will now present joint work that I did with my colleagues from WSL, Rolf Grütter and Bettina Waldvogel, colleagues from Bari, Claudia d'Amato and Nicola Fanizzi, and last but not least, Avi Bernstein from the University of Zurich. This is a position paper, so I will not be able to present the underlying methods in too much detail. However, we are using these methods for ontology evolution, and it happens that I will present some of the methods used in this paper in more detail at the Workshop on Ontology Dynamics, which takes place tomorrow.
Before I start with my talk, you might be wondering what forest, snow and landscape have to do with plausible reasoning. Well, research at WSL is roughly categorized by these three topics. We, for example, create and maintain the Swiss national forest inventory. Other groups try to figure out how to predict and prevent avalanches, whereas people from my research unit deal with the monitoring of endangered species but also with spatial planning, including the protection of people from natural hazards. All these people at WSL have one thing in common: they produce tons of data. This can be in the scope of a research project. But we also collect and maintain data on behalf of public authorities such as the Swiss Federal Office for the Environment.
For all these data we would like to have a formal meta-data description. This meta-data description, in turn, is realized in OWL2-ontologies. When creating these ontologies, we face a common problem. There are two parties involved: Knowledge Engineers and Domain Experts. The goal is to formalize the knowledge of the Domain Expert within an OWL2-ontology. Yet, Domain Experts tend to build highly inconsistent knowledge bases. On the other hand, Knowledge Engineers build consistent knowledge bases but lack the detailed knowledge. Furthermore, if I start telling my colleagues at WSL about ontologies they look at me as if I were from Mars. Well, I am not from Mars and neither are you. At WSL, our strategy is to offer the Domain Experts a simple way of building ontologies which is tolerant of modelling errors. Furthermore, this procedure shall keep all information that was provided by the Domain Experts. We only revise the knowledge base from time to time, because Domain Experts do not like to find that pieces of the knowledge they contributed have just been deleted.
To sum up, when letting Domain Experts construct an ontology, we define some properties that we find useful: The ontology shall not infer anything unsatisfiable. Modelling errors are hidden from the Domain Experts. No information that was provided by the Domain Experts shall be deleted. We will not change the formalism for knowledge representation. If we started with OWL2-EL, for example, we want to end up with OWL2-EL. This refers ONLY to using OWL2 for knowledge representation, whereas we change the inference process, as we will see later on. Other properties are that the procedure shall work autonomously and preserve as much of the implicit information as possible. We require the first properties to be strictly kept, whereas the last properties are considered soft. This implies that we, for example, give higher precedence to explicitly stated axioms than to inferred knowledge.
To achieve these properties we defined the so-called Delta-transformation. This Delta-transformation maps a TBox to a so-called Default TBox. We invalidate unwanted inferences such as unsatisfiability by separating those axioms that cause unsatisfiability. For the separation we use Default Logics as interpreted by Lehmann. As done in Lukasiewicz's Probabilistic Description Logics, we keep axioms that are not involved in a conflict in a separate TBox. We introduced a simple splitting scheme which allows us to compute the partitions for the Default TBox without having to do additional satisfiability checks. We could further optimize the method by not transforming all trouble-causing axioms. This potentially saves some inferences we would lose otherwise, but comes at the price that finding solutions becomes non-deterministic. However, following this optimization strategy alone has a disadvantage. Conflicts may still reside in the knowledge base. Although they do not cause any trouble when performing reasoning, we would like to get rid of them, because they might confuse the Domain Experts working with the ontology. The present work deals with exactly that problem: How can we overcome the uncertainty of different solutions when minimizing both the number of conflicts and the number of inferences invalidated?
Consider TBox axioms. These have the form B is subsumed by A, where A and B are (possibly complex) concepts. In Default Logics the set of axioms is partitioned into the partitions U_0, ..., U_N such that the most general information is contained in U_0 and the most specific axioms are contained in U_N. The trick about Lehmann's Default Logics is that this order of specificity can be determined solely by the axioms themselves. How does that work? I will explain this to you by an example TBox that infers some concepts unsatisfiable. You can find this example also in the paper. We introduce a remainder set D_n in which we store all currently valid axioms. In the beginning the first remainder set D_0 is the TBox itself. For the first partition, we now take all axioms for which both the subconcept and the superconcept are satisfiable.
I will now give you a quick overview of the unsat splitting. First of all, we work on root unsat justifications. An unsat justification is a minimal set of axioms that explains an unsatisfiability. A root unsat justification does not depend on any other justification. Assume the simple TBox in the upper right corner. Black arrows stand for concept subsumption whereas red arrows represent disjoints. For example, this arrow means A is subsumed by B whereas this arrow means D is disjoint with B. We separate each root unsat justification into two sets: The Gamma-set, in red, contains all axioms that contain the concept that is unsatisfiable w.r.t. this very justification. The Theta-set, in blue, contains the remaining axioms of that very justification. We can now use this simple splitting to compute the Default TBox without any further unsatisfiability checks. All the satisfiability checks have already been done by computing the justifications.
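A minimal sketch of this splitting, assuming axioms are simple subsumptions between atomic concepts, optionally with a negated superconcept. The `Axiom` type and the `split_justification` helper are hypothetical names introduced only for illustration, and the sample justification is one reading of the example's justification for C (B subsumed by A, C subsumed by B, C subsumed by not A).

```python
from typing import NamedTuple


class Axiom(NamedTuple):
    """Hypothetical axiom type: 'sub is subsumed by sup' (or by NOT sup)."""
    sub: str
    sup: str
    negated: bool = False  # True means "sub is subsumed by NOT sup"


def split_justification(justification, unsat_concept):
    """Split a root unsat justification into (gamma, theta).

    gamma: axioms whose signature contains the unsatisfiable concept.
    theta: the remaining axioms of the justification.
    No satisfiability check is needed, only a look at the signatures.
    """
    gamma = [ax for ax in justification if unsat_concept in (ax.sub, ax.sup)]
    theta = [ax for ax in justification if unsat_concept not in (ax.sub, ax.sup)]
    return gamma, theta


# Assumed reading of the justification for the unsatisfiable concept C.
j1 = [Axiom("B", "A"), Axiom("C", "B"), Axiom("C", "A", negated=True)]
gamma, theta = split_justification(j1, "C")
print("Gamma:", gamma)
print("Theta:", theta)
```

The split is purely syntactic, which is why it adds no reasoner calls beyond those already spent on computing the justifications.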
I will omit the details of the algorithm for creating partitions, since it is not relevant for this work. For the simple TBox in the example, we obtain the following Default TBox: We have three partitions, U0, U1 and U2, and a so-called Universal TBox T-Delta. Each partition together with T-Delta is coherent. For the reasoning part we could, in principle, do Default Logics reasoning, but we would like to avoid the additional complexity. Our goal is to simply ignore conflicts. As such we consider the union of all deductive closures as the deductive closure of the whole Default TBox. (Pause) There is another issue we did not address so far. We put all axioms from the root justifications into the partitions and obtain a single unique solution. Well, we can do better:
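The idea of taking the union of the per-partition closures can be sketched as follows. This is only a toy under strong assumptions: axioms are pairs `(sub, sup)` of atomic concepts, the "deductive closure" is reduced to the transitive closure of subsumption, and the partitions and Universal TBox below are made up for illustration rather than taken from the slides. A real system would call a description logic reasoner instead.

```python
def transitive_closure(axioms):
    """Toy deductive closure: transitive closure of atomic subsumptions."""
    closed = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and a != d and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def default_tbox_closure(partitions, universal):
    """Union of the closures of each partition taken together with T-Delta."""
    result = set()
    for part in partitions:
        result |= transitive_closure(set(part) | set(universal))
    return result


# Hypothetical Default TBox: two partitions plus a Universal TBox.
universal = {("D", "C")}
partitions = [{("B", "A")}, {("C", "B")}]
entailed = default_tbox_closure(partitions, universal)
print(sorted(entailed))
```

In this toy setup, C subsumed by A is not entailed, because B subsumed by A and C subsumed by B never take part in the same closure. That is the sense in which separating axioms invalidates inferences.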
It suffices to put only two axioms of each root unsat justification into two different partitions. In that case we showed that we lose fewer inferences. We can, for example, put axiom D disjoint with B into the Universal TBox. It now occurs not only in partition U2, but in all partitions. The choice, however, which axioms to put into the partitions and which axioms to put into the Universal TBox is non-deterministic. Optimizing, hence, introduces some uncertainty into the whole process.
A second example would be choosing not to transform the axioms C disjoint with A and C subsumed by D, but to put them in the Universal TBox instead.
To sum up, we first identify all the axioms that are involved in conflicts by computing the unsat justifications. We resolve the root unsat justifications by using methods from Lehmann's Default Logics and Lukasiewicz's Probabilistic Description Logics. In particular, some of the trouble-causing axioms are separated during reasoning. We finally have to optimize the actual choice of which axioms to put in the partitions and which axioms can be left in the Universal TBox. This last step effectively introduces some uncertainty for the reasoning capabilities of the resulting Default TBox, which we have to overcome.
If we just perform inference counting we can end up with weird situations. Consider the following Default TBox, which was the first optimized solution I just presented. By our definition, this Default TBox DT-1 has two entailments: On the one hand we infer that D is subsumed by B; on the other hand, we infer the opposite, namely that D is subsumed by the complement of B. This contradiction does not cause problems when reasoning. We never consider both entailments at the same time, because they originate from two different partitions. Yet, it is not quite obvious for a Domain Expert that we allow for such conflicts. Classical Default Logics reasoning could overcome this issue, but as I said, we want to avoid the additional overhead. Fortunately, in this example we can provide a solution in which the conflict is no longer present. The second possible solution DT-2 does not contain the conflict. However, this means that it has one inference less than DT-1. If we now just assessed solutions by inference counting, then we would clearly choose solution one, that is, the one containing the conflict. Hence we need a performance measure that takes into account the quality of a solution regarding the number of conflicts still present.
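A small sketch of why inference counting alone prefers the conflicting solution. Everything here is hypothetical: entailments are triples `(sub, sup, negated)` over placeholder concepts X and Y, the two solutions only mimic the shape of DT-1 and DT-2, and ranking by "fewest conflicts first" is a stand-in for illustration, not the entropy-based measure proposed in the talk.

```python
def count_conflicts(entailments):
    """Count pairs 'X subsumed by Y' and 'X subsumed by NOT Y' that are both entailed."""
    return sum(
        1
        for (sub, sup, negated) in entailments
        if not negated and (sub, sup, True) in entailments
    )


# Hypothetical solutions: DT-1 keeps one more inference but contains a conflict.
solutions = {
    "DT-1": {("X", "Y", False), ("X", "Y", True)},
    "DT-2": {("X", "Y", True)},
}

# Pure inference counting picks the solution with the most entailments.
best_by_count = max(solutions, key=lambda name: len(solutions[name]))

# Conflict-aware ranking: fewest conflicts first, then most entailments.
best_by_conflicts = min(
    solutions,
    key=lambda name: (count_conflicts(solutions[name]), -len(solutions[name])),
)

print("inference counting prefers:", best_by_count)
print("conflict-aware ranking prefers:", best_by_conflicts)
```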
We recently came up with the idea of assessing the quality of a possible solution by its information content. In computer science, in particular in information theory, information content is measured by the entropy. How can we benefit from an entropy-based measure? Well, assume that a conflict is still present. That is, we are still able to infer that D is subsumed by B as well as that D is subsumed by the complement of B. If we assert an instance to D, then we infer two assertions: One for B and one for the complement of B. In case there is no conflict, we can infer the assertion to only one of both concepts, that is, to either B or its complement. Considering the assertions as a random variable, the conflict case is more similar to a uniform distribution, whereas the non-conflicting case is more different from the uniform distribution. The entropy, in turn, measures how close the actual distribution of a random variable is to the uniform distribution. The higher the entropy, the closer the distribution is to uniform. We hence propose to assess solutions regarding the entropy of a Default TBox in the presence of an instantiation, that is, an ABox.
To define the entropy on a TBox or a Default TBox (the procedure works on any set of axioms for which we have a proper inference mechanism) we define the entropy on a set of axioms. This requires defining a probability mass function on axioms. Based on a concept representation of an axiom, we propose to use the following definition: The probability of an axiom is the normalized sum of the instances under which the axiom becomes true. This is the case when the instance i can either be asserted to A or to the complement of B. Alpha serves here as the normalization constant and I is an indicator function. As I stated before, the idea is that the presence of conflicts increases the entropy whereas the absence of conflicts reduces it.
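One way to read this definition is sketched below. Several points are assumptions not stated in the talk: the ABox is a plain mapping from individuals to the atomic concepts they are asserted to, non-membership in B is treated as membership in the complement of B (a closed-world simplification), alpha is chosen so that the probabilities of all axioms sum to one, and the entropy is the Shannon entropy in bits. The function names are hypothetical.

```python
import math


def axiom_probabilities(axioms, abox):
    """Probability of each axiom 'B subsumed by A', given as a pair (B, A).

    An individual i counts for the axiom if i is asserted to A or
    (closed-world assumption) is not asserted to B.
    """
    counts = [
        sum(1 for concepts in abox.values() if sup in concepts or sub not in concepts)
        for (sub, sup) in axioms
    ]
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in axioms]
    alpha = 1.0 / total  # normalization constant (assumed choice)
    return [alpha * c for c in counts]


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical axioms and ABox.
axioms = [("B", "A"), ("C", "B")]
abox = {"x": {"A", "B"}, "y": {"C"}, "z": {"B"}}

probs = axiom_probabilities(axioms, abox)
h = entropy(probs)
print(probs, h)
```

With a reasoner available, the membership test would be replaced by instance checking against the Default TBox; the set lookup here only keeps the sketch self-contained.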
Proposals have been made for measuring the quality of an ontology. Most of these have been designed for evaluating the quality of ontology modularization. Modularization, in contrast to Default Logics, tries to split up the ontology into independent sub-units. We try to avoid this independence as much as possible. However, the most relevant to this work is, to the best of our knowledge, the entropy measure defined by Doran et al. It relies solely upon the structure of the ontology. We could, in principle, treat the different partitions as modules and hence apply this measure. Yet it would not be useful for minimizing the number of conflicts. We still have to evaluate whether other measures can do the job, but we strongly assume that this is not the case, because none of these measures was designed for minimizing the number of conflicts.
To conclude, we are indeed able to perform reasoning on ontologies that contain controversial axioms. We can keep all explicitly stated information and ignore potential conflicts. The solution to reasoning is not deterministic but is subject to an optimization process. Conflicts are ignored during reasoning but can still be present. We hence have to optimize regarding the number of conflicts and the number of inferences that we invalidate. We showed that this requires a more sophisticated measure than simply counting inferences. Existing methods for evaluating ontologies were not designed to assess Default TBoxes regarding a minimal number of conflicts. We suggest assessing the quality of solutions by their information content and proposed an entropy-based measure that evaluates the quality of a Default TBox solution in the presence of an ABox.
There is still plenty of work to do. First of all, we have to improve the scalability of computing all justifications for an entailment. Most relevant to this work, we will have to investigate further quality measures to have a solid basis for choosing the proper measure. We also have to evaluate this approach with real people. It is not guaranteed that Domain Experts will accept the approach. Last but not least, we may apply this method to other domains such as, for example, ontology mapping.
So, I will come to an end now, before you all die of PowerPoint poisoning. I thank you for your attention and would appreciate your comments and questions.