Prepublication draft of Webb, G. I. (2002). Integrating Machine Learning with
Knowledge Acquisition. In Leondes, C. T. (Ed.), Expert Systems, volume 3, pages 937-
959. San Diego, CA: Academic Press.



Techniques for integrating machine learning with knowledge acquisition

Geoffrey I. Webb

School of Computing and Mathematics

Deakin University

Geelong, Vic, 3217,

Australia

Tel: +61 3 5227 2606

Fax: +61 3 5227 2028

Email: webb@deakin.edu.au

INTRODUCTION

Knowledge acquisition is frequently cited as the greatest bottleneck in the
development of expert systems. Two primary approaches to knowledge acquisition
are elicitation of knowledge from experts (traditional knowledge acquisition) and
machine learning. Both approaches have strengths and weaknesses. Experts can draw
upon practical experience and both domain specific and general knowledge. In
addition, they often have access to well established procedures relating to the target
domain. Elicitation of knowledge from experts is limited, however, by the difficulty of
articulating knowledge; reluctance in some circumstances to share knowledge;
preconceptions and biases that might inappropriately influence an expert; and the
limits of an expert’s knowledge. In contrast, machine learning systems provide a
capacity to infer procedures from examples; can perform extensive logical analysis;
and are not subject to the same types of preconceptions and biases as an expert. They
are hampered, however, by limited access to general and domain-specific knowledge
and the difficulties of obtaining comprehensive example sets. Further, machine
learning is only possible where much of the knowledge acquisition task has already
been completed. Machine learning requires a description of the problem domain and
collection of example cases from which to learn. In attribute-value machine learning,
the domain description consists of a set of attributes and their allowable values along
with a collection of class values. These, together with the formalism used for
expressing the decision procedures, specify a space of possible solutions that the
system might ‘learn’. The learning system then explores this space of solutions seeking
one that best fits the training examples. If a suitable space of possible solutions is
specified, learning is relatively straightforward. If a poor solution space is specified,
effective learning is impossible. Thus, machine learning requires prior ontological
analysis and the specification of a suitable class of models to explore.
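As a concrete illustration of such a domain description, the sketch below shows one possible attribute-value encoding in Python. It is a hypothetical rendering: the attribute names, value ranges, and classes are invented for illustration and are not taken from any data set discussed in this article.

```python
# Hypothetical attribute-value domain description; the attribute names,
# ranges, and classes are invented for illustration.
domain = {
    "attributes": {
        "age": {"type": "ordinal", "range": (0, 120)},
        "gender": {"type": "categorical", "values": {"male", "female"}},
        "oedema": {"type": "categorical", "values": {"present", "absent"}},
    },
    "classes": {"nephrotic_syndrome", "normal"},
}

def is_valid_case(domain, case):
    """Check that every attribute value lies within the domain description."""
    for name, spec in domain["attributes"].items():
        v = case[name]
        if spec["type"] == "ordinal":
            lo, hi = spec["range"]
            if not lo <= v <= hi:
                return False
        elif v not in spec["values"]:
            return False
    return True

# A training example: a vector of attribute values plus its class label.
example = ({"age": 45, "gender": "male", "oedema": "present"}, "nephrotic_syndrome")
```

Together with the formalism for expressing decision procedures, a description of this kind fixes the space of candidate solutions the learner explores.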




It is clear that this pre-learning process is crucial. A domain expert will usually
provide critical input into this process. This is not the end of the domain expert’s
involvement in knowledge acquisition by machine learning, however. In practical
applications of machine learning it is often necessary for an expert to review the rules
that a machine learning system creates (see, for example, Buntine & Stirling, 1991).
These rules may require modification due to pragmatic or other considerations quite
outside the formal factors that a machine learning system is able to consider. For
example, social or ethical considerations may make unacceptable a policy to refuse
loans to a specific class of loan applicant, no matter how compelling the evidence is
that the particular class of applicant represents a poor risk. Review by the expert will
also frequently lead to the identification of deficiencies in the domain specification,
which will necessitate a cycle involving modification of the domain specification and
re-application of machine learning, followed by another domain expert review. Most
machine learning tools provide little support for these interactions with the expert.

In addition to assisting the expert in processes that must be carried out to support
machine learning, there is also a need for facilities to enable the expert to directly assist
the machine learning system during the induction process.

However, there are a number of obstacles to providing such support. The first of these
is communication. Any sophisticated interaction between an expert and a machine
learning system will require the two to communicate with one another. But, such
communication requires mechanisms beyond those normally supplied by machine
learning systems.

A second difficulty is that the machine learning system usually provides a very shallow knowledge-base. Without any means of obtaining explanations and justifications of the rules that are presented, an expert can find these shallow rules difficult to assimilate into his or her more sophisticated models of the domain.

Interactions are further limited by the restricted forms of input that most machine learning systems accept. Most attribute-value systems allow only the
description of the attributes and of a set of examples. There is no provision for
background knowledge or any other suggestions or constraints with which to guide
the learning process.

A final obstacle is that machine learning systems can make quite arbitrary choices in
situations where there are alternative solutions that equally well fit the example cases.
Such situations are quite frequent in practice. These arbitrary choices may be
acceptable in such situations if no other information is available and maximisation of
predictive accuracy is the primary consideration, as is usually considered the case in
machine learning research. However, an expert will often be able to discriminate
between the alternatives on other grounds. In this context, the systems’ arbitrary
choices may reduce the comprehensibility or acceptability of the inferred rules for the
expert or may lead to less useful rules than if the expert’s judgements are utilised.

A large number of systems and techniques have been designed to tackle aspects of
interaction between a user and a machine learning system. Most of these are oriented
toward interactive use by a sophisticated knowledge engineer (Attar Software, 1989;
Davis & Lenat, 1982; De Raedt, 1992; Morik, Wrobel, Kietz & Emde, 1993; Nedellec &
Causse, 1992; O'Neil & Pearson, 1987; Schmalhofer & Tschaitschian, 1995; Shapiro,




1987; Smith, Winston, Mitchell & Buchanan, 1985; Tecuci & Kodratoff, 1990; Wilkins, 1988).

        This paper describes The Knowledge Factory, a computer-based environment that
        allows a domain expert to directly collaborate with a machine learning system at all
        stages of the knowledge acquisition process. We have endeavoured to take the above
        considerations into account in designing this system. It is distinguished from previous
        such systems by its orientation toward direct use by domain experts with minimal
        computing sophistication.

        THE KNOWLEDGE REPRESENTATION SCHEME

        Sophisticated knowledge representation schemes, such as first-order logic, are not
        appropriate for use by non-knowledge-engineers (see, for example, Kodratoff & Vrain,
        1993). Extensive training and experience is required to master such formalisms.
        Instead, The Knowledge Factory uses simple attribute-value case descriptions and
        production rules. Example cases are described by simple vectors of attribute values.
        Rule antecedents (conditions) are conjunctions of simple tests on attributes. The
        consequents (conclusions) are simple classification statements.

[Figure 1 about here]

Examples of rules are provided in Fig. 1. These examples illustrate rules learned from data on the diagnosis of renal disease used in research with Geelong Hospital (Agar & Webb, 1992). Attributes are either categorical or ordinal. For categorical attributes, set membership tests are used (although set notation is avoided). All tests in the first rule in Fig. 1 relate to categorical attributes. For ordinal attributes, tests specify a range of allowable values. The attributes age and prodocytic_fusion_ef tested in the second rule of Fig. 1 are both ordinal. The condition part of a rule (the tests between the keywords IF and THEN) is satisfied for a case if all of the individual tests are satisfied. If the condition is satisfied then the conclusion for the rule is taken to apply to that case.
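The rule form just described can be sketched as follows. The encoding is hypothetical (the article does not prescribe an implementation), and the attribute names are illustrative.

```python
# Hypothetical encoding of the rule form described above: a conjunction of
# attribute tests with a classification consequent.
rule = {
    "tests": {
        "age": ("range", 18, 65),        # ordinal: range of allowable values
        "oedema": ("in", {"present"}),   # categorical: set membership
    },
    "conclusion": "nephrotic_syndrome",
}

def satisfies(rule, case):
    """The condition is satisfied iff every individual test is satisfied."""
    for attr, test in rule["tests"].items():
        v = case[attr]
        if test[0] == "range":
            if not test[1] <= v <= test[2]:
                return False
        elif v not in test[1]:
            return False
    return True
```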

        One complication that must be handled by a machine learning system is missing
        values in the data. The Knowledge Factory recognises two types of missing value,
unknown and unobtainable. The latter is treated as a distinct value that may be tested in a rule, having the same status as a normal value. The first test in the third
        rule in Fig. 1 allows unobtainable values for the attribute age. Any case with an
        unobtainable value for age will fail the condition for the second rule, which does not
        explicitly allow unobtainable values for this attribute. Unobtainable values should be
        used where the absence of a value needs to be considered during the decision process,
        for example, because different attributes should be considered if an attribute normally
        used in the decision making process does not have a value. In contrast, unknown
        values pass any test on an attribute. In consequence, the test for the second rule in
        Fig. 1 will succeed for a case with an unknown value for age, so long as all of the tests
        on other attributes succeed. The condition on hepatitus_B_antibody in the second rule of
        Fig. 1 will only succeed for a case with an unknown value.

        A summary line follows the consequent of each rule. This displays the number of
        training examples that the rule correctly classifies (Positive), the number incorrectly
        classified (Negative) and the value currently assigned to the rule by the system (used,
        under some settings, when resolving between multiple rules that cover a case).

In the current version of the system, the rule sets are flat. All rule consequents relate to the same attribute, called the decision attribute. This attribute cannot be used in the
        to the same attribute, called the decision attribute. This attribute cannot be used in the



antecedent of a rule. While it might be appropriate to slightly relax these constraints, it
would not be appropriate to allow complex chaining to any great depth, as non-
knowledge-engineers are unlikely to readily master the complexities of such a rule
base. Attempts to incorporate more sophisticated knowledge representation devices,
such as model construction operators (Morik, Wrobel, Kietz, & Emde, 1993), are also
viewed as likely to impose an undesirable barrier to use for those without extensive
training in knowledge engineering.

We have conducted informal experimentation with two sets of experts—medical
practitioners and financial analysts. The former have had little computing expertise
and the latter have had substantial computing expertise but no background in
knowledge-engineering. These experiments show that these experts adapt readily to
this form of knowledge representation. They have no difficulty in interpreting existing
rules, or specifying new rules or examples.

MACHINE LEARNING TECHNIQUES

The Knowledge Factory uses the DLGref learning algorithm (Webb, 1993). This is an
attribute-value classification learning algorithm that supports refinement of existing
rules. Attribute-value classification learning algorithms take as input a training set of
pre-classified examples. Each of these examples is described by a vector of attribute-
values and is labelled with the correct class. The algorithm forms a model that relates
combinations of attribute-values to classes. Note that while this abstract description of
machine learning describes the models that it produces as classifiers, classification can
be used to model many forms of discrete decision making. Thus, classification
learning encompasses many useful learning tasks.

Generalisation, Specialisation, and Cover

The Knowledge Factory makes extensive use of two operations on rules: generalisation
and specialisation. The following provides a brief introduction to these operations.

The classification rules used by The Knowledge Factory have the form IF condition
THEN classification. If the condition of a rule is satisfied by a case then the rule is said
to cover that case.

A rule x is a generalisation of a rule y if x necessarily covers all cases that y covers. For
example, the rule IF age>20 THEN old is a generalisation of IF age>30 THEN old. A case
covered by the second rule must also be covered by the first rule. To create a
generalisation of a rule in the language used by The Knowledge Factory it is necessary
to either make a constraint on an attribute less strict, for example by lowering the
minimum value allowed as in the example above, or remove a constraint on an
attribute, as with the generalisation from IF age>20 and hair is fair THEN x to IF age>20
THEN x .

If rule x is a generalisation of rule y, then y is a specialisation of x.

Note that by these definitions, a rule is both a generalisation and a specialisation of
itself. A rule x is a proper generalisation of a rule y if x is a generalisation of y and x is
not a specialisation of y. If x is a proper generalisation of y then y is a proper
specialisation of x.




A rule x is a least generalisation (Plotkin, 1970) with respect to rule y and case z, if x is a
generalisation of y and x covers z, and there is no rule r that is a proper specialisation
of x, a generalisation of y and covers z. For example, IF age>20 THEN old is a least
generalisation of IF age>30 THEN old with respect to a case with age 20.

A rule x is a least specialisation with respect to rule y and case z, if x is a specialisation of y and x does not cover z, and there is no rule r that is a proper generalisation of x, a specialisation of y and does not cover z. For example, IF age>30 THEN old is a least specialisation of IF age>20 THEN old with respect to a case with age 29 (assuming that only integer values are allowed for age).
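For a single ordinal range test over integer values, least generalisation and least specialisation can be computed directly. The inclusive-interval encoding below is an assumption for illustration.

```python
# Inclusive integer intervals (lo, hi) stand in for ordinal range tests.
def least_generalisation(test, value):
    """Widen the interval just enough to admit value."""
    lo, hi = test
    return (min(lo, value), max(hi, value))

def least_specialisation(test, value):
    """All maximal sub-intervals that exclude value (integer-valued)."""
    lo, hi = test
    if not lo <= value <= hi:
        return [test]                    # already excludes the case
    out = []
    if value + 1 <= hi:
        out.append((value + 1, hi))      # raise the lower bound past value
    if lo <= value - 1:
        out.append((lo, value - 1))      # lower the upper bound past value
    return out
```

As the definitions imply, an interval has exactly one least generalisation with respect to a case, but may have two least specialisations (one for each bound that can be tightened).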

DLGref

DLGref is a variant of the AQ (Michalski, 1993) learning algorithms. These algorithms
form a set of classification rules incrementally. For each class a rule is sought that
performs best on the remaining examples. Best performance is judged in a variety of
ways, but usually involves covering as many examples of the class and as few
examples not of the class, as possible. DLGref allows the user to specify a metric of
good performance. Within The Knowledge Factory, two metrics are supported. The
max-consistent metric does not accept any rules that cover examples of another class and prefers, of the remaining rules, those that cover the most examples of the positive class. Laplace is based on the Laplacian law of succession. It allows a trade-off between covering greater numbers of examples and covering small numbers of counter-examples. Using this metric, the value of a rule equals (p + 1)/(n + 2), where p is the number of positive examples covered by the rule and n is the total number of examples covered
by the rule. Having selected the best rule, all examples that it correctly classifies are
removed from the training set and the process is repeated. The rules so selected are
pooled together to form a set of classification rules.

The DLGref algorithm is presented in Appendix A.
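The covering loop and the Laplace metric described above might be sketched as follows. Candidate-rule generation (which DLGref performs by successive least generalisation) is omitted, and the rule encoding is hypothetical.

```python
# Sketch of the AQ-style covering loop and the Laplace metric described above.
def covers(rule, case):
    return all(lo <= case[a] <= hi for a, (lo, hi) in rule["tests"].items())

def laplace(rule, examples):
    """Laplace value (p + 1) / (n + 2): p = positive examples covered,
    n = total examples covered."""
    covered = [e for e in examples if covers(rule, e)]
    p = sum(1 for e in covered if e["class"] == rule["conclusion"])
    return (p + 1) / (len(covered) + 2)

def covering(candidates, examples):
    """Select the best-valued rule, remove the examples it correctly
    classifies, and repeat until no examples remain or no rule helps."""
    remaining, selected = list(examples), []
    while remaining:
        best = max(candidates, key=lambda r: laplace(r, remaining))
        correct = [e for e in remaining
                   if covers(best, e) and e["class"] == best["conclusion"]]
        if not correct:
            break
        selected.append(best)
        remaining = [e for e in remaining if e not in correct]
    return selected
```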

Whereas most AQ-like covering algorithms search for each rule starting from very
general rules and considering successive specialisations thereof, DLGref starts from
highly specific rules and explores successive generalisations. The successive
generalisations are generated using least generalisation. This process readily supports
ordinal and continuous valued attributes, forms that cause difficulties for the
specialisation-based search of traditional AQ algorithms.

Information Utilised by the Machine Learning System

An attribute-value classification learning system utilises two major types of
information. The first is the set of attributes and their possible values. This, along with
the formalism for expressing classifiers that the system employs, defines the set of
possible classifiers that the system can form. Clearly, the system cannot form
classifiers that utilise attributes about which the learning system has not been
informed. The second type of information that is used is the training set. The learning
system searches the space of possible classifiers that it is capable of forming, seeking a
classifier that best classifies the examples in the training set. Clearly, the system is
greatly influenced by the quality of the training set with which it is provided. If there
are intricacies to classification in a domain that are not represented in the training set
then the learning system cannot learn them.



TECHNIQUES

One of the primary design considerations for The Knowledge Factory was to allow
direct interaction between the knowledge acquisition system and an expert with
minimal computing expertise. This has been achieved by a number of previous
knowledge acquisition systems, notably repertory grid based tools such as ETS (Boose,
1986) and the ripple-down-rules tools of Compton, Edwards, Srinivasan, Malor,
Preston, Kang, & Lazarus (1992). However, a significant difference between The
Knowledge Factory and these previous tools is that The Knowledge Factory’s
incorporation of machine learning frees it from a reliance on the expert having all
encompassing knowledge of the domain. Previous systems have relied upon the
expert being able to specify a suitable solution for any problem that the system may
present. The expert, then, is viewed as all-knowing. All that is required is to help him
or her to develop a formal model of the domain that captures the relevant aspects of
his or her expertise. With the inclusion of machine learning, this view can change.
Now the expert is viewed as having valuable insight into the domain, but this insight
need not be all encompassing. Where the expert is unable to provide suitable models,
the machine learning system can supply models instead. But the interactions may be
more sophisticated than each simply producing part of a model, the two parts then
being assembled.

An alternative scenario is that an expert may have expertise in some aspect of a
domain but may not be able to express formal models that encode that expertise. That
is, the expert may be able to solve a particular class of problems, but may not be able to
specify a set of expert system rules to reproduce those solutions. In this case, the
machine learning system can be used to produce a ‘first draft’ of a set of rules. The
expert may then be able to critique and refine those rules. Thus, machine learning can
be used to ‘kick-start’ the model specification and refinement process.

A further scenario is that even after the use of machine learning to produce a first draft,
the expert may still be unable to specify appropriate formal models. In this case, he or
she may be able to critique the machine learning system’s models, enabling it to further
refine its own models.

Finally, the expert may be able to specify some formal models initially, but these
models may be incomplete or imprecise. The machine learning system may then refine
this initial knowledge-base. The expert can review and revise the machine learning
system’s contributions in turn, starting an ongoing sequential process of refinement.

The commonalities of all these scenarios are an underlying view of the expert and the
machine learning system as partners in the knowledge acquisition task, each of which
has differing insights into a domain. The expert’s insights are derived from training,
experience and both general and commonsense knowledge. The machine learning
system’s insights are derived from analysis of a set of training examples.

Communication Mechanisms

Such pooling of insight clearly requires communication between the partners of a form
not normally supported by machine learning systems. The Knowledge Factory uses
example cases as the primary basis for communication. This case-based
communication paradigm suits both partners in the collaboration. Experts in many
domains are familiar with the use of examples, both to have points illustrated and for



illustrating points. For a machine learning system, example cases are the major source
        of evidence that is considered. Many of the decisions that a machine learning system
        makes can only be explained in terms of how a model relates to the provided training
        examples. With The Knowledge Factory’s case-based communication, each partner
        uses example cases to communicate with the other.

        Example windows

        The primary mechanism used by the machine learning system is to provide windows
        that list the example cases that stand in a particular relationship to a rule or set of rules.
        Eight of these example windows are maintained. While these windows could be
        composed to answer specific queries from the user, or even to anticipate queries that
        the user might wish to pose, the complexity of creating such queries, and the
difficulties of understanding system-generated summaries of the queries that an ad hoc window might address, militate against the use of such mechanisms with our target
        users. Rather, windows that address the issues that are most likely to concern the
        users are maintained at all times. This constant display allows a user to familiarise
        himself with the meaning of each window, not having to devote attention to
        comprehending the meaning of a window each time that it is considered, as would be
        the case if windows presenting different types of information were generated
        dynamically by the system. The user is then able to attend to individual windows at
        his own discretion, ignoring those that do not bear upon his current concerns and
        utilising those that do.

        The first example window simply lists all available examples. It is perhaps stretching
        the scope of the technique to include this window within the case-based
        communication mechanism, as, in contrast to the other example windows, it does not
        really serve a communication purpose. Rather, it provides the user with ready access
        to the available example cases.

        The Indistinguishable Examples Window contains all examples for which there is an
        example from another class such that the system is not able to form a rule that
        distinguishes the two. This will occur if there is no attribute for which the two
        examples in question have different values. Unknown values do not count as distinct
        for this purpose. The presence of examples in this window indicates that it is not
        possible to form rules that correctly classify all examples given the current set of
        attributes. This indicates to the user that it might be desirable to consider revision of
        the current set of attributes.
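The test behind the Indistinguishable Examples Window can be expressed directly: two examples are distinguishable only if some attribute holds different, known values in both. The example encoding and the unknown-value marker below are assumptions.

```python
UNKNOWN = "?"   # assumed marker for an unknown value

def distinguishable(a, b, attributes):
    """Two cases are distinguishable only if some attribute holds different,
    known values in both; unknown never counts as a distinct value."""
    return any(a[x] != b[x] and UNKNOWN not in (a[x], b[x]) for x in attributes)

def indistinguishable_examples(examples, attributes):
    """Examples for which some example of another class cannot be
    distinguished (the contents of the window)."""
    flagged = []
    for e in examples:
        if any(f["class"] != e["class"] and not distinguishable(e, f, attributes)
               for f in examples):
            flagged.append(e)
    return flagged
```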

        Whenever a rule from the current knowledge base is selected, the Positive Examples
        Window lists all examples that the rule correctly classifies. The Counter Examples
        Window displays all examples that the current rule incorrectly classifies. The
        Uncovered Examples Window lists all examples that belong to the class indicated by
        the rule’s conclusion, but which are not covered by the rule. These windows indicate
        to the user the evidence that led the system to select the current rule. They can be used
        both to aid the user’s understanding of the underlying support for the rule and to help
        the user to formulate alternatives to the rule that serve the same purpose. They also
        provide a simple mechanism for the user to ask what if? and why not? questions. To
        determine why the system did not select a specific alternative to a rule, the user need
        only create and select the other candidate. If it was not created because it failed to
        cover specific examples, this will be indicated by the contents of the Positive Examples
        Window. If the failure to create it was due to the alternative rule incorrectly classifying
specific cases, this will be indicated by the contents of the Counter Examples Window.
        Finally, they provide a mechanism whereby the machine learning system can critique
        rules that the user formulates. If a rule fails to cover example cases, this will be
        indicated by the Uncovered Examples Window. If a rule misclassifies cases, this will
        be indicated by the Counter Examples Window. Finally, if the rule performs well, this
        will be indicated by the Positive Examples Window. The operation of these windows
        is illustrated in Fig. 2. Here, a rule for diagnosing Immunoglobulin A Deficiency
        (IGA_NX) is selected. This rule has been learned by the machine learning system. The
        Positive Examples Window displays the examples that support the rule. The
        Uncovered Examples Window displays the IGA_NX cases that the rule fails to cover.
        The empty Counter Examples Window shows that this rule does not misclassify any
        examples.

[Figure 3 about here]

To illustrate the use of these windows to answer why not? questions, Fig. 3 displays the same situation after the user deletes the first clause from the rule. This is equivalent to asking why not develop this alternative rule? As can be seen from the Counter Examples Window, the answer is because the alternative rule covers the two cases that are listed.

        A fifth example window also relates to the current rule. The Insufficient Information
        Examples Window lists any cases for which it cannot be determined whether or not the
        rule covers the example. This will occur when an example has unknown values for
        some of the attributes referred to in a rule, and satisfies all clauses relating to attributes
        for which it has values defined. This window alerts the user to potential problems
        arising from missing values in the data.

        The previous four example windows relate to the performance of a rule considered in
        isolation. The remaining two example windows relate to the performance of a set of
        rules as a whole. The Misclassified Examples Window lists cases that are misclassified
        by the rule set. Such misclassification may occur even if the example is correctly
        classified by one or more rules, if it is also misclassified by other rules. The manner in
        which such conflicts between rules are handled by the system is under user control.
        Rules may be ordered, so that the first rule encountered that covers a case is used.
        Alternatively, rules may be assigned values. Under this treatment, of the rules that
        cover a case, that with the highest value is employed. Rule values may be specified by
        the learning system or by the user. A final alternative is that in cases of multiple
        conflicting conclusions, the case remains unclassified.

        The Unclassified Examples Window lists cases that are not classified by the rule set.
        This may occur if multiple rules cover a case, as described above, or if no rule covers a
        case. The system may be set so that if no rule covers a case either the case is
        unclassified or a default classification is assigned.
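The conflict-resolution options described in the last two paragraphs might look as follows in outline; the policy names and rule encoding are hypothetical.

```python
# Sketch of rule-set classification under the conflict-resolution options
# described above; policy names and rule encoding are illustrative.
def covers(rule, case):
    return all(lo <= case[a] <= hi for a, (lo, hi) in rule["tests"].items())

def classify(rules, case, policy="ordered", default=None):
    """'ordered': the first covering rule wins.
    'valued': the highest-valued covering rule wins (value ties with
    conflicting conclusions leave the case unclassified).
    Any other policy: conflicting conclusions leave the case unclassified.
    `default` is returned when no rule covers the case."""
    firing = [r for r in rules if covers(r, case)]
    if not firing:
        return default
    if policy == "ordered":
        return firing[0]["conclusion"]
    if policy == "valued":
        top_value = max(r["value"] for r in firing)
        conclusions = {r["conclusion"] for r in firing if r["value"] == top_value}
        return conclusions.pop() if len(conclusions) == 1 else None
    conclusions = {r["conclusion"] for r in firing}
    return conclusions.pop() if len(conclusions) == 1 else None

young = {"tests": {"age": (0, 50)}, "conclusion": "young", "value": 0.6}
old = {"tests": {"age": (40, 120)}, "conclusion": "old", "value": 0.9}
```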

These two windows allow the user to obtain holistic evaluation of the rule set in addition to the rule-specific evaluation provided by the majority of example windows.

        The examples windows serve another function—example set validation. It is very easy
        for errors to enter into data. These may occur through errors in data entry or
        transmission or even due to imprecision in measuring instruments. In the traditional
        machine learning context, such errors will only rarely become apparent, as the
        knowledge bases that are inferred are not usually related back to the training
        examples. In contrast, The Knowledge Factory’s examples windows encourage the



user to closely inspect significant examples. In particular, the Insufficient Information, Counter, and Uncovered Examples Windows will often highlight cases that contain errors.

        Case-based rule critique

        The example windows enable the machine learning system to justify its actions to the
        user and to critique rules created by the user. The user is also able to critique rules
        formed by the learning system. In our experience it is common that experts will dislike
        a rule formed by the learning system but be unable to articulate appropriate
        modifications thereto. In such circumstances the expert will often be able to provide
        examples to support his or her disquiet. It is straightforward to add those examples to
        the training set. Such examples may be complete, including values for all attributes,
        and possibly derived from an actual case from the expert’s experience. Alternatively,
        they might be partial, with values specified only for some attributes. Such an example
        provides an abstract characterisation of types of case for which an alternative decision
        is appropriate.

        Case-based rule editing

        Another manner of using examples to modify rules is provided by case-based rule
        editing. Specification of counter-examples provides the machine learning system with
        information about conditions under which a rule should not apply. This may either
        cause the system to specialise a rule by adding more constraints to its application, or to
        select a different set of attributes with which to condition the application of the rule.
        To extend a rule to cover additional types of situation is less straightforward however.
        The addition of further positive examples to the training set may result in a rule being
        modified to cover those cases when learning is next applied. However, the system
        may equally cover those cases through modification of other rules for the class, or the
        generation of new rules. Also, depending on the metric used to evaluate the quality of
        rules, the machine learning system may tolerate a small number of counter-examples
        to a rule due to the quality metric evaluating them as not sufficient to warrant
        specialising the rule, thus defeating the user’s objective in specifying a counter-
        example.

Case-based rule editing has been developed to deal with such circumstances, where an
        expert is unable to directly specify suitable changes to a rule, but can identify
        appropriate counter-examples or uncovered positive examples. With this technique,
        the expert identifies a rule and a case that the rule should or should not cover, and the
        system suggests modifications to the rule that would produce the desired outcome.
        The user can then evaluate and select between the proffered modifications or explore
        other forms of revision.

        When excluding a case from the cover of a rule, the system generates all least
specialisations of the rule with respect to the case. This is illustrated in Fig. 4. In this
example, we have removed a clause from a rule, which is highlighted in the Rules
Window. This has caused it to incorrectly cover some counter-examples, which are
displayed in the Counter Examples Window. To explore how these counter-examples
        might be excluded from the cover of the rule we have then applied the Contract to
        Exclude Case command, selecting the first case in the Counter Examples Window.
        This generates a large number of alternative rules, of which the last three are displayed
        in the Alternative Rules Window. Two of these add possible constraints on the



attribute age while the last adds a constraint on the attribute gender. The effect of these
new constraints is summarised on the summary line after each rule and could be
explored further using the examples windows. Further case-based editing could be
applied to one of these alternatives until a satisfactory outcome was obtained, or the
user might reach a stage where he or she is able to complete the necessary
modifications directly.
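The contraction operation can be sketched in code. The following is a minimal illustration, not The Knowledge Factory's actual implementation: a rule is assumed to be a dict mapping each attribute to the set of values it admits (an absent attribute is unconstrained), and the attribute names echo the age and gender example above.

```python
def least_specialisations(rule, case, domains):
    """Return every rule obtained by tightening one test just enough
    to exclude `case`; each result is a least specialisation."""
    results = []
    for attr, value in case.items():
        allowed = rule.get(attr, domains[attr])
        if value in allowed:                     # this test currently admits the case
            tightened = dict(rule)
            tightened[attr] = allowed - {value}  # remove only the offending value
            results.append(tightened)
    return results

domains = {"age": {"young", "middle", "old"}, "gender": {"m", "f"}}
rule = {"age": {"young", "middle"}}              # covers any young or middle case
counter = {"age": "young", "gender": "m"}        # a case the rule should not cover
alternatives = least_specialisations(rule, counter, domains)
```

Each alternative blocks the counter-example along a different attribute, mirroring the way the Alternative Rules Window offers constraints on either age or gender.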

When generalising a rule to cover a case, the system generates the least generalisation
of the rule with respect to the case. For the types of attribute-value rule supported,
there will only ever be a single least generalisation of a rule with respect to a single
case. If a user specifies a new case that a rule should cover, the Expand to Cover Case
command can be used to force the system to immediately generalise that rule to cover
that case. As with the Contract to Exclude Case command, several such case-based
rule editing actions may be required before a user is satisfied with the resulting rule.
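The complementary expansion operation can be sketched under the same assumed dict-of-value-sets representation (illustrative only). For categorical attribute-value rules of this form the least generalisation is indeed unique: each test is widened just enough to admit the new case.

```python
def least_generalisation(rule, case):
    """Widen each test of `rule` just enough to admit `case`;
    for this rule representation the result is unique."""
    return {attr: allowed | {case[attr]} for attr, allowed in rule.items()}

rule = {"age": {"old"}, "gender": {"f"}}
case = {"age": "middle", "gender": "f"}          # a case the rule should cover
expanded = least_generalisation(rule, case)
```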

Machine-generated examples are not used

It is interesting to contrast the case-based communication mechanisms utilised by The
Knowledge Factory with the techniques pioneered in MARVIN (Sammut & Banerji,
1986). When testing hypotheses, MARVIN actively generates ‘examples’ which it asks
the expert to classify. Such a mechanism has been deliberately excluded from The
Knowledge Factory in the belief that, for many domains, without access to the types of
background knowledge that we have also deliberately excluded from the system,
examples created by the computer are likely to be nonsensical. In a medical domain,
for example, the interactions between and constraints on attributes will be many and
varied. Computer-generated cases are likely to violate such constraints, for example,
by hypothesising pregnant male patients. The exploration of violation of such
constraints is unlikely to be useful, as the expert will know that the possibility does not
have to be excluded from a rule as it will never arise in practice.

Our case-based communication mechanisms exclusively use cases specified by the user
or imported from outside the system. In consequence, all cases should be meaningful
to the user. The computer may present cases to the user, but only cases selected from
those provided to it.

Exploration of Alternative Rules

Case-based rule editing is able to assist in some circumstances where an expert ‘knows’
that a rule is unsuitable, but cannot identify precisely how. However, it is limited to
situations in which the user can specify or identify appropriate example cases. This
will not always be possible. Alternative rules are another mechanism that The
Knowledge Factory provides for such situations.

Machine learning systems frequently make arbitrary choices between possible tests for
inclusion in a rule. There will often be several tests, all of which cover exactly the same
set of training cases. A machine learning system must select one of these, and usually
does so in an arbitrary manner. That such a choice has been made will not usually be
flagged to the user. However, while the choice may have been arbitrary given the
information available to the system, an expert may well be able to make comparative
qualitative judgements about the alternatives. This problem could be tackled by
involving the user in the process of selecting tests during induction. However, this
would impose a large burden on the user. There are very many such tests considered



in typical learning contexts. Further, in many situations the user may not have any
        better basis on which to choose between alternatives than does the system.

        Instead, The Knowledge Factory offers the user a facility for identifying and selecting
between alternative rules. This facility can be invoked by the user at any time at his or her
        own discretion. The Form Alternative Rules command creates a set of alternatives to
        the current rule. First, the system identifies all most general rules that cover all cases
        correctly covered by the current rule. Two more rules are added. The first is the most
        specific rule to cover all of the cases correctly covered by the initial rule. This provides
        for the user a complete presentation of all tests that could possibly be used to cover
        these cases. The second additional rule has as its antecedent the conjunction of all
        antecedents of all the most general rules that were formed. This summarises for the
        user the set of all tests that appear to be effective in distinguishing the current class
        from others. These rules are displayed in a window within which the user may
explore them using all of the system’s rule editing and evaluation facilities.

Figure 5 illustrates the outcome of this command. In this example, rules have been
        developed from examples of diagnosis of hypothyroid disease (Quinlan, Compton,
        Horn, & Lazarus, 1986). The rule highlighted in the Ruleset Window is the original
        rule for which alternatives have been developed. The Alternative Rules Window lists
        the four alternatives that were discovered. The last two of these rules are the most
        general alternatives. The first of these is the original rule. The second is identical to
the first except that the clause ‘psych is f’ is replaced by ‘sick is f’. All 43 cases covered by
        the original rule satisfy both of these conditions. One or the other of these clauses is
        needed if a rule is to cover all 43 cases correctly classified by the original rule and no
        cases of another class. For this rule there happens to be one case of class
        compensated_hypothyroid that satisfies all of the other conditions but, unlike any of the
        positive cases, has values of t for both psych and sick. This is a clear illustration of a
situation in which the system has no sensible basis on which to choose between
        alternatives but an expert may. The first of the alternative rules is the most specific
        rule to cover all 43 cases. The second alternative rule contains all clauses in either of
        the most general alternatives.
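The search behind the Form Alternative Rules command can be sketched as follows. This is a brute-force rendition, adequate only for toy data, under the simplifying assumptions that tests are single-value equalities and rules are dicts of admitted value sets; the thyroid-flavoured attribute names and cases are invented for illustration and are not the actual hypothyroid data.

```python
from itertools import combinations

def covers(rule, case):
    return all(case[a] in vals for a, vals in rule.items())

def most_general_alternatives(pos, neg, domains):
    """Enumerate every minimal conjunction of tests that covers all of
    `pos` and none of `neg` (exhaustive search over test subsets)."""
    tests = [(a, v) for a, dom in sorted(domains.items()) for v in sorted(dom)]
    found = []
    for k in range(len(tests) + 1):
        for combo in combinations(tests, k):
            attrs = [a for a, _ in combo]
            if len(set(attrs)) != len(attrs):    # at most one test per attribute
                continue
            rule = {a: {v} for a, v in combo}
            if (all(covers(rule, c) for c in pos)
                    and not any(covers(rule, c) for c in neg)
                    # skip rules that merely specialise one already found
                    and not any(all(kv in rule.items() for kv in r.items())
                                for r in found)):
                found.append(rule)
    return found

domains = {"tsh": {"high", "normal"}, "psych": {"t", "f"}, "sick": {"t", "f"}}
pos = [{"tsh": "high", "psych": "f", "sick": "f"}]
neg = [{"tsh": "high", "psych": "t", "sick": "t"},
       {"tsh": "normal", "psych": "f", "sick": "f"}]
alternatives = most_general_alternatives(pos, neg, domains)

# The most specific alternative conjoins every test the positives satisfy:
most_specific = {a: {c[a] for c in pos} for a in domains}
```

On this toy data the search yields exactly two most general rules, differing only in whether psych or sick carries the discriminating test, which mirrors the alternation described above.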

Note that this is a very simple example of alternative rules, deliberately chosen for ease
of presentation and explanation. For contrast, Fig. 6 presents some of the 39 alternative
most general rules, all of which cover the only two cases of class secondary_hypothyroid
        in the training set and none of which cover any cases of another class.

        There is an interesting relationship between this alternative rules technique and the
        example generation technique of MARVIN (Sammut & Banerji, 1986). In both
        situations the system wishes to obtain advice from an expert to help it distinguish
        between alternatives that are equally well supported by the training cases. However,
        whereas MARVIN generates hypothetical cases in an attempt to distinguish between
        the alternatives, we present the alternatives directly. As argued above, automated
        generation of hypothetical cases will often be unsuccessful in real world domains, if
        only because such cases are unlikely to appear realistic to an expert. We believe that
        complete rules of the type that The Knowledge Factory creates will often be easier for
        the expert to comprehend and manipulate.




Interactive Domain Model Revision

Buntine & Stirling (1991) argue that effective real world application of machine
learning frequently requires a cyclical process in which the results of machine learning
are provided to the expert for review which may lead to the identification of relevant
information not originally provided to the machine learning system, which leads back
to another phase of application of machine learning with revised inputs. One of the
design objectives for The Knowledge Factory is to allow the expert to complete this
cycle within the system. With many machine learning systems, each time through the
cycle, machine learning must start afresh. I have already described how the system
can iterate through such a cycle when the new information is presented in the form of
new or revised example cases. Greater change is required, however, if the expert
realises that there are deficiencies in the set of attributes selected with which to
describe the examples. We have observed this circumstance in practice. The expert,
when confronted with a rule developed by the system, and the evidence it can provide
in the rule’s support, realises that a factor not included in the current case descriptions
is necessary to adequately revise the rule. Another type of change is required when it
is realised that there is a need to provide finer-grained distinctions for an existing
attribute (one or more current values need to be replaced by a number of values) or
some other transformation of an attribute is needed (conversion from ordinal to
categorical or vice versa, for example). The Knowledge Factory provides facilities for
performing such conversions. Existing rules and cases that are affected by the change
are automatically transformed, where possible, so as to retain as much as possible of
the existing knowledge base.

When a new attribute is created, all cases are initialised with the value for this attribute
set to unknown. When the attribute is only relevant in a small number of contexts, the
user need only specify values for a small number of cases, for example, the positive
and counter-examples for the current rule.
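Initialising a newly created attribute can be sketched in a few lines, again under the assumed dict-per-case representation, with None standing in for the system's unknown value; the attribute name is hypothetical.

```python
UNKNOWN = None  # stand-in for the system's 'unknown' value

def add_attribute(cases, name):
    """Initialise the new attribute to unknown on every existing case;
    the user then supplies real values only where the attribute matters,
    e.g. for the positive and counter-examples of the current rule."""
    for case in cases:
        case.setdefault(name, UNKNOWN)

cases = [{"age": "old"}, {"age": "young"}]
add_attribute(cases, "family_history")
```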

Direct Expert Evaluation of Rules

While case-based communication is able to accommodate many of the interactions
between an expert and a machine learning system, there are some communication
needs that it cannot cover. One of these is the direct communication of general
qualitative rule evaluation by the expert. Case-based communication allows the expert
to inform the system about specific deficiencies of a rule. It does not allow the expert
to specify that the current form of a rule is very good, or that it is unacceptable without
specifying how.

The Knowledge Factory allows the expert to attach qualitative labels to individual
rules. The three labels currently supported are no good, revisable, and good.

The Knowledge Factory takes account of these assessments during induction. Rules
that are rated no good are discarded before rule refinement commences. This usually
results in the formation of totally new rules, although in some circumstances the
induction system may derive the same rule again. Techniques for avoiding this
outcome would be desirable, but are difficult to incorporate into traditional machine
learning systems.

Rules that are rated good are not altered during rule refinement, but rather are passed
directly into the final rule set.



Revisable rules are retained, but may be altered to reduce the number of negative cases
        or increase the number of positive cases covered. This is the default annotation.

        These rule annotations are simple for the user to utilise, but provide useful guidance to
        the rule refinement process.
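The way the three labels steer refinement can be sketched as a simple partition of the rule set before induction. The label names follow the text; the dict representation and rule names are invented for illustration.

```python
def partition_by_label(rules):
    """'no good' rules are discarded, 'good' rules pass straight into
    the final rule set, and 'revisable' rules (the default) are handed
    to the refinement process."""
    final, to_refine = [], []
    for rule in rules:
        label = rule.get("label", "revisable")
        if label == "no good":
            continue                 # discarded before refinement commences
        if label == "good":
            final.append(rule)       # passed directly into the final rule set
        else:
            to_refine.append(rule)   # retained, but may be altered
    return final, to_refine

rules = [{"name": "r1", "label": "good"},
         {"name": "r2", "label": "no good"},
         {"name": "r3"}]             # no label: defaults to revisable
final, to_refine = partition_by_label(rules)
```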

        Summary Ruleset Evaluation

        Most machine learning systems provide facilities for summarising the performance of
        inferred rules when applied to an evaluation set of cases. The primary metric used in
        such evaluation is usually accuracy. But in real world applications, this will not
        always be the most important measure of performance. In many domains, some errors
        are more significant than others (for example, failure to diagnose a serious disease in
        contrast to unnecessary but comparatively harmless treatment arising from diagnosing
        a disease that is not present). We have sought to develop a simple manner in which to
        capture all of the relevant summary information that is likely to be useful for a user.

Figure 7 presents an example of the tabular format for presenting evaluation results
that we have created. As there are four classes in this example (using the hypothyroid
diagnosis data previously mentioned), there is a 4x4 matrix which presents the number
        of cases belonging to each class that have been given each classification. The first four
        rows of the table relate to the class assigned to a case by the current rule set while the
        first four columns relate to the correct class for a case. The fifth column relates to all
cases. The sixth column presents, for each class, the percentage of those cases assigned
        that class by the rule set for which that classification was correct. The fifth row
        presents the total number of cases of each class for which any classification was
        assigned by the rules. The sixth row summarises the percentage of those that were
        correct. The seventh row indicates the number of cases for each class for which no
        decision was made. The eighth row shows the proportion of all cases for the class for
        which a classification was assigned. The ninth row displays the sensitivity of the rule
        set for the class. This is the proportion of cases belonging to the class that were
        correctly classified. The final row displays the specificity for the class. This is the
        proportion of cases not belonging to the class that were not incorrectly classified as
        belonging to the class.
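The per-class figures described above can be computed directly from (predicted, actual) pairs. This is a sketch of the calculation only, not the table layout: predicted is None when no rule fired, and, following the definition above, specificity counts a non-member as correctly rejected whenever it was not classified into the class, including when it was left undecided.

```python
from collections import Counter

def summarise(pairs, classes):
    """Per-class precision, sensitivity and specificity from a list of
    (predicted, actual) classification pairs."""
    counts = Counter(pairs)
    total = len(pairs)
    stats = {}
    for c in classes:
        tp = counts[(c, c)]
        predicted = sum(v for (p, a), v in counts.items() if p == c)
        actual = sum(v for (p, a), v in counts.items() if a == c)
        rejected = sum(v for (p, a), v in counts.items() if a != c and p != c)
        stats[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "sensitivity": tp / actual if actual else 0.0,
            "specificity": rejected / (total - actual) if total > actual else 0.0,
        }
    return stats

pairs = [("a", "a"), ("a", "b"), ("b", "b"), (None, "a")]
stats = summarise(pairs, ["a", "b"])
```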

        EXPERIMENTAL EVALUATION

        A major difficulty facing research in knowledge acquisition is evaluation of alternative
        techniques. Real world knowledge acquisition tasks usually require large and very
        expensive projects. It is not feasible to conduct controlled experiments in which
        different methodologies are each applied to the same full scale knowledge acquisition
        project under real-world conditions. As a result, other than case studies, there has
        been little experimental investigation of the performance of different techniques. Even
        the ground-breaking Sisyphus projects (Linster, 1992; Schreiber & Birmingham, 1996),
        in which many different systems were all applied to common knowledge acquisition
tasks, are perhaps best thought of as a set of parallel case studies, as it was not possible
        to control extraneous factors that may have affected a participant’s use of a system,
        and hence comparisons are restricted.

        A case study can demonstrate the ability of a technique to achieve an outcome.
Beyond this, it provides little comparative information about alternatives.




We have been concerned to perform experimental evaluation of the efficacy of the
techniques embodied in The Knowledge Factory. We were encouraged by the positive
evaluations provided by participants in case studies (Webb, 1996).

We wanted to have large numbers of subjects apply differing approaches to a
knowledge acquisition task under controlled conditions. One way to achieve this was
to set appropriate assignments for students in an undergraduate course on Artificial
Intelligence and Expert Systems, and to study their performance. Subjects could be
provided software with different capabilities and their performance measured on
common tasks under controlled conditions.

An initial attempt at such evaluation (Webb & Wells, 1996) was undermined by a user
interface flaw. Many subjects seeking to use The Knowledge Factory’s machine
learning facilities to refine rules that they had created unintentionally employed
machine learning in a mode that deleted their rules and replaced them by rules learned
by the machine learning system independently from the initial rules. In consequence,
while on average outperforming both knowledge acquisition without access to
machine learning facilities and the application of machine learning alone, the
integrated application of machine learning with knowledge acquisition from experts
failed to demonstrate a statistically significant advantage over machine learning alone.
When the subjects that applied machine learning in the mode that overwrote existing
rules were excluded from consideration, the advantage over machine learning was
statistically significant, but the selection of subjects for analysis in this manner cast
doubt upon the results.

This user interface fault was rectified, and a second such experiment undertaken
(Webb, Wells, & Zheng, 1999). In this second experiment, we again used
undergraduate students performing a class assignment. This assignment was
conducted under supervised laboratory conditions. Knowledge acquisition tasks were
created using the Glass and Soybean Large data sets from the UCI Repository of
Machine Learning Databases (Merz & Murphy, 1997). The Glass data relates to
forensic analysis of glass fragments and the Soybean Large data pertains to the
diagnosis of soybean plant diseases.

Subjects were given randomly selected training sets of examples. The accuracy of their
expert systems could then be tested by evaluating their performance on withheld
evaluation sets. To simulate use by experts, the subjects were provided with rules for
these tasks learned by C4.5rules (Quinlan, 1993) from subsets of the data. As these
subsets included both training and evaluation cases provided to the subjects, the rules
could be expected to capture information about the domains that subjects could not
glean from examination of their training data either with or without the use of machine
learning tools.

Two versions of The Knowledge Factory were created. A number of facilities were
disabled that could not be usefully employed in the context of the study. This was to
minimise the chance of a repetition of the previous outcome where misapplication of
the software invalidated the study. The facilities disabled included changing the
manner in which rules were interpreted during execution of a rule set, saving and
loading sets of examples from disk, and the selection, maintenance and utilisation of
separate training and evaluation sets. All of the key knowledge acquisition facilities
were retained, except that machine learning capabilities were disabled in one version,
called KA-alone.


Subjects were randomly divided into two groups. One group used KA-alone for the
          Glass data and the full system for Soybean Large, and the other group used the
          systems in the other order.

          Seventeen subjects participated in the experiment. Table 1 presents the mean
Table 1
 about    predictive accuracy on the evaluation cases obtained using each system for each data
  here    set, along with that of The Knowledge Factory’s machine learning system when
          applied using the training and evaluation cases used by subjects using the full system.
          For both data sets the integrated use of machine learning with knowledge acquisition
          from experts led to higher predictive accuracy than the use of either of its constituent
          approaches in isolation. With respect to KA-alone, both of these differences were
statistically significant at the 0.05 level (one-tailed two-sample t tests; Soybean
Large: t=2.1, p=0.026; Glass: t=3.35, p=0.001). With respect to learning-alone, the
          difference was significant at the 0.05 level for the Glass data (one-tailed matched-pairs
          t test: t=2.5, p=0.021) but the other was not (one-tailed matched-pairs t test: t=0.9,
          p=0.279). Note that the power of these tests was low due to the small number of
          subjects and hence that the failure to find a significant difference provides only very
          weak evidence that such a difference does not actually exist. The significant
          differences that were found demonstrate that for at least some knowledge acquisition
          tasks, the integrated use of machine learning with knowledge acquisition from experts
          can produce more accurate expert systems than either constituent method in isolation.

          The integrated approach was also faster than KA-alone. For Soybean Large the mean
and standard deviation for the integrated approach were 73±45 minutes while for KA-
          alone it was 131±19. For Glass these figures were respectively 16±19 and 115±38. One-
          tailed two-sample t tests reveal that both of these differences are significant at the 0.05
level (Soybean Large: t=3.3, p=0.002; Glass: t=1.9, p=0.037). In both cases the time
          savings from the integrated use of machine learning with knowledge acquisition were
          substantial. Of course, they were both eclipsed in this respect by the application of
          machine learning alone, for which knowledge acquisition time can be measured in
          seconds rather than minutes.
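The reported comparisons can be reproduced in outline from the summary figures using a two-sample t statistic. This sketch uses the Welch (unpooled) form, which is an assumption, as is the 9/8 split of the 17 subjects (the group sizes are not stated above); a p-value would additionally require the t distribution's CDF, which is omitted here.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's two-sample t statistic from summary figures
    (mean, standard deviation, group size)."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# KA-alone vs integrated times for Soybean Large (131±19 vs 73±45 minutes);
# the 9 and 8 group sizes are hypothetical.
t = welch_t(131, 19, 9, 73, 45, 8)
```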

          Questionnaire responses from participants indicated that the subjects believed the
          machine learning facilities were useful, found the knowledge acquisition process easier
          when machine learning facilities were available, and had greater confidence in the
          expert systems developed with the aid of machine learning.

          These results, presented and analysed in more detail by Webb, Wells, & Zheng (1999),
          all provide support for the efficacy of the approaches that we have developed. At least
          for some knowledge acquisition tasks they can lead to the more rapid development of
          more accurate expert systems in which the users have greater confidence than does
          knowledge acquisition without machine learning, and to more accurate expert systems
          than machine learning in isolation.

          CONCLUSIONS

          The Knowledge Factory is a knowledge acquisition system that is designed for direct
          use by domain experts. It differs from most previous systems intended for this use by
          incorporating machine learning facilities. In consequence, it is not necessary for the
          expert to provide solutions for every contingency that the knowledge base must cover.
          The machine learning system can fill gaps in the expert’s expertise. Both the machine




learning system and the expert can critique and propose refinements to the other’s
rules.

One of the distinctive features of The Knowledge Factory is the manner in which
communication with the user has been structured around the use of example cases.
Techniques have been developed that allow both the user and the system to convey
non-trivial information through this mechanism. Interestingly, however, we have
concluded that one such form of communication that has received previous use, the
construction of hypothetical cases by the system in order to test hypotheses, is actually
inappropriate in our context. Instead, we favour explicit presentation of the competing
hypotheses in this situation.

The Knowledge Factory is distinguished from previous knowledge acquisition systems
by the manner in which it supports experts with minimal computing expertise to
directly interact with a machine learning system during all phases of knowledge
acquisition. Case studies have found that such users have little difficulty in using the
system and controlled studies suggest that the integration of machine learning with
knowledge acquisition within the system can outperform either constituent approach
in isolation.

REFERENCES

Agar, J., & Webb, G. (1992) The application of machine learning to a renal biopsy
data-base. Nephrology, Dialysis and Transplantation, 7: 472-478.

Attar Software (1989). Structured decision tasks methodology for developing and integrating
knowledge base systems. Attar Software, Leigh, Lancashire.

Boose, J. H. (1986). ETS: A system for the transfer of human expertise. In J. S. Kowalik
(Ed.), Knowledge based problem solving. New York: Prentice-Hall.

Buntine, W., & Stirling, D. (1991). Interactive induction. In J. E. Hayes, D. Michie, & E.
Tyugu (Eds.) Machine Intelligence 12. Oxford: Clarendon Press, pp. 121-137.

Compton, P., Edwards, G., Srinivasan, A., Malor, R., Preston, P., Kang, B., & Lazarus,
L. (1992). Ripple down rules: Turning knowledge acquisition into knowledge
maintenance. Artificial Intelligence in Medicine, 4: 47–59.

Davis, R., & Lenat, D. B. (1982). Knowledge-based systems in artificial intelligence. New
York: McGraw-Hill.

De Raedt, L. (1992). Interactive theory revision. London: Academic Press.

Kodratoff, Y., & Vrain, C. (1993) Acquiring first-order knowledge about air traffic
control. Knowledge Acquisition, 5, 1–6.

Linster, M. (1992) A review of Sisyphus 91 and 92: Models of Problem-Solving
Knowledge. In N. Aussenac, G. Boy, B. Gaines, M. Linster, J.-G. Ganascia, &
Y. Kodratoff (Eds) Knowledge Acquisition for Knowledge-Based Systems. Berlin: Springer-
Verlag, pp. 159-182.

Merz, C. J. & Murphy, P. M. (1997) UCI Repository of Machine Learning Databases
[Machine-readable data repository]. University of California.


Michalski, R. S. (1983). A theory and methodology of inductive learning. In R. S.
Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.) Machine learning: An Artificial
Intelligence Approach. Berlin: Springer-Verlag.

Morik, K., Wrobel, S., Kietz, J.-U., & Emde, W. (1993). Knowledge Acquisition and
Machine Learning: Theory, Methods, and Applications. London: Academic Press.

Nedellec, C., & Causse, K. (1992). Knowledge refinement using knowledge acquisition
and machine learning methods. Proceedings EKAW'92. Berlin: Springer-Verlag, pp.
171-190.

O'Neil, J. L., & Pearson, R. A. (1987). A development environment for inductive
learning systems. In Proceedings of the 1987 Australian Joint Artificial Intelligence
Conference. Sydney, pp. 673-680.

Plotkin, G. D. (1970) A note on inductive generalisation. In B. Meltzer & D.
Michie (Eds) Machine Intelligence 5. Edinburgh University Press, Edinburgh, pp. 153-
163.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan
Kaufmann.

Quinlan, J. R., Compton, P., Horn, K. A., & Lazarus, L. (1986) Inductive Knowledge
Acquisition: A Case Study, New South Wales Institute of Technology School of
Computing Sciences, Technical Report 86.4, Sydney.

Sammut, C. & Banerji, R. B. (1986) Learning concepts by asking questions. In Michalski,
Ryszard S., Carbonell, Jaime G., & Mitchell, Tom M. (Eds) Machine Learning: An
Artificial Intelligence Approach Volume II. Morgan Kaufmann, Los Altos, pp. 167-191.

Schmalhofer, F., & Tschaitschian, B. (1995). Cooperative knowledge evolution for
complex domains. In G. Tecuci, & Y. Kodratoff (Eds.) Machine learning and knowledge
acquisition: Integrated approaches. London: Academic Press.

Schreiber, A. & Birmingham, W. P. (1996) The Sisyphus-VT initiative. International
Journal of Human-Computer Studies. 44.

Shapiro, A. (1987) Structured Induction in Expert Systems. Addison-Wesley, London.

Smith, R. G., Winston, H. A., Mitchell, T. M., & Buchanan, B. G. (1985). Representation
and use of explicit justifications for knowledge base refinement. In Proceedings of the
Ninth International Joint Conference on Artificial Intelligence. San Mateo, Ca: Morgan
Kaufmann, pp. 673-680.

Tecuci, G., & Kodratoff, Y. (1990). Apprenticeship learning in imperfect domain
theories. In Y. Kodratoff, & R. Michalski (Eds.) Machine learning: An Artificial
Intelligence Approach. San Mateo, CA: Morgan Kaufmann.

Webb, G. I. (1993). DLGref2: Techniques for inductive knowledge refinement.         In
Proceedings of the IJCAI Workshop W16. Chambery, France, pp. 236-252.

Webb, G. I. (1996). Integrating machine learning with knowledge acquisition through
direct interaction with domain experts. Knowledge-Based Systems, 9: 253-266.


Webb, G. I. & Wells, J. (1996) Experimental evaluation of integrating machine learning
with knowledge acquisition through direct interaction with domain experts. In P.
Compton, R. Mizoguchi, H. Motada, & T. Menzies (Eds) Proceedings of PKAW'96: The
Pacific Knowledge Acquisition Workshop. Sydney, pp. 170-189.

Webb, G. I., Wells, J., & Zheng, Z. (1999) An experimental evaluation of integrating
machine learning with knowledge acquisition. Machine Learning, 35(1): 5-24.

Wilkins, D. C. (1988). Knowledge base refinement using apprenticeship learning
techniques.    In Proceedings of the Seventh National Conference on Artificial
Intelligence. San Mateo, CA: Morgan Kaufmann, pp. 646-651.




APPENDIX A

THE DLGREF ALGORITHM

α is the most specific possible rule for the class positive; it covers no cases. The least
generalisation of α against a case results in the most specific rule that covers that case.

ζ is the most general possible rule for the class positive; it covers all cases.

Function DLGref
Parameters: rules: an initial set of rules for a single class
                POS: a set of cases belonging to that class
                NEG: a set of cases that do not belong to the class
                value(): a function that evaluates a rule with respect to POS and NEG,
                returning a numeric value; the higher the value the greater the
                preference for the rule.
Returns:        rules: a revised set of rules for the class
    for r is set to each rule in rules in succession
           if r covers any cases in NEG
                  spec_rule <- generalise_rule(α, covered_cases(r, POS), NEG, value)
                  if spec_rule covers any cases in POS
                        r <- generalise_toward(spec_rule, r, NEG)
                  end if
           end if
           remove from POS all cases that r covers
    end for
    for r is set to each rule in rules in succession
           r <- generalise_rule(r, POS, NEG, value)
           remove from POS all cases that r covers
    end for
    while POS is not empty
           r <- generalise_rule(α, POS, NEG, value)
           if r covers any cases in POS
                  r <- generalise_toward(r, ζ, NEG)
                  remove from POS all cases that r covers
                  add r to rules
           else
                  remove all remaining cases from POS
           end if
    end while




Function generalise_rule
Parameters: rule an initial rule for the positive class
               POS: a set of cases belonging to that class
               NEG: a set of cases that do not belong to the class
               value(): a function that evaluates a rule with respect to POS and NEG,
               returning a numeric value; the higher the value the greater the
               preference for the rule.
Returns:       result: a generalisation of rule
    result <- rule
    for c is set to each case in POS in succession
          set r to the least generalisation of result with respect to c
          if value(r, POS, NEG) > value(result, POS, NEG)
                 result <- r
          end if
    end for


Function generalise_toward
Parameters: spec: an initial specific rule for the positive class
              gen: a generalisation of spec
              NEG: a set of cases that do not belong to the class
Returns:      result: a rule that is a generalisation of spec and a specialisation of gen and
              which covers no cases in NEG not covered by spec
    while spec ≠ gen
          for each clause c in the antecedent of spec
                if deleting c from spec would increase the number of cases in
                NEG covered by spec
                       add c to gen
                end if
          end for
          for each clause c in the antecedent of spec
                if c is not in the condition of gen and adding c to the condition of
                gen does not decrease the number of cases in NEG covered by gen
                       remove c from spec
                end if
          end for
    end while
    result <- spec

DLGref makes two passes through the initial rules. In the first pass, any rules that
cover negative cases are specialised, where possible, so that they no longer cover those
cases. All positive cases covered by the resulting rules are then deleted, as they need
no further attention. In the second pass, the rules are generalised as far as possible to
cover further positive cases. Any cases so covered are also deleted. Finally, new rules
are added to the rule set to cover any remaining positive cases.

The function generalise_rule seeks to generalise an initial rule to cover as many
positive cases as possible, without unduly increasing negative cover. The value
function is used to evaluate whether an increase in negative cover, if it occurs, is
sufficiently offset by an increase in positive cover.
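This generalisation step can be sketched in executable form. The sketch below assumes categorical attributes only, with a rule antecedent represented as a dict mapping attribute names to sets of allowed values and a case as a dict of attribute values; the Laplace-style value function is a stand-in assumption, since the paper leaves value() as a parameter:

```python
def covers(rule, case):
    """A rule covers a case if every test in its antecedent admits the case's value."""
    return all(case[a] in allowed for a, allowed in rule.items())

def least_generalisation(rule, case):
    """The smallest widening of each test such that the rule also covers the case."""
    return {a: allowed | {case[a]} for a, allowed in rule.items()}

def value(rule, pos, neg):
    """Stand-in Laplace-style preference: rewards positive cover, penalises negative."""
    p = sum(covers(rule, c) for c in pos)
    n = sum(covers(rule, c) for c in neg)
    return (p + 1) / (p + n + 2)

def generalise_rule(rule, pos, neg, value=value):
    """Single greedy pass over POS, as in the pseudocode: keep a least
    generalisation only if it improves the value function."""
    result = rule
    for case in pos:
        r = least_generalisation(result, case)
        if value(r, pos, neg) > value(result, pos, neg):
            result = r
    return result
```

For example, starting from the most specific rule covering {colour: red, size: big}, a second positive case {colour: blue, size: big} widens the colour test to {red, blue} provided no negative case is thereby covered.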

The function generalise_toward generalises an initial specific rule toward a more
general form as far as possible without increasing the negative cover of the initial rule.
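Using the same hypothetical dict-of-sets representation, generalise_toward can be sketched as follows. This version makes a single greedy pass rather than the fixed-point loop of the pseudocode above: each test of spec is widened toward the corresponding test of gen (or dropped, where gen places no test on that attribute), and a step is accepted only if the rule covers no more negative cases than spec did initially:

```python
def covers(rule, case):
    """A rule covers a case if every test in its antecedent admits the case's value."""
    return all(case[a] in allowed for a, allowed in rule.items())

def neg_cover(rule, neg):
    """Number of negative cases the rule covers."""
    return sum(covers(rule, c) for c in neg)

def generalise_toward(spec, gen, neg):
    """Greedy single-pass sketch: generalise spec toward gen without
    increasing the negative cover of the initial rule."""
    limit = neg_cover(spec, neg)
    result = dict(spec)
    for a in list(result):
        candidate = dict(result)
        if a in gen:
            candidate[a] = gen[a]   # widen toward gen's (weaker) test
        else:
            del candidate[a]        # gen has no test on a: try dropping it
        if neg_cover(candidate, neg) <= limit:
            result = candidate
    return result
```

Tests that are essential for excluding negative cases survive the pass, while the rest are relaxed to gen's form.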




Table 1: Mean and standard deviation for predictive accuracy.



        Data set           Integrated           KA alone          Learning alone

Soybean Large               88.5±4.4             84.4±3.6             87.8±1.6

Glass                       81.3±7.7            59.0±17.3             79.2±8.0




FIGURES




FIGURE 1: Examples of rules in The Knowledge Factory.




FIGURE 2: A selected rule with displayed positive, counter and uncovered examples.
FIGURE 3: Answering a why not question.
FIGURE 4: Case based editing: Rules generated by Contract to Exclude Case applied to the first case in
the Counter Examples Window.

FIGURE 5: Alternative rules that cover the same 43 cases of class primary_hypothyroid.




FIGURE 6: A selection of the 39 alternative most general rules that cover both cases of class
secondary_hypothyroid.




FIGURE 7: The results window.





More Related Content

What's hot

SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...ijaia
 
Verification and validation of knowledge bases using test cases generated by ...
Verification and validation of knowledge bases using test cases generated by ...Verification and validation of knowledge bases using test cases generated by ...
Verification and validation of knowledge bases using test cases generated by ...Waqas Tariq
 
Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers ijcsa
 
Fuzzy rule based expert system for diagnosis of lung cancer
Fuzzy rule based expert system for diagnosis of lung cancerFuzzy rule based expert system for diagnosis of lung cancer
Fuzzy rule based expert system for diagnosis of lung cancerFarzad Vasheghani Farahani
 
Big shadow test
Big shadow testBig shadow test
Big shadow testmunsif123
 
Scenario $4$
Scenario $4$Scenario $4$
Scenario $4$Jason121
 
Test equating using irt. final
Test equating using irt. finalTest equating using irt. final
Test equating using irt. finalmunsif123
 
Introduction to Expert Systems {Artificial Intelligence}
Introduction to Expert Systems {Artificial Intelligence}Introduction to Expert Systems {Artificial Intelligence}
Introduction to Expert Systems {Artificial Intelligence}FellowBuddy.com
 
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...AI Publications
 
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE cscpconf
 
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHMCOMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHMcscpconf
 
A03203001004
A03203001004A03203001004
A03203001004theijes
 
Test construction
Test constructionTest construction
Test constructionmunsif123
 
QPLC: A Novel Multimodal Biometric Score Fusion Method
QPLC: A Novel Multimodal Biometric Score Fusion MethodQPLC: A Novel Multimodal Biometric Score Fusion Method
QPLC: A Novel Multimodal Biometric Score Fusion MethodCSCJournals
 
A study of multimodal biometric system
A study of multimodal biometric systemA study of multimodal biometric system
A study of multimodal biometric systemeSAT Publishing House
 
Machine learning in Healthcare - WeCloudData
Machine learning in Healthcare - WeCloudDataMachine learning in Healthcare - WeCloudData
Machine learning in Healthcare - WeCloudDataWeCloudData
 
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...ijcsa
 
Automated exam question set generator using utility based agent and learning ...
Automated exam question set generator using utility based agent and learning ...Automated exam question set generator using utility based agent and learning ...
Automated exam question set generator using utility based agent and learning ...Journal Papers
 

What's hot (20)

SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
 
Verification and validation of knowledge bases using test cases generated by ...
Verification and validation of knowledge bases using test cases generated by ...Verification and validation of knowledge bases using test cases generated by ...
Verification and validation of knowledge bases using test cases generated by ...
 
Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers
 
Fuzzy rule based expert system for diagnosis of lung cancer
Fuzzy rule based expert system for diagnosis of lung cancerFuzzy rule based expert system for diagnosis of lung cancer
Fuzzy rule based expert system for diagnosis of lung cancer
 
Big shadow test
Big shadow testBig shadow test
Big shadow test
 
Scenario $4$
Scenario $4$Scenario $4$
Scenario $4$
 
Test equating using irt. final
Test equating using irt. finalTest equating using irt. final
Test equating using irt. final
 
Introduction to Expert Systems {Artificial Intelligence}
Introduction to Expert Systems {Artificial Intelligence}Introduction to Expert Systems {Artificial Intelligence}
Introduction to Expert Systems {Artificial Intelligence}
 
Ijsea04031006
Ijsea04031006Ijsea04031006
Ijsea04031006
 
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
 
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT’S ACADEMIC PERFORMANCE
 
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHMCOMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
 
A03203001004
A03203001004A03203001004
A03203001004
 
Test construction
Test constructionTest construction
Test construction
 
QPLC: A Novel Multimodal Biometric Score Fusion Method
QPLC: A Novel Multimodal Biometric Score Fusion MethodQPLC: A Novel Multimodal Biometric Score Fusion Method
QPLC: A Novel Multimodal Biometric Score Fusion Method
 
Introduction to knowledge discovery
Introduction to knowledge discoveryIntroduction to knowledge discovery
Introduction to knowledge discovery
 
A study of multimodal biometric system
A study of multimodal biometric systemA study of multimodal biometric system
A study of multimodal biometric system
 
Machine learning in Healthcare - WeCloudData
Machine learning in Healthcare - WeCloudDataMachine learning in Healthcare - WeCloudData
Machine learning in Healthcare - WeCloudData
 
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
 
Automated exam question set generator using utility based agent and learning ...
Automated exam question set generator using utility based agent and learning ...Automated exam question set generator using utility based agent and learning ...
Automated exam question set generator using utility based agent and learning ...
 

Viewers also liked

Advanced Web Design and Development - Spring 2005.doc
Advanced Web Design and Development - Spring 2005.docAdvanced Web Design and Development - Spring 2005.doc
Advanced Web Design and Development - Spring 2005.docbutest
 
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...butest
 
BenMartine.doc
BenMartine.docBenMartine.doc
BenMartine.docbutest
 
T2L3.doc
T2L3.docT2L3.doc
T2L3.docbutest
 
SATANJEEV BANERJEE
SATANJEEV BANERJEESATANJEEV BANERJEE
SATANJEEV BANERJEEbutest
 
Teaching Machine Learning to Design Students
Teaching Machine Learning to Design StudentsTeaching Machine Learning to Design Students
Teaching Machine Learning to Design Studentsbutest
 
Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...butest
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 

Viewers also liked (9)

Advanced Web Design and Development - Spring 2005.doc
Advanced Web Design and Development - Spring 2005.docAdvanced Web Design and Development - Spring 2005.doc
Advanced Web Design and Development - Spring 2005.doc
 
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
 
BenMartine.doc
BenMartine.docBenMartine.doc
BenMartine.doc
 
T2L3.doc
T2L3.docT2L3.doc
T2L3.doc
 
SATANJEEV BANERJEE
SATANJEEV BANERJEESATANJEEV BANERJEE
SATANJEEV BANERJEE
 
Teaching Machine Learning to Design Students
Teaching Machine Learning to Design StudentsTeaching Machine Learning to Design Students
Teaching Machine Learning to Design Students
 
Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 

Similar to Techniques for integrating machine learning with knowledge ...

Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesmuhammad afif
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques Muhammad Ibnu Wardana
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesAji Pamungkas Prasetio
 
Specification Based or Black Box Techniques
Specification Based or Black Box TechniquesSpecification Based or Black Box Techniques
Specification Based or Black Box TechniquesNadia Chairunissa
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Andika Mardanu
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesM Branikno Ramadhan
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesM Branikno Ramadhan
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesIrvan Febry
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesYoga Setiawan
 
Specification based or black box techniques 3
Specification based or black box techniques 3Specification based or black box techniques 3
Specification based or black box techniques 3alex swandi
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesDinul
 
Specification Based or Black Box Techniques
Specification Based or Black Box TechniquesSpecification Based or Black Box Techniques
Specification Based or Black Box TechniquesRakhesLeoPutra
 
machine learning
machine learningmachine learning
machine learningMounisha A
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesM HiDayat
 
Specification based or black box techniques 3
Specification based or black box techniques 3Specification based or black box techniques 3
Specification based or black box techniques 3Bima Alvamiko
 
Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...butest
 
A review robot fault diagnosis part ii qualitative models and search strategi...
A review robot fault diagnosis part ii qualitative models and search strategi...A review robot fault diagnosis part ii qualitative models and search strategi...
A review robot fault diagnosis part ii qualitative models and search strategi...Siva Samy
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesYoga Pratama Putra
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 

Similar to Techniques for integrating machine learning with knowledge ... (20)

Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification Based or Black Box Techniques
Specification Based or Black Box TechniquesSpecification Based or Black Box Techniques
Specification Based or Black Box Techniques
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques 3
Specification based or black box techniques 3Specification based or black box techniques 3
Specification based or black box techniques 3
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification Based or Black Box Techniques
Specification Based or Black Box TechniquesSpecification Based or Black Box Techniques
Specification Based or Black Box Techniques
 
machine learning
machine learningmachine learning
machine learning
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques 3
Specification based or black box techniques 3Specification based or black box techniques 3
Specification based or black box techniques 3
 
Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...
 
A review robot fault diagnosis part ii qualitative models and search strategi...
A review robot fault diagnosis part ii qualitative models and search strategi...A review robot fault diagnosis part ii qualitative models and search strategi...
A review robot fault diagnosis part ii qualitative models and search strategi...
 
Ieee doctoral progarm final
Ieee doctoral progarm finalIeee doctoral progarm final
Ieee doctoral progarm final
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 

More from butest

LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 
Download
DownloadDownload
Downloadbutest
 
resume.doc
resume.docresume.doc
resume.docbutest
 

More from butest (20)

LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 
Download
DownloadDownload
Download
 
resume.doc
resume.docresume.doc
resume.doc
 

Techniques for integrating machine learning with knowledge ...

  • 1. Prepublication draft of Webb, G. I. (2002). Integrating Machine Learning with Knowledge Acquisition. In Leondes, C. T. (Ed.), Expert Systems, volume 3, pages 937- 959. San Diego, CA: Academic Press. Techniques for integrating machine learning with knowledge acquisition Geoffrey I. Webb School of Computing and Mathematics Deakin University Geelong, Vic, 3217, Australia Tel: +61 3 5227 2606 Fax: +61 3 5227 2028 Email: webb@deakin.edu.au INTRODUCTION Knowledge acquisition is frequently cited as the greatest bottleneck in the development of expert systems. Two primary approaches to knowledge acquisition are elicitation of knowledge from experts (traditional knowledge acquisition) and machine learning. Both approaches have strengths and weaknesses. Experts can draw upon practical experience and both domain specific and general knowledge. In addition, they often have access to well established procedures relating to the target domain. Elicitation of knowledge from experts is limited, however, by the difficulty of articulating knowledge; reluctance in some circumstances to share knowledge; preconceptions and biases that might inappropriately influence an expert; and the limits of an expert’s knowledge. In contrast, machine learning systems provide a capacity to infer procedures from examples; can perform extensive logical analysis; and are not subject to the same types of preconceptions and biases as an expert. They are hampered, however, by limited access to general and domain-specific knowledge and the difficulties of obtaining comprehensive example sets. Further, machine learning is only possible where much of the knowledge acquisition task has already been completed. Machine learning requires a description of the problem domain and collection of example cases from which to learn. In attribute-value machine learning, the domain description consists of a set of attributes and their allowable values along with a collection of class values. 
These, together with the formalism used for expressing the decision procedures, specify a space of possible solutions that the system might ‘learn’. The learning system then explores this space of solutions seeking one that best fits the training examples. If a suitable space of possible solutions is specified, learning is relatively straightforward. If a poor solution space is specified, effective learning is impossible. Thus, machine learning requires prior ontological analysis and the specification of a suitable class of models to explore. 1
  • 2. It is clear that this pre-learning process is crucial. A domain expert will usually provide critical input into this process. This is not the end of the domain expert’s involvement in knowledge acquisition by machine learning, however. In practical applications of machine learning it is often necessary for an expert to review the rules that a machine learning system creates (see, for example, Buntine & Stirling, 1991). These rules may require modification due to pragmatic or other considerations quite outside the formal factors that a machine learning system is able to consider. For example, social or ethical considerations may make unacceptable a policy to refuse loans to a specific class of loan applicant, no matter how compelling the evidence is that the particular class of applicant represents a poor risk. Review by the expert will also frequently lead to the identification of deficiencies in the domain specification, which will necessitate a cycle involving modification of the domain specification and re-application of machine learning, followed by another domain expert review. Most machine learning tools provide little support for these interactions with the expert. In addition to assisting the expert in processes that must be carried out to support machine learning, there is also a need for facilities to enable the expert to directly assist the machine learning system during the induction process. However, there are a number of obstacles to providing such support. The first of these is communication. Any sophisticated interaction between an expert and a machine learning system will require the two to communicate with one another. But, such communication requires mechanisms beyond those normally supplied by machine learning systems. A second difficulty is that the machine learning system usually provides a very shallow knowledge-base. 
Without any means of obtaining explanations and justifications of the rules that are presented, an expert can find these shallow rules difficult to assimilate into his or her more sophisticated models of the domain. Interactions are further limited by the restricted forms of input to which most machine learning systems are restricted. Most attribute-value systems allow only the description of the attributes and of a set of examples. There is no provision for background knowledge or any other suggestions or constraints with which to guide the learning process. A final obstacle is that machine learning systems can make quite arbitrary choices in situations where there are alternative solutions that equally well fit the example cases. Such situations are quite frequent in practice. These arbitrary choices may be acceptable in such situations if no other information is available and maximisation of predictive accuracy is the primary consideration, as is usually considered the case in machine learning research. However, an expert will often be able to discriminate between the alternatives on other grounds. In this context, the systems’ arbitrary choices may reduce the comprehensibility or acceptability of the inferred rules for the expert or may lead to less useful rules than if the expert’s judgements are utilised. A large number of systems and techniques have been designed to tackle aspects of interaction between a user and a machine learning system. Most of these are oriented toward interactive use by a sophisticated knowledge engineer (Attar Software, 1989; Davis & Lenat, 1982; De Raedt, 1992; Morik, Wrobel, Kietz & Emde, 1993; Nedellec & Causse, 1992; O'Neil & Pearson, 1987; Schmalhofer & Tschaitschian, 1995; Shapiro, 2
1987; Smith, Winston, Mitchell & Buchanan, 1985; Tecuci & Kodratoff, 1990; Wilkins, 1988). This paper describes The Knowledge Factory, a computer-based environment that allows a domain expert to collaborate directly with a machine learning system at all stages of the knowledge acquisition process. We have endeavoured to take the above considerations into account in designing this system. It is distinguished from previous such systems by its orientation toward direct use by domain experts with minimal computing sophistication.

THE KNOWLEDGE REPRESENTATION SCHEME

Sophisticated knowledge representation schemes, such as first-order logic, are not appropriate for use by non-knowledge-engineers (see, for example, Kodratoff & Vrain, 1993). Extensive training and experience are required to master such formalisms. Instead, The Knowledge Factory uses simple attribute-value case descriptions and production rules. Example cases are described by simple vectors of attribute values. Rule antecedents (conditions) are conjunctions of simple tests on attributes. The consequents (conclusions) are simple classification statements.

[Fig. 1 about here]

Examples of rules are provided in Fig. 1. These examples illustrate rules learned from data on the diagnosis of renal disease used in research with Geelong Hospital (Agar & Webb, 1992). Attributes are either categorical or ordinal. For categorical attributes, set membership tests are used (although set notation is avoided). All tests in the first rule in Fig. 1 relate to categorical attributes. For ordinal attributes, tests specify a range of allowable values. The attributes age and prodocytic_fusion_ef tested in the second rule of Fig. 1 are both ordinal. The condition part of a rule (the tests between the keywords IF and THEN) is satisfied for a case if all of the individual tests are satisfied. If the condition is satisfied then the conclusion of the rule is taken to apply to that case.
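This representation can be sketched in a few lines of code. The following is an illustrative sketch only, not The Knowledge Factory's actual implementation; the data structures, attribute names, and conclusion label are all invented for the example:

```python
# Sketch of attribute-value rules: categorical tests are set membership,
# ordinal tests are ranges; the condition is a conjunction of such tests.
# Attribute names and the conclusion label are hypothetical.

def satisfies(test, case):
    value = case[test["attribute"]]
    if test["kind"] == "categorical":
        return value in test["allowed"]          # set-membership test
    return test["low"] <= value <= test["high"]  # ordinal range test

def covers(rule, case):
    # Every individual test must be satisfied for the rule to apply.
    return all(satisfies(t, case) for t in rule["condition"])

rule = {
    "condition": [
        {"kind": "categorical", "attribute": "gender", "allowed": {"female"}},
        {"kind": "ordinal", "attribute": "age", "low": 21, "high": 65},
    ],
    "conclusion": "diagnosis_x",
}

print(covers(rule, {"gender": "female", "age": 42}))  # True
print(covers(rule, {"gender": "male", "age": 42}))    # False
```

A case that fails any single test fails the whole condition, mirroring the conjunctive reading of the tests between IF and THEN.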
One complication that must be handled by a machine learning system is missing values in the data. The Knowledge Factory recognises two types of missing value, unknown and unobtainable. The latter is treated as a distinct value that may be tested in a rule, unobtainable having the same status as a normal value. The first test in the third rule in Fig. 1 allows unobtainable values for the attribute age. Any case with an unobtainable value for age will fail the condition of the second rule, which does not explicitly allow unobtainable values for this attribute. Unobtainable values should be used where the absence of a value needs to be considered during the decision process, for example, because different attributes should be considered if an attribute normally used in the decision making process does not have a value. In contrast, unknown values pass any test on an attribute. In consequence, the test on age in the second rule in Fig. 1 will succeed for a case with an unknown value for age, so long as all of the tests on other attributes succeed. The condition on hepatitus_B_antibody in the second rule of Fig. 1 will only succeed for a case with an unknown value.

A summary line follows the consequent of each rule. This displays the number of training examples that the rule correctly classifies (Positive), the number incorrectly classified (Negative), and the value currently assigned to the rule by the system (used, under some settings, when resolving between multiple rules that cover a case).

In the current version of the system, the rule sets are flat. All rule consequents relate to the same attribute, called the decision attribute. This attribute cannot be used in the
antecedent of a rule. While it might be appropriate to slightly relax these constraints, it would not be appropriate to allow complex chaining to any great depth, as non-knowledge-engineers are unlikely to readily master the complexities of such a rule base. Attempts to incorporate more sophisticated knowledge representation devices, such as model construction operators (Morik, Wrobel, Kietz, & Emde, 1993), are also viewed as likely to impose an undesirable barrier to use for those without extensive training in knowledge engineering.

We have conducted informal experimentation with two sets of experts—medical practitioners and financial analysts. The former have had little computing expertise and the latter have had substantial computing expertise but no background in knowledge engineering. These experiments show that these experts adapt readily to this form of knowledge representation. They have no difficulty in interpreting existing rules, or in specifying new rules or examples.

MACHINE LEARNING TECHNIQUES

The Knowledge Factory uses the DLGref learning algorithm (Webb, 1993). This is an attribute-value classification learning algorithm that supports refinement of existing rules. Attribute-value classification learning algorithms take as input a training set of pre-classified examples. Each of these examples is described by a vector of attribute-values and is labelled with the correct class. The algorithm forms a model that relates combinations of attribute-values to classes. Note that while this abstract description of machine learning describes the models that it produces as classifiers, classification can be used to model many forms of discrete decision making. Thus, classification learning encompasses many useful learning tasks.

Generalisation, Specialisation, and Cover

The Knowledge Factory makes extensive use of two operations on rules: generalisation and specialisation. The following provides a brief introduction to these operations.
The classification rules used by The Knowledge Factory have the form IF condition THEN classification. If the condition of a rule is satisfied by a case then the rule is said to cover that case. A rule x is a generalisation of a rule y if x necessarily covers all cases that y covers. For example, the rule IF age>20 THEN old is a generalisation of IF age>30 THEN old. A case covered by the second rule must also be covered by the first rule. To create a generalisation of a rule in the language used by The Knowledge Factory it is necessary either to make a constraint on an attribute less strict, for example by lowering the minimum value allowed as in the example above, or to remove a constraint on an attribute, as with the generalisation from IF age>20 and hair is fair THEN x to IF age>20 THEN x. If rule x is a generalisation of rule y, then y is a specialisation of x. Note that by these definitions, a rule is both a generalisation and a specialisation of itself. A rule x is a proper generalisation of a rule y if x is a generalisation of y and x is not a specialisation of y. If x is a proper generalisation of y then y is a proper specialisation of x.
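For rules whose tests are ordinal ranges, the cover and generalisation relations can be sketched directly. The representation below, a condition as a mapping from attribute to allowed range, is an assumption made for illustration, not the system's own; the third function simply widens a rule's ranges just enough to admit a new case:

```python
# Sketch of cover and generalisation for ordinal-only rules whose conditions
# are written as {attribute: (low, high)}.  Illustrative representation only.

def covers(condition, case):
    return all(lo <= case[a] <= hi for a, (lo, hi) in condition.items())

def is_generalisation(x, y):
    # x generalises y when every case y covers is necessarily covered by x:
    # each attribute x constrains must be constrained at least as tightly in y.
    return all(a in y and x[a][0] <= y[a][0] and y[a][1] <= x[a][1] for a in x)

def widen_to_cover(condition, case):
    # Widen each range just enough to admit the case, and no further.
    return {a: (min(lo, case[a]), max(hi, case[a]))
            for a, (lo, hi) in condition.items()}

old = {"age": (31, 120)}                            # roughly IF age>30 THEN old
print(is_generalisation({"age": (21, 120)}, old))   # True
print(widen_to_cover(old, {"age": 20}))             # {'age': (20, 120)}
```

Note that widening a range can only ever produce a generalisation of the original rule under these definitions, since the new range contains the old one.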
A rule x is a least generalisation (Plotkin, 1970) with respect to rule y and case z if x is a generalisation of y and x covers z, and there is no rule r that is a proper specialisation of x, a generalisation of y, and covers z. For example, IF age>20 THEN old is a least generalisation of IF age>30 THEN old with respect to a case with age 20. A rule x is a least specialisation with respect to rule y and case z if x is a specialisation of y and x does not cover z, and there is no rule r that is a proper generalisation of x, a specialisation of y, and does not cover z. For example, IF age>30 THEN old is a least specialisation of IF age>20 THEN old with respect to a case with age 29 (assuming that only integer values are allowed for age).

DLGref

DLGref is a variant of the AQ (Michalski, 1993) learning algorithms. These algorithms form a set of classification rules incrementally. For each class a rule is sought that performs best on the remaining examples. Best performance is judged in a variety of ways, but usually involves covering as many examples of the class, and as few examples not of the class, as possible. DLGref allows the user to specify a metric of good performance. Within The Knowledge Factory, two metrics are supported. The max-consistent metric does not accept any rule that covers examples of another class and, of the remaining rules, prefers those which cover the most examples of the positive class. Laplace is based on the Laplacian law of succession. It allows a trade-off between covering greater numbers of examples and covering small numbers of counter-examples. Using this metric, the value of a rule equals (p+1)/(n+2), where p is the number of positive examples covered by the rule and n is the total number of examples covered by the rule. Having selected the best rule, all examples that it correctly classifies are removed from the training set and the process is repeated.
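The covering loop and the Laplace metric just described can be sketched as follows. This is a toy illustration: the fixed candidate pool stands in for DLGref's generalisation-based search, and the data, attribute, and class names are invented:

```python
# Toy sketch of the covering loop: take the best-valued rule under the
# Laplace metric, remove the examples it correctly classifies, and repeat.
# The fixed candidate pool stands in for DLGref's search; data are invented.

def covers(rule, example):
    return rule["low"] <= example["age"] <= rule["high"]

def laplace(rule, examples, target):
    covered = [e for e in examples if covers(rule, e)]
    p = sum(1 for e in covered if e["cls"] == target)   # positives covered
    n = len(covered)                                    # all cases covered
    return (p + 1) / (n + 2)

def greedy_cover(candidates, examples, target):
    remaining, chosen = list(examples), []
    while any(e["cls"] == target for e in remaining):
        best = max(candidates, key=lambda r: laplace(r, remaining, target))
        correct = [e for e in remaining if covers(best, e) and e["cls"] == target]
        if not correct:
            break                     # no candidate makes further progress
        chosen.append(best)
        remaining = [e for e in remaining if e not in correct]
    return chosen

data = [{"age": 25, "cls": "young"}, {"age": 40, "cls": "old"},
        {"age": 55, "cls": "old"}]
candidates = [{"low": 0, "high": 30}, {"low": 31, "high": 120}]
print(greedy_cover(candidates, data, "old"))  # [{'low': 31, 'high': 120}]
```

Here the second candidate covers both old cases and no young case, so its Laplace value, (2+1)/(2+2) = 0.75, exceeds the first candidate's, and one pass of the loop suffices.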
The rules so selected are pooled together to form a set of classification rules. The DLGref algorithm is presented in Appendix A. Whereas most AQ-like covering algorithms search for each rule starting from very general rules and considering successive specialisations thereof, DLGref starts from highly specific rules and explores successive generalisations. The successive generalisations are generated using least generalisation. This process readily supports ordinal and continuous-valued attributes, forms that cause difficulties for the specialisation-based search of traditional AQ algorithms.

Information Utilised by the Machine Learning System

An attribute-value classification learning system utilises two major types of information. The first is the set of attributes and their possible values. This, along with the formalism for expressing classifiers that the system employs, defines the set of possible classifiers that the system can form. Clearly, the system cannot form classifiers that utilise attributes about which the learning system has not been informed. The second type of information that is used is the training set. The learning system searches the space of possible classifiers that it is capable of forming, seeking a classifier that best classifies the examples in the training set. Clearly, the system is greatly influenced by the quality of the training set with which it is provided. If there are intricacies to classification in a domain that are not represented in the training set then the learning system cannot learn them.
TECHNIQUES

One of the primary design considerations for The Knowledge Factory was to allow direct interaction between the knowledge acquisition system and an expert with minimal computing expertise. This has been achieved by a number of previous knowledge acquisition systems, notably repertory-grid-based tools such as ETS (Boose, 1986) and the ripple-down-rules tools of Compton, Edwards, Srinivasan, Malor, Preston, Kang, & Lazarus (1992). However, a significant difference between The Knowledge Factory and these previous tools is that The Knowledge Factory's incorporation of machine learning frees it from a reliance on the expert having all-encompassing knowledge of the domain. Previous systems have relied upon the expert being able to specify a suitable solution for any problem that the system may present. The expert, then, is viewed as all-knowing. All that is required is to help him or her to develop a formal model of the domain that captures the relevant aspects of his or her expertise.

With the inclusion of machine learning, this view can change. Now the expert is viewed as having valuable insight into the domain, but this insight need not be all-encompassing. Where the expert is unable to provide suitable models, the machine learning system can supply models instead. But the interactions may be more sophisticated than each simply producing part of a model, the two parts then being assembled. An alternative scenario is that an expert may have expertise in some aspect of a domain but may not be able to express formal models that encode that expertise. That is, the expert may be able to solve a particular class of problems, but may not be able to specify a set of expert system rules to reproduce those solutions. In this case, the machine learning system can be used to produce a 'first draft' of a set of rules. The expert may then be able to critique and refine those rules.
Thus, machine learning can be used to 'kick-start' the model specification and refinement process. A further scenario is that even after the use of machine learning to produce a first draft, the expert may still be unable to specify appropriate formal models. In this case, he or she may be able to critique the machine learning system's models, enabling it to further refine its own models. Finally, the expert may be able to specify some formal models initially, but these models may be incomplete or imprecise. The machine learning system may then refine this initial knowledge-base. The expert can review and revise the machine learning system's contributions in turn, starting an ongoing sequential process of refinement.

Common to all these scenarios is an underlying view of the expert and the machine learning system as partners in the knowledge acquisition task, each of which has differing insights into a domain. The expert's insights are derived from training, experience, and both general and commonsense knowledge. The machine learning system's insights are derived from analysis of a set of training examples.

Communication Mechanisms

Such pooling of insight clearly requires communication between the partners of a form not normally supported by machine learning systems. The Knowledge Factory uses example cases as the primary basis for communication. This case-based communication paradigm suits both partners in the collaboration. Experts in many domains are familiar with the use of examples, both to have points illustrated and for
illustrating points. For a machine learning system, example cases are the major source of evidence that is considered. Many of the decisions that a machine learning system makes can only be explained in terms of how a model relates to the provided training examples. With The Knowledge Factory's case-based communication, each partner uses example cases to communicate with the other.

Example windows

The primary mechanism used by the machine learning system is to provide windows that list the example cases that stand in a particular relationship to a rule or set of rules. Eight of these example windows are maintained. While these windows could be composed to answer specific queries from the user, or even to anticipate queries that the user might wish to pose, the complexity of creating such queries, and the difficulty of understanding system-generated summaries of the queries that an ad hoc window might address, militate against the use of such mechanisms with our target users. Rather, windows that address the issues that are most likely to concern the users are maintained at all times. This constant display allows a user to familiarise himself with the meaning of each window, without having to devote attention to comprehending the meaning of a window each time that it is considered, as would be the case if windows presenting different types of information were generated dynamically by the system. The user is then able to attend to individual windows at his own discretion, ignoring those that do not bear upon his current concerns and utilising those that do.

The first example window simply lists all available examples. It is perhaps stretching the scope of the technique to include this window within the case-based communication mechanism, as, in contrast to the other example windows, it does not really serve a communication purpose. Rather, it provides the user with ready access to the available example cases.
The Indistinguishable Examples Window contains all examples for which there is an example from another class such that the system is not able to form a rule that distinguishes the two. This will occur if there is no attribute for which the two examples in question have different values. Unknown values do not count as distinct for this purpose. The presence of examples in this window indicates that it is not possible to form rules that correctly classify all examples given the current set of attributes. This indicates to the user that it might be desirable to consider revision of the current set of attributes.

Whenever a rule from the current knowledge base is selected, the Positive Examples Window lists all examples that the rule correctly classifies. The Counter Examples Window displays all examples that the current rule incorrectly classifies. The Uncovered Examples Window lists all examples that belong to the class indicated by the rule's conclusion, but which are not covered by the rule. These windows indicate to the user the evidence that led the system to select the current rule. They can be used both to aid the user's understanding of the underlying support for the rule and to help the user to formulate alternatives to the rule that serve the same purpose. They also provide a simple mechanism for the user to ask what if? and why not? questions. To determine why the system did not select a specific alternative to a rule, the user need only create and select the other candidate. If it was not created because it failed to cover specific examples, this will be indicated by the contents of the Positive Examples Window. If the failure to create it was due to the alternative rule incorrectly classifying
specific cases, this will be indicated by the contents of the Counter Examples Window. Finally, they provide a mechanism whereby the machine learning system can critique rules that the user formulates. If a rule fails to cover example cases, this will be indicated by the Uncovered Examples Window. If a rule misclassifies cases, this will be indicated by the Counter Examples Window. Finally, if the rule performs well, this will be indicated by the Positive Examples Window.

[Fig. 2 about here]

The operation of these windows is illustrated in Fig. 2. Here, a rule for diagnosing Immunoglobulin A Deficiency (IGA_NX) is selected. This rule has been learned by the machine learning system. The Positive Examples Window displays the examples that support the rule. The Uncovered Examples Window displays the IGA_NX cases that the rule fails to cover. The empty Counter Examples Window shows that this rule does not misclassify any examples.

[Fig. 3 about here]

To illustrate the use of these windows to answer why not? questions, Fig. 3 displays the same situation after the user deletes the first clause from the rule. This is equivalent to asking why not develop this alternative rule? As can be seen from the Counter Examples Window, the answer is because the alternative rule covers the two cases that are listed.

A fifth example window also relates to the current rule. The Insufficient Information Examples Window lists any cases for which it cannot be determined whether or not the rule covers the example. This will occur when an example has unknown values for some of the attributes referred to in a rule, and satisfies all clauses relating to attributes for which it has values defined. This window alerts the user to potential problems arising from missing values in the data.

The previous four example windows relate to the performance of a rule considered in isolation. The remaining two example windows relate to the performance of a set of rules as a whole.
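The contents of the per-rule windows described above follow directly from the cover relation. A sketch, in which the covers() predicate and the example data are assumed stand-ins rather than the system's implementation:

```python
# Sketch of populating the per-rule example windows: Positive (covered and
# correctly classified), Counter (covered but of another class), and
# Uncovered (of the rule's class but not covered).  Data are invented.

def example_windows(rule, examples, covers):
    positive  = [e for e in examples
                 if covers(rule, e) and e["cls"] == rule["conclusion"]]
    counter   = [e for e in examples
                 if covers(rule, e) and e["cls"] != rule["conclusion"]]
    uncovered = [e for e in examples
                 if not covers(rule, e) and e["cls"] == rule["conclusion"]]
    return positive, counter, uncovered

rule = {"low": 30, "high": 120, "conclusion": "old"}
covers = lambda r, e: r["low"] <= e["age"] <= r["high"]
examples = [{"age": 45, "cls": "old"},     # positive example
            {"age": 45, "cls": "young"},   # counter example
            {"age": 25, "cls": "old"}]     # uncovered example
pos, neg, unc = example_windows(rule, examples, covers)
print(len(pos), len(neg), len(unc))  # 1 1 1
```

An empty counter list corresponds to the empty Counter Examples Window of Fig. 2: the rule misclassifies nothing.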
The Misclassified Examples Window lists cases that are misclassified by the rule set. Such misclassification may occur even if the example is correctly classified by one or more rules, if it is also misclassified by other rules. The manner in which such conflicts between rules are handled by the system is under user control. Rules may be ordered, so that the first rule encountered that covers a case is used. Alternatively, rules may be assigned values. Under this treatment, of the rules that cover a case, that with the highest value is employed. Rule values may be specified by the learning system or by the user. A final alternative is that in cases of multiple conflicting conclusions, the case remains unclassified.

The Unclassified Examples Window lists cases that are not classified by the rule set. This may occur if multiple rules cover a case, as described above, or if no rule covers a case. The system may be set so that if no rule covers a case either the case is unclassified or a default classification is assigned. These two windows allow the user to obtain a holistic evaluation of the rule set in addition to the rule-specific evaluation provided by the majority of the example windows.

The example windows serve another function—example set validation. It is very easy for errors to enter into data. These may occur through errors in data entry or transmission, or even due to imprecision in measuring instruments. In the traditional machine learning context, such errors will only rarely become apparent, as the knowledge bases that are inferred are not usually related back to the training examples. In contrast, The Knowledge Factory's example windows encourage the
user to closely inspect significant examples. In particular, the Insufficient Information, Counter, and Uncovered Examples Windows will often highlight cases that contain errors.

Case-based rule critique

The example windows enable the machine learning system to justify its actions to the user and to critique rules created by the user. The user is also able to critique rules formed by the learning system. In our experience it is common that experts will dislike a rule formed by the learning system but be unable to articulate appropriate modifications to it. In such circumstances the expert will often be able to provide examples to support his or her disquiet. It is straightforward to add those examples to the training set. Such examples may be complete, including values for all attributes, and possibly derived from an actual case from the expert's experience. Alternatively, they might be partial, with values specified only for some attributes. Such an example provides an abstract characterisation of the types of case for which an alternative decision is appropriate.

Case-based rule editing

Another manner of using examples to modify rules is provided by case-based rule editing. Specification of counter-examples provides the machine learning system with information about conditions under which a rule should not apply. This may cause the system either to specialise a rule by adding more constraints to its application, or to select a different set of attributes with which to condition the application of the rule. To extend a rule to cover additional types of situation is less straightforward, however. The addition of further positive examples to the training set may result in a rule being modified to cover those cases when learning is next applied. However, the system may equally cover those cases through modification of other rules for the class, or the generation of new rules.
Also, depending on the metric used to evaluate the quality of rules, the machine learning system may tolerate a small number of counter-examples to a rule, the quality metric evaluating them as not sufficient to warrant specialising the rule, thus defeating the user's objective in specifying a counter-example. Case-based rule editing has been developed to deal with such circumstances, in which an expert is unable to directly specify suitable changes to a rule, but can identify appropriate counter-examples or uncovered positive examples. With this technique, the expert identifies a rule and a case that the rule should or should not cover, and the system suggests modifications to the rule that would produce the desired outcome. The user can then evaluate and select between the proffered modifications or explore other forms of revision.

When excluding a case from the cover of a rule, the system generates all least specialisations of the rule with respect to the case. This is illustrated in Fig. 4.

[Fig. 4 about here]

In this example, we have removed a clause from a rule, which is highlighted in the Rules Window. This has caused it to incorrectly cover some counter-examples, which are displayed in the Counter Examples Window. To explore how these counter-examples might be excluded from the cover of the rule we have then applied the Contract to Exclude Case command, selecting the first case in the Counter Examples Window. This generates a large number of alternative rules, of which the last three are displayed in the Alternative Rules Window. Two of these add possible constraints on the
attribute age, while the last adds a constraint on the attribute gender. The effect of these new constraints is summarised on the summary line after each rule and could be explored further using the example windows. Further case-based editing could be applied to one of these alternatives until a satisfactory outcome was obtained, or the user might reach a stage where he or she is able to complete the necessary modifications directly.

When generalising a rule to cover a case, the system generates the least generalisation of the rule with respect to the case. For the types of attribute-value rule supported, there will only ever be a single least generalisation of a rule with respect to a single case. If a user specifies a new case that a rule should cover, the Expand to Cover Case command can be used to force the system to immediately generalise that rule to cover that case. As with the Contract to Exclude Case command, several such case-based rule editing actions may be required before a user is satisfied with the resulting rule.

Machine-generated examples are not used

It is interesting to contrast the case-based communication mechanisms utilised by The Knowledge Factory with the techniques pioneered in MARVIN (Sammut & Banerji, 1986). When testing hypotheses, MARVIN actively generates 'examples' which it asks the expert to classify. Such a mechanism has been deliberately excluded from The Knowledge Factory in the belief that, for many domains, without access to the types of background knowledge that we have also deliberately excluded from the system, examples created by the computer are likely to be nonsensical. In a medical domain, for example, the interactions between and constraints on attributes will be many and varied. Computer-generated cases are likely to violate such constraints, for example, by hypothesising pregnant male patients.
The exploration of violations of such constraints is unlikely to be useful, as the expert will know that the possibility does not have to be excluded from a rule because it will never arise in practice. Our case-based communication mechanisms exclusively use cases specified by the user or imported from outside the system. In consequence, all cases should be meaningful to the user. The computer may present cases to the user, but only cases selected from those provided to it.

Exploration of Alternative Rules

Case-based rule editing is able to assist in some circumstances where an expert 'knows' that a rule is unsuitable, but cannot identify precisely how. But it is limited to situations in which the user can specify or identify appropriate example cases. This will not always be possible. Alternative rules are another mechanism that The Knowledge Factory provides for such situations. Machine learning systems frequently make arbitrary choices between possible tests for inclusion in a rule. There will often be several tests, all of which cover exactly the same set of training cases. A machine learning system must select one of these, and usually does so in an arbitrary manner. That such a choice has been made will not usually be flagged to the user. However, while the choice may have been arbitrary given the information available to the system, an expert may well be able to make comparative qualitative judgements about the alternatives. This problem could be tackled by involving the user in the process of selecting tests during induction. However, this would impose a large burden on the user. There are very many such tests considered
in typical learning contexts. Further, in many situations the user may not have any better basis on which to choose between alternatives than does the system. Instead, The Knowledge Factory offers the user a facility for identifying and selecting between alternative rules. This facility can be invoked by the user at any time at his own discretion. The Form Alternative Rules command creates a set of alternatives to the current rule. First, the system identifies all most general rules that cover all cases correctly covered by the current rule. Two more rules are added. The first is the most specific rule to cover all of the cases correctly covered by the initial rule. This provides the user with a complete presentation of all tests that could possibly be used to cover these cases. The second additional rule has as its antecedent the conjunction of all antecedents of all the most general rules that were formed. This summarises for the user the set of all tests that appear to be effective in distinguishing the current class from others. These rules are displayed in a window within which the user may explore them using all of the system's rule editing and evaluation facilities.

[Fig. 5 about here]

Figure 5 illustrates the outcome of this command. In this example, rules have been developed from examples of diagnosis of hypothyroid disease (Quinlan, Compton, Horn, & Lazarus, 1986). The rule highlighted in the Ruleset Window is the original rule for which alternatives have been developed. The Alternative Rules Window lists the four alternatives that were discovered. The last two of these rules are the most general alternatives. The first of these is the original rule. The second is identical to the first except that the clause psych is f is replaced by sick is f. All 43 cases covered by the original rule satisfy both of these conditions.
One or the other of these clauses is needed if a rule is to cover all 43 cases correctly classified by the original rule and no cases of another class. For this rule there happens to be one case of class compensated_hypothyroid that satisfies all of the other conditions but, unlike any of the positive cases, has values of t for both psych and sick. This is a clear illustration of a situation in which the system has no sensible basis on which to choose between alternatives but an expert may. The first of the alternative rules is the most specific rule to cover all 43 cases. The second alternative rule contains all clauses in either of the most general alternatives.

Note that this is a very simple example of alternative rules, deliberately chosen for ease of presentation and explanation. For contrast, Fig. 6 presents some of the 39 alternative most general rules, all of which cover the only two cases of class secondary_hypothroid in the training set and none of which cover any cases of another class.

[Fig. 6 about here]

There is an interesting relationship between this alternative rules technique and the example generation technique of MARVIN (Sammut & Banerji, 1986). In both situations the system wishes to obtain advice from an expert to help it distinguish between alternatives that are equally well supported by the training cases. However, whereas MARVIN generates hypothetical cases in an attempt to distinguish between the alternatives, we present the alternatives directly. As argued above, automated generation of hypothetical cases will often be unsuccessful in real world domains, if only because such cases are unlikely to appear realistic to an expert. We believe that complete rules of the type that The Knowledge Factory creates will often be easier for the expert to comprehend and manipulate.
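One way equally supported alternatives can arise is when different single tests admit exactly the same training cases. The following toy sketch finds such tests: each must admit every case the original rule correctly covers and no case of another class. The attribute names echo the psych/sick discussion above, but the data and the procedure are invented for illustration, not taken from The Knowledge Factory:

```python
# Toy sketch of finding alternative most-general set tests that fit the data
# exactly as well as an original rule.  Each candidate test admits every case
# the rule correctly covers and excludes all other-class cases.  Data invented.

def alternative_tests(positives, negatives, attributes):
    alts = []
    for a in attributes:
        allowed = {case[a] for case in positives}    # most general test on a
        if not any(n[a] in allowed for n in negatives):
            alts.append((a, allowed))                # equally well supported
    return alts

positives = [{"psych": "f", "sick": "f"}, {"psych": "f", "sick": "f"}]
negatives = [{"psych": "t", "sick": "t"}]
print(alternative_tests(positives, negatives, ["psych", "sick"]))
# [('psych', {'f'}), ('sick', {'f'})] -- the data give no basis to choose
```

Both tests separate the classes perfectly here, so any preference between them must come from the expert, which is precisely the situation the Form Alternative Rules command surfaces.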
Interactive Domain Model Revision

Buntine & Stirling (1991) argue that effective real world application of machine learning frequently requires a cyclical process: the results of machine learning are provided to the expert for review; review may lead to the identification of relevant information not originally provided to the machine learning system; and this leads back to another phase of application of machine learning with revised inputs. One of the design objectives for The Knowledge Factory is to allow the expert to complete this cycle within the system. With many machine learning systems, each time through the cycle, machine learning must start afresh. We have already described how the system can iterate through such a cycle when the new information is presented in the form of new or revised example cases. Greater change is required, however, if the expert realises that there are deficiencies in the set of attributes selected with which to describe the examples. We have observed this circumstance in practice. The expert, when confronted with a rule developed by the system, and the evidence it can provide in the rule's support, realises that a factor not included in the current case descriptions is necessary to adequately revise the rule. Another type of change is required when it is realised that there is a need to provide finer-grained distinctions for an existing attribute (one or more current values need to be replaced by a number of values) or some other transformation of an attribute is needed (conversion from ordinal to categorical or vice versa, for example). The Knowledge Factory provides facilities for performing such conversions. Existing rules and cases that are affected by the change are automatically transformed, where possible, so as to retain as much as possible of the existing knowledge base. When a new attribute is created, all cases are initialised with the value for this attribute set to unknown.
When the attribute is only relevant in a small number of contexts, the user need only specify values for a small number of cases, for example, the positive and counter examples for the current rule.

Direct Expert Evaluation of Rules

While case-based communication is able to accommodate many of the interactions between an expert and a machine learning system, there are some communication needs that it cannot cover. One of these is the direct communication of general qualitative rule evaluation by the expert. Case-based communication allows the expert to inform the system about specific deficiencies of a rule. It does not allow the expert to specify that the current form of a rule is very good, or that it is unacceptable without specifying how. The Knowledge Factory allows the expert to attach qualitative labels to individual rules. The three labels currently supported are no good, revisable, and good. The Knowledge Factory takes account of these assessments during induction. Rules that are rated no good are discarded before rule refinement commences. This usually results in the formation of totally new rules, although in some circumstances the induction system may derive the same rule again. Techniques for avoiding this outcome would be desirable, but are difficult to incorporate into traditional machine learning systems. Rules that are rated good are not altered during rule refinement, but rather are passed directly into the final rule set.
Revisable rules are retained, but may be altered to reduce the number of negative cases or increase the number of positive cases covered. This is the default annotation. These rule annotations are simple for the user to utilise, but provide useful guidance to the rule refinement process.

Summary Ruleset Evaluation

Most machine learning systems provide facilities for summarising the performance of inferred rules when applied to an evaluation set of cases. The primary metric used in such evaluation is usually accuracy. But in real-world applications this will not always be the most important measure of performance. In many domains, some errors are more significant than others (for example, failure to diagnose a serious disease in contrast to unnecessary but comparatively harmless treatment arising from diagnosing a disease that is not present). We have sought to develop a simple manner in which to capture all of the relevant summary information that is likely to be useful for a user.

[Fig. 7 about here]

Figure 7 presents an example of the tabular format for presenting evaluation results that we have created. As there are four classes in this example (using the hypothyroid diagnosis data previously mentioned), there is a 4x4 matrix which presents the number of cases belonging to each class that have been given each classification. The first four rows of the table relate to the class assigned to a case by the current rule set while the first four columns relate to the correct class for a case. The fifth column relates to all cases. The sixth column presents, for each class, the percentage of those cases assigned that class by the rule set for which that classification was correct. The fifth row presents the total number of cases of each class for which any classification was assigned by the rules. The sixth row summarises the percentage of those that were correct. The seventh row indicates the number of cases for each class for which no decision was made.
The eighth row shows the proportion of all cases for the class for which a classification was assigned. The ninth row displays the sensitivity of the rule set for the class. This is the proportion of cases belonging to the class that were correctly classified. The final row displays the specificity for the class. This is the proportion of cases not belonging to the class that were not incorrectly classified as belonging to the class.

EXPERIMENTAL EVALUATION

A major difficulty facing research in knowledge acquisition is evaluation of alternative techniques. Real-world knowledge acquisition tasks usually require large and very expensive projects. It is not feasible to conduct controlled experiments in which different methodologies are each applied to the same full-scale knowledge acquisition project under real-world conditions. As a result, other than case studies, there has been little experimental investigation of the performance of different techniques. Even the ground-breaking Sisyphus projects (Linster, 1992; Schreiber & Birmingham, 1996), in which many different systems were all applied to common knowledge acquisition tasks, are perhaps best thought of as a set of parallel case studies, as it was not possible to control extraneous factors that may have affected a participant's use of a system, and hence comparisons are restricted. A case study can demonstrate the ability of a technique to achieve an outcome. Beyond this, it provides little comparative information about alternatives.
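Returning briefly to the results window described above: the sensitivity and specificity rows can be derived directly from the matrix of assigned versus correct classes. The following sketch is hypothetical code, not taken from The Knowledge Factory; for simplicity it ignores cases for which no decision was made, and the counts are invented:

```python
def class_summary(counts, cls):
    """counts[assigned][correct] holds the number of cases given class
    `assigned` whose correct class is `correct`.  Returns (sensitivity,
    specificity) for class cls:
      sensitivity - proportion of cls cases correctly classified;
      specificity - proportion of non-cls cases not classified as cls."""
    classes = list(counts)
    tp = counts[cls][cls]
    pos = sum(counts[a][cls] for a in classes)             # all cases of cls
    fp = sum(counts[cls][c] for c in classes if c != cls)  # wrongly called cls
    neg = sum(counts[a][c] for a in classes for c in classes if c != cls)
    return tp / pos, (neg - fp) / neg

# Illustrative two-class matrix: assigned class -> correct class -> count
counts = {
    "primary":  {"primary": 40, "negative": 2},
    "negative": {"primary": 3,  "negative": 95},
}
sens, spec = class_summary(counts, "primary")
print(round(sens, 3), round(spec, 3))  # -> 0.93 0.979
```

Here 40 of the 43 primary cases are correctly classified (sensitivity 40/43) and 95 of the 97 non-primary cases are not classified as primary (specificity 95/97).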
We have been concerned to perform experimental evaluation of the efficacy of the techniques embodied in The Knowledge Factory. We were encouraged by the positive evaluations provided by participants in case studies (Webb, 1996). We wanted to have large numbers of subjects apply differing approaches to a knowledge acquisition task under controlled conditions. One way to achieve this was to set appropriate assignments for students in an undergraduate course on Artificial Intelligence and Expert Systems, and to study their performance. Subjects could be provided with software with different capabilities and their performance measured on common tasks under controlled conditions. An initial attempt at such evaluation (Webb & Wells, 1996) was undermined by a user interface flaw. Many subjects seeking to use The Knowledge Factory's machine learning facilities to refine rules that they had created unintentionally employed machine learning in a mode that deleted their rules and replaced them with rules learned by the machine learning system independently of the initial rules. In consequence, while on average outperforming both knowledge acquisition without access to machine learning facilities and the application of machine learning alone, the integrated application of machine learning with knowledge acquisition from experts failed to demonstrate a statistically significant advantage over machine learning alone. When the subjects who applied machine learning in the mode that overwrote existing rules were excluded from consideration, the advantage over machine learning was statistically significant, but the selection of subjects for analysis in this manner cast doubt upon the results. This user interface fault was rectified, and a second such experiment undertaken (Webb, Wells, & Zheng, 1999). In this second experiment, we again used undergraduate students performing a class assignment. This assignment was conducted under supervised laboratory conditions.
Knowledge acquisition tasks were created using the Glass and Soybean Large data sets from the UCI Repository of Machine Learning Databases (Merz & Murphy, 1997). The Glass data relates to forensic analysis of glass fragments and the Soybean Large data pertains to the diagnosis of soybean plant diseases. Subjects were given randomly selected training sets of examples. The accuracy of their expert systems could then be tested by evaluating their performance on withheld evaluation sets. To simulate use by experts, the subjects were provided with rules for these tasks learned by C4.5rules (Quinlan, 1993) from subsets of the data. As these subsets included both training and evaluation cases provided to the subjects, the rules could be expected to capture information about the domains that subjects could not glean from examination of their training data either with or without the use of machine learning tools. Two versions of The Knowledge Factory were created. A number of facilities were disabled that could not be usefully employed in the context of the study. This was to minimise the chance of a repetition of the previous outcome where misapplication of the software invalidated the study. The facilities disabled included changing the manner in which rules were interpreted during execution of a rule set, saving and loading sets of examples from disk, and the selection, maintenance and utilisation of separate training and evaluation sets. All of the key knowledge acquisition facilities were retained, except that machine learning capabilities were disabled in one version, called KA-alone.
Subjects were randomly divided into two groups. One group used KA-alone for the Glass data and the full system for Soybean Large, and the other group used the systems in the other order. Seventeen subjects participated in the experiment.

[Table 1 about here]

Table 1 presents the mean predictive accuracy on the evaluation cases obtained using each system for each data set, along with that of The Knowledge Factory's machine learning system when applied using the training and evaluation cases used by subjects using the full system. For both data sets the integrated use of machine learning with knowledge acquisition from experts led to higher predictive accuracy than the use of either of its constituent approaches in isolation. With respect to KA-alone, both of these differences were statistically significant at the 0.05 level (one-tailed two-sample t tests; Soybean Large: t=2.1, p=0.026; Glass: t=3.35, p=0.001). With respect to learning-alone, the difference was significant at the 0.05 level for the Glass data (one-tailed matched-pairs t test: t=2.5, p=0.021) but not for Soybean Large (one-tailed matched-pairs t test: t=0.9, p=0.279). Note that the power of these tests was low due to the small number of subjects, and hence the failure to find a significant difference provides only very weak evidence that such a difference does not actually exist. The significant differences that were found demonstrate that for at least some knowledge acquisition tasks, the integrated use of machine learning with knowledge acquisition from experts can produce more accurate expert systems than either constituent method in isolation. The integrated approach was also faster than KA-alone. For Soybean Large the mean and standard deviation for the integrated approach was 73±45 minutes while for KA-alone it was 131±19. For Glass these figures were respectively 16±19 and 115±38.
One-tailed two-sample t tests reveal that both of these differences are significant at the 0.05 level (Soybean Large: t=3.3, p=0.002; Glass: t=1.9, p=0.037). In both cases the time savings from the integrated use of machine learning with knowledge acquisition were substantial. Of course, both were eclipsed in this respect by the application of machine learning alone, for which knowledge acquisition time can be measured in seconds rather than minutes. Questionnaire responses from participants indicated that the subjects believed the machine learning facilities were useful, found the knowledge acquisition process easier when machine learning facilities were available, and had greater confidence in the expert systems developed with the aid of machine learning. These results, presented and analysed in more detail by Webb, Wells, & Zheng (1999), all provide support for the efficacy of the approaches that we have developed. At least for some knowledge acquisition tasks they can lead to the more rapid development of more accurate expert systems in which the users have greater confidence than does knowledge acquisition without machine learning, and to more accurate expert systems than machine learning in isolation.

CONCLUSIONS

The Knowledge Factory is a knowledge acquisition system that is designed for direct use by domain experts. It differs from most previous systems intended for this use by incorporating machine learning facilities. In consequence, it is not necessary for the expert to provide solutions for every contingency that the knowledge base must cover. The machine learning system can fill gaps in the expert's expertise. Both the machine
learning system and the expert can critique and propose refinements to the other's rules. One of the distinctive features of The Knowledge Factory is the manner in which communication with the user has been structured around the use of example cases. Techniques have been developed that allow both the user and the system to convey non-trivial information through this mechanism. Interestingly, however, we have concluded that one such form of communication that has received previous use, the construction of hypothetical cases by the system in order to test hypotheses, is actually inappropriate in our context. Instead, we favour explicit presentation of the competing hypotheses in this situation. The Knowledge Factory is distinguished from previous knowledge acquisition systems by the manner in which it supports experts with minimal computing expertise to directly interact with a machine learning system during all phases of knowledge acquisition. Case studies have found that such users have little difficulty in using the system, and controlled studies suggest that the integration of machine learning with knowledge acquisition within the system can outperform either constituent approach in isolation.

REFERENCES

Agar, J., & Webb, G. (1992). The application of machine learning to a renal biopsy data-base. Nephrology, Dialysis and Transplantation, 7: 472-478.

Attar Software (1989). Structured decision tasks methodology for developing and integrating knowledge base systems. Leigh, Lancashire: Attar Software.

Boose, J. H. (1986). ETS: A system for the transfer of human expertise. In J. S. Kowalik (Ed.), Knowledge Based Problem Solving. New York: Prentice-Hall.

Buntine, W., & Stirling, D. (1991). Interactive induction. In J. E. Hayes, D. Michie, & E. Tyugu (Eds.), Machine Intelligence 12. Oxford: Clarendon Press, pp. 121-137.

Compton, P., Edwards, G., Srinivasan, A., Malor, R., Preston, P., Kang, B., & Lazarus, L. (1992).
Ripple down rules: Turning knowledge acquisition into knowledge maintenance. Artificial Intelligence in Medicine, 4: 47-59.

Davis, R., & Lenat, D. B. (1982). Knowledge-Based Systems in Artificial Intelligence. New York: McGraw-Hill.

De Raedt, L. (1992). Interactive Theory Revision. London: Academic Press.

Kodratoff, Y., & Vrain, C. (1993). Acquiring first-order knowledge about air traffic control. Knowledge Acquisition, 5: 1-6.

Linster, M. (1992). A review of Sisyphus 91 and 92: Models of problem-solving knowledge. In N. Aussenac, G. Boy, B. Gaines, M. Linster, J.-G. Ganascia, & Y. Kodratoff (Eds.), Knowledge Acquisition for Knowledge-Based Systems. Berlin: Springer-Verlag, pp. 159-182.

Merz, C. J., & Murphy, P. M. (1997). UCI Repository of Machine Learning Databases [machine-readable data repository]. University of California.
Michalski, R. S. (1983). A theory and methodology of inductive learning. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach. Berlin: Springer-Verlag.

Morik, K., Wrobel, S., Kietz, J.-U., & Emde, W. (1993). Knowledge Acquisition and Machine Learning: Theory, Methods, and Applications. London: Academic Press.

Nedellec, C., & Causse, K. (1992). Knowledge refinement using knowledge acquisition and machine learning methods. In Proceedings of EKAW'92. Berlin: Springer-Verlag, pp. 171-190.

O'Neil, J. L., & Pearson, R. A. (1987). A development environment for inductive learning systems. In Proceedings of the 1987 Australian Joint Artificial Intelligence Conference. Sydney, pp. 673-680.

Plotkin, G. D. (1970). A note on inductive generalisation. In B. Meltzer & D. Michie (Eds.), Machine Intelligence 5. Edinburgh: Edinburgh University Press, pp. 153-163.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Quinlan, J. R., Compton, P., Horn, K. A., & Lazarus, L. (1986). Inductive knowledge acquisition: A case study. Technical Report 86.4, School of Computing Sciences, New South Wales Institute of Technology, Sydney.

Sammut, C., & Banerji, R. B. (1986). Learning concepts by asking questions. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach, Volume II. Los Altos, CA: Morgan Kaufmann, pp. 167-191.

Schmalhofer, F., & Tschaitschian, B. (1995). Cooperative knowledge evolution for complex domains. In G. Tecuci & Y. Kodratoff (Eds.), Machine Learning and Knowledge Acquisition: Integrated Approaches. London: Academic Press.

Schreiber, A., & Birmingham, W. P. (1996). The Sisyphus-VT initiative. International Journal of Human-Computer Studies, 44.

Shapiro, A. (1987). Structured Induction in Expert Systems. London: Addison-Wesley.

Smith, R. G., Winston, H. A., Mitchell, T. M., & Buchanan, B. G. (1985).
Representation and use of explicit justifications for knowledge base refinement. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, pp. 673-680.

Tecuci, G., & Kodratoff, Y. (1990). Apprenticeship learning in imperfect domain theories. In Y. Kodratoff & R. Michalski (Eds.), Machine Learning: An Artificial Intelligence Approach. San Mateo, CA: Morgan Kaufmann.

Webb, G. I. (1993). DLGref2: Techniques for inductive knowledge refinement. In Proceedings of the IJCAI Workshop W16. Chambery, France, pp. 236-252.

Webb, G. I. (1996). Integrating machine learning with knowledge acquisition through direct interaction with domain experts. Knowledge-Based Systems, 9: 253-266.
Webb, G. I., & Wells, J. (1996). Experimental evaluation of integrating machine learning with knowledge acquisition through direct interaction with domain experts. In P. Compton, R. Mizoguchi, H. Motoda, & T. Menzies (Eds.), Proceedings of PKAW'96: The Pacific Knowledge Acquisition Workshop. Sydney, pp. 170-189.

Webb, G. I., Wells, J., & Zheng, Z. (1999). An experimental evaluation of integrating machine learning with knowledge acquisition. Machine Learning, 35(1): 5-24.

Wilkins, D. C. (1988). Knowledge base refinement using apprenticeship learning techniques. In Proceedings of the Seventh National Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, pp. 646-651.
APPENDIX A: THE DLGREF ALGORITHM

α is the most specific possible rule for the class positive; it covers no cases. The least generalisation of α against a case results in the most specific rule that covers that case. ζ is the most general possible rule for the class positive; it covers all cases.

Function DLGref
Parameters:
  rules: an initial set of rules for a single class
  POS: a set of cases belonging to that class
  NEG: a set of cases that do not belong to the class
  value(): a function from rules to numeric values such that the higher the value the greater the preference for the rule
Returns:
  rules: a revised set of rules for the class

for r is set to each rule in rules in succession
  if r covers any cases in NEG
    spec_rule <- generalise_rule(α, covered_cases(r, POS), NEG, value)
    if spec_rule covers any cases in POS
      r <- generalise_toward(spec_rule, r, NEG)
    end if
  end if
  remove from POS all cases that r covers
end for
for r is set to each rule in rules in succession
  r <- generalise_rule(r, POS, NEG, value)
  remove from POS all cases that r covers
end for
while POS is not empty
  r <- generalise_rule(α, POS, NEG, value)
  if r covers any cases in POS
    r <- generalise_toward(r, ζ, NEG)
    remove from POS all cases that r covers
    add r to rules
  else
    remove all remaining cases from POS
  end if
end while
Function generalise_rule
Parameters:
  rule: an initial rule for the positive class
  POS: a set of cases belonging to that class
  NEG: a set of cases that do not belong to the class
  value(): a function from rules to numeric values such that the higher the value the greater the preference for the rule
Returns:
  result: a generalisation of rule

result <- rule
for c is set to each case in POS in succession
  set r to the least generalisation of result with respect to c
  if value(r, POS, NEG) > value(result, POS, NEG)
    result <- r
  end if
end for

Function generalise_toward
Parameters:
  spec: an initial specific rule for the positive class
  gen: a generalisation of spec
  NEG: a set of cases that do not belong to the class
Returns:
  result: a rule that is a generalisation of spec and a specialisation of gen and which covers no cases in NEG not covered by spec

while spec ≠ gen
  for each clause c in the antecedent of spec
    if deleting c from spec would increase the number of cases in NEG covered by spec
      add c to gen
    end if
  end for
  for each clause c in the antecedent of spec
    if c is not in the condition of gen and adding c to the condition of gen does not decrease the number of cases in NEG covered by gen
      remove c from spec
    end if
  end for
end while
result <- spec

DLGref makes two passes through the initial rules. In the first pass, any rules that cover negative cases are specialised, if possible, so as to no longer cover those cases. All positive cases covered by the resulting rules are then deleted as they do not need further attention. In the second pass through the initial rules, they are generalised as much as possible to cover further positive cases. Any cases so covered are also deleted. Finally, new rules are added to the rule set to cover any remaining positive cases. The function generalise_rule seeks to generalise an initial rule to cover as many positive cases as possible, but without unduly increasing negative cover. The value
function is used to evaluate whether an increase in negative cover, if it occurs, is sufficiently offset by an increase in positive cover. The function generalise_toward generalises an initial specific rule toward a more general form as far as possible without increasing the negative cover of the initial rule.
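The effect of generalise_toward can be illustrated for conjunctive attribute-value rules. The following sketch is hypothetical code, a simplified single-pass rendering of the bidirectional loop given above rather than the algorithm itself: it drops conditions from the specific rule that are absent from the general one, but only where doing so does not increase negative cover. The cases and rules are invented for illustration:

```python
def covers(rule, case):
    """A conjunctive rule covers a case when every condition matches."""
    return all(case.get(a) == v for a, v in rule.items())

def negative_cover(rule, neg):
    """Number of negative cases the rule covers."""
    return sum(covers(rule, c) for c in neg)

def generalise_toward(spec, gen, neg):
    """Move spec toward gen by dropping conditions not in gen,
    whenever dropping a condition does not increase negative cover."""
    spec = dict(spec)
    for a in [a for a in spec if a not in gen]:
        candidate = {k: v for k, v in spec.items() if k != a}
        if negative_cover(candidate, neg) <= negative_cover(spec, neg):
            spec = candidate
    return spec

# Invented illustration: the negative case forces one distinguishing
# clause to be retained so that negative cover stays at zero
neg = [{"psych": "t", "sick": "t", "TSH": "high"}]
spec = {"psych": "f", "sick": "f", "TSH": "high"}
gen = {"TSH": "high"}
print(generalise_toward(spec, gen, neg))  # -> {'sick': 'f', 'TSH': 'high'}
```

The psych condition can be dropped safely, but dropping the remaining distinguishing condition would let the rule cover the negative case, so the result stops short of the fully general rule.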
Table 1: Mean and standard deviation for predictive accuracy.

Data set        Integrated    KA alone     Learning alone
Soybean Large   88.5±4.4      84.4±3.6     87.8±1.6
Glass           81.3±7.7      59.0±17.3    79.2±8.0
FIGURES

FIGURE 1: Examples of rules in The Knowledge Factory.

FIGURE 2: A selected rule with displayed positive, counter and uncovered examples.

FIGURE 3: Answering a why not question.

FIGURE 4: Case-based editing: rules generated by Contract to Exclude Case applied to the first case in the Counter Examples window.

FIGURE 5: Alternative rules that cover the same 43 cases of class primary_hypothyroid.

FIGURE 6: A selection of the 39 alternative most general rules that cover both cases of class secondary_hypothyroid.

FIGURE 7: The results window.