BioHEL GBML System
                                k-Disjunctive Normal functions
                                                Experiments
                               Conclusions and Further Work




                  Analysing BioHEL Using Challenging
                          Boolean Functions

               María A. Franco, Natalio Krasnogor and Jaume Bacardit

                                              University of Nottingham, UK,
                                                ASAP Research Group,
                                              School of Computer Science
                                              {mxf,nxk,jqb}@cs.nott.ac.uk


                                                      July 8, 2010



         M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham   Analysing BioHEL Using Boolean Functions   1 / 27




1    BioHEL GBML System
       Characteristics of the system
       BioHEL fitness function
       Open questions for BioHEL

2    k-Disjunctive Normal functions

3    Experiments
       Experiment Setup
       Iterations and execution time
       Learning and overgeneralisation

4    Conclusions and Further Work



The BioHEL GBML System


         BIOinformatics-oriented Hierarchical Evolutionary Learning
         - BioHEL[Bacardit et al., 2009]
         BioHEL was designed to handle large scale bioinformatics
         datasets[Stout et al., 2008]
         BioHEL is a GBML system that employs the Iterative Rule
         Learning (IRL) paradigm
                 First used in EC in Venturini’s SIA system[Venturini, 1993]
                 Widely used for both fuzzy and non-fuzzy evolutionary
                 learning
         BioHEL inherits most of its components from
         GAssist[Bacardit, 2004], a Pittsburgh GBML system



Iterative Rule Learning

          IRL has been used for many years in the ML community under the
          name separate-and-conquer

         Algorithm 1.1: ITERATIVE RULE LEARNING(Examples)

           Theory ← ∅
           while Examples ≠ ∅ do
                Rule ← FindBestRule(Examples)
                Covered ← Cover(Rule, Examples)
                if RuleStoppingCriterion(Rule, Theory, Examples)
                    then exit
                Examples ← Examples − Covered
                Theory ← Theory ∪ Rule
           return (Theory)
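
The loop of Algorithm 1.1 can be sketched in Python as follows. This is a minimal illustration, not BioHEL's implementation: `toy_find_best_rule` is a hypothetical stand-in for the genetic algorithm that BioHEL uses to find each rule, and `covers_nothing` is an assumed stopping criterion chosen only to make the example terminate.

```python
from typing import Callable, List, Tuple

Instance = Tuple[int, ...]            # Boolean attribute values as 0/1
Example = Tuple[Instance, int]        # (instance, class label)
Rule = Callable[[Instance], bool]     # True when the rule matches an instance


def iterative_rule_learning(
    examples: List[Example],
    find_best_rule: Callable[[List[Example]], Rule],
    should_stop: Callable[[Rule, List[Rule], List[Example]], bool],
) -> List[Rule]:
    """Separate-and-conquer loop following Algorithm 1.1."""
    theory: List[Rule] = []
    remaining = list(examples)
    while remaining:
        rule = find_best_rule(remaining)
        if should_stop(rule, theory, remaining):
            break
        # Remove the covered examples, then add the rule to the theory.
        remaining = [ex for ex in remaining if not rule(ex[0])]
        theory.append(rule)
    return theory


def toy_find_best_rule(examples: List[Example]) -> Rule:
    """Hypothetical rule finder: best single-attribute test for class 1.

    Scores each test (attribute index, value) as
    matched positives minus matched negatives.
    """
    n_attrs = len(examples[0][0])
    best, best_score = (0, 1), None
    for i in range(n_attrs):
        for v in (0, 1):
            score = sum(1 if y == 1 else -1 for x, y in examples if x[i] == v)
            if best_score is None or score > best_score:
                best, best_score = (i, v), score
    index, value = best
    return lambda x: x[index] == value


def covers_nothing(rule: Rule, theory: List[Rule], examples: List[Example]) -> bool:
    """Assumed stopping criterion: stop when the new rule matches no example."""
    return not any(rule(x) for x, _ in examples)


# Truth table of x0 OR x1.
DATA: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
theory = iterative_rule_learning(DATA, toy_find_best_rule, covers_nothing)
```

On this toy data the loop learns one rule per disjunct and stops once only the negative example remains, which would then be handled by a default rule.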



Characteristics of BioHEL




          A fitness function based on the
          Minimum-Description-Length (MDL) principle (Rissanen, 1978)
          that tries to
                  Evolve accurate rules
                  Evolve high coverage rules
                  Evolve rules with low complexity, as general as possible
          The ILAS windowing scheme
                  Efficiency enhancement method, not all training points are
                  used for each fitness computation





Characteristics of BioHEL




          The Attribute List Knowledge representation
                  Representation designed to handle high-dimensionality
                  domains
          An explicit default rule mechanism
                  Generating more compact rule sets
          Ensembles for consensus prediction
                  An easy way to boost robustness





BioHEL fitness function

          Coverage term penalises rules that do not cover a minimum
          percentage of examples
          Choosing the coverage break changes the behaviour and
          performance of the entire system
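
The formula for the coverage term did not survive on this slide. The sketch below shows one piecewise-linear form of such a term, in the spirit of the description in [Bacardit et al., 2009]; the exact shape, the parameter names `coverage_break` and `coverage_ratio`, and the default value 0.9 are assumptions made for illustration.

```python
def coverage_term(raw_coverage: float,
                  coverage_break: float,
                  coverage_ratio: float = 0.9) -> float:
    """Assumed piecewise-linear coverage reward in [0, 1].

    raw_coverage   : fraction of training examples matched by the rule
    coverage_break : minimum coverage a rule is expected to reach
    coverage_ratio : reward granted exactly at the coverage break (assumed)
    """
    if not 0.0 < coverage_break < 1.0:
        raise ValueError("coverage_break must lie strictly between 0 and 1")
    if raw_coverage < coverage_break:
        # Below the break: reward grows steeply, so low coverage is penalised.
        return coverage_ratio * raw_coverage / coverage_break
    # Above the break: only a small extra reward for additional coverage.
    return coverage_ratio + (1.0 - coverage_ratio) * (
        (raw_coverage - coverage_break) / (1.0 - coverage_break)
    )


# The same rule (0.5% coverage) under two different coverage breaks.
low_reward = coverage_term(0.005, coverage_break=0.01)
high_reward = coverage_term(0.005, coverage_break=0.001)
```

Under this assumed form, a rule that matches 0.5% of the examples is rewarded far less with a break of 0.01 than with a break of 0.001, which is why the choice of break can change which rules the system prefers.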





Open questions for BioHEL




          Does a single coverage break work for the same family of
          problems?
          How difficult is it to hand-tune the coverage break?
          What is the performance impact of the coverage break
          when it is not properly adjusted?




  Motivation of the paper
  The motivation of the paper is to answer these questions. We
  used k-DNF problems to test the system exhaustively on
  problems of varying difficulty.




k-Disjunctive Normal functions



          r disjunctive terms
          d possible attributes
          k attributes expressed in each term


   Example
   d = 10, k = 3, r = 3

             (¬x1 ∧ x5 ∧ x7 ) ∨ (x1 ∧ ¬x2 ∧ x8 ) ∨ (x4 ∧ ¬x5 ∧ ¬x9 )
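
A k-DNF function of this kind can be represented and evaluated as sketched below. The encoding (a term as a list of `(attribute index, required value)` pairs, with 0-based indices) and the generator `random_kdnf` are illustrative assumptions; the slides do not state how the authors generated their functions, and this generator does not check that terms are distinct.

```python
import random
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]   # (0-based attribute index, required value)
Term = List[Literal]


def random_kdnf(d: int, k: int, r: int, rng: random.Random) -> List[Term]:
    """Draw r terms, each expressing k distinct attributes out of d."""
    terms = []
    for _ in range(r):
        attrs = sorted(rng.sample(range(d), k))
        terms.append([(a, rng.random() < 0.5) for a in attrs])
    return terms


def evaluate(terms: List[Term], x: Sequence[int]) -> bool:
    """True if at least one term has all of its literals satisfied."""
    return any(all(bool(x[a]) == v for a, v in term) for term in terms)


# The slide's example with d = 10, k = 3, r = 3 (x1 becomes index 0):
# (not x1 and x5 and x7) or (x1 and not x2 and x8) or (x4 and not x5 and not x9)
EXAMPLE: List[Term] = [
    [(0, False), (4, True), (6, True)],
    [(0, True), (1, False), (7, True)],
    [(3, True), (4, False), (8, False)],
]

all_zero = [0] * 10                           # satisfies no term
first_term = [0, 0, 0, 0, 1, 0, 1, 0, 0, 0]   # x5 = x7 = 1, x1 = 0
third_term = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # x4 = 1, x5 = x9 = 0
```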





k-DNF class imbalance


   Figure: probability of having a negative example, (1 - 2^{-k})^r,
   plotted against k (attributes expressed, 2 to 10) and
   r (number of terms, 5 to 50). Probability axis: 0 to 1.
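
The expression (1 - 2^{-k})^r can be checked numerically. The sketch below assumes uniformly random inputs and terms that are satisfied independently of each other, which holds exactly when the terms use disjoint attributes and only approximately when they share attributes. The helper names are illustrative.

```python
from itertools import product
from typing import List, Tuple


def p_negative(k: int, r: int) -> float:
    """Probability that none of r independent k-literal terms is satisfied."""
    return (1.0 - 2.0 ** (-k)) ** r


def exact_negative_fraction(terms: List[List[Tuple[int, int]]], d: int) -> float:
    """Fraction of the 2^d inputs that satisfy no term (brute force)."""
    negatives = sum(
        1
        for x in product((0, 1), repeat=d)
        if not any(all(x[a] == v for a, v in term) for term in terms)
    )
    return negatives / 2 ** d


# Three terms over disjoint attributes: d = 6, k = 2, r = 3.
DISJOINT_TERMS = [[(0, 1), (1, 1)], [(2, 1), (3, 0)], [(4, 0), (5, 1)]]

predicted = p_negative(k=2, r=3)                        # (3/4)^3
observed = exact_negative_fraction(DISJOINT_TERMS, d=6)
```

For small k and large r almost every example is positive, while for large k and small r almost every example is negative, which is the class imbalance the figure shows.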




Experimental setup



          90 different k-DNF scenarios
                  d = 20
                  k ranging between 2 and 10
                  r ranging between 5 and 50
          5 different coverage breaks
          We show results in terms of:
                  Iterations to learn an optimal k-DNF term
                  Number of cases where the system overgeneralised and
                  where it learned the problem
          Using a fixed default class and the majority policy
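
The count of 90 scenarios follows from the two ranges. The sketch below assumes that r advances in steps of 5, as the column headers of the results tables suggest; the variable names are illustrative.

```python
from itertools import product

D = 20                        # number of attributes, fixed
K_VALUES = range(2, 11)       # k = 2 .. 10  (9 values)
R_VALUES = range(5, 51, 5)    # r = 5, 10, .. 50  (10 values, assumed step of 5)

# One scenario per (k, r) pair.
scenarios = [(D, k, r) for k, r in product(K_VALUES, R_VALUES)]
n_scenarios = len(scenarios)  # 9 * 10
```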




Iterations to learn an optimal k-DNF term

   Figure: number of iterations to find a good rule, plotted against
   k (number of terms in the rule, 2 to 10) and r (number of rules, 5 to 50),
   one surface per coverage break (0.0001, 0.001, 0.01, 0.1).

   Fitted model: z = a*k + b*r + c*r^2 + d, with a > b > c > d


Number of iterations to learn a good rule

                                Coverage break 0.0001
       k \ r       5      10      15      20      25      30      35      40      45      50
           2    0.62
           3    1.64    1.83    1.64
           4    3.25    3.55    3.65    3.71    3.89    3.94    3.92
           5    4.33    4.92    5.48    5.63    5.93    5.96    6.04    6.13    6.07    6.26
           6    5.60    6.39    7.06    7.38    7.41    7.67    7.71    7.95    7.96    8.07
           7    6.55    7.68    8.19    8.40    8.84    9.05    9.22    9.47    9.57    9.63
           8    7.58    8.76    9.37    9.80    9.94   10.21   10.45   10.67   10.82   10.95
           9    9.02   10.22   10.72   11.09   11.45   11.64   11.77   12.03   12.10   12.24
          10   10.65   11.57   12.15   12.64   12.76   12.87   13.11   13.12   13.30   13.42

                                Coverage break 0.001
       k \ r       5      10      15      20      25      30      35      40      45      50
           2    0.65
           3    1.64    1.83    1.53
           4    3.16    3.51    3.60    3.65    3.83    3.91    3.91
           5    4.31    4.79    5.31    5.53    5.87    5.91    5.92    6.01    5.95    6.18
           6    5.27    6.07    6.73    7.12    7.11    7.35    7.49    7.70    7.73    7.81
           7    5.96    7.10    7.58    7.80    8.30    8.45    8.69    8.95    9.09    9.19
           8    7.04    8.07    8.65    8.97    9.12    9.41    9.67    9.90   10.02   10.21
           9    9.18   10.10   10.39   10.70   11.00   11.11   11.19   11.43   11.51   11.64
          10   10.11   11.22           11.71



Number of iterations to learn a good rule

                                Coverage break 0.01
       k \ r       5      10      15      20      25      30      35      40      45      50
           2    0.61
           3    1.48    1.68    1.54
           4    2.73    3.09    3.19    3.23    3.48    3.59    3.53
           5    3.64    4.11    4.55    4.75    5.20    5.29    5.34    5.52    5.52    5.75
           6    4.95    5.40    5.96    6.25    6.30    6.68    6.76    6.96    7.04    7.30
           7    7.53    7.88    7.88    8.02    8.35    8.46    8.56    8.78    8.88    9.07
           8
           9
          10

                                Coverage break 0.1
       k \ r       5      10      15      20      25      30      35      40      45      50
           2    0.50
           3    1.29    1.45    1.39
           4    3.21
           5
           6
           7
           8
           9
          10



Which one is the best configuration?


                                Minimum values
       k \ r       5      10      15      20      25      30      35      40      45      50
           2    0.50
           3    1.29    1.45    1.39
           4    2.73    3.09    3.19    3.23    3.48    3.59    3.53
           5    3.64    4.11    4.55    4.75    5.20    5.29    5.34    5.52    5.52    5.75
           6    4.95    5.40    5.96    6.25    6.30    6.68    6.76    6.96    7.04    7.30
           7    5.96    7.10    7.58    7.80    8.30    8.45    8.56    8.78    8.88    9.07
           8    7.04    8.07    8.65    8.97    9.12    9.41    9.67    9.90   10.02   10.21
           9    9.02   10.10   10.39   10.70   11.00   11.11   11.19   11.43   11.51   11.64
          10   10.11   11.22   12.15   11.71   12.76   12.87   13.11   13.12   13.30   13.42




                   The adequate coverage break depends on the
                         characteristics of the problem
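
The minimum-values table is the cell-wise minimum over the per-break tables. As a small illustration, the sketch below recomputes the k = 7 row from the values reported for coverage breaks 0.0001, 0.001 and 0.01 (no k = 7 value is reported for 0.1) and records which break gives each minimum. The variable names are illustrative.

```python
R_VALUES = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# Iterations to learn a good rule for k = 7, keyed by coverage break.
ITERATIONS_K7 = {
    0.0001: [6.55, 7.68, 8.19, 8.40, 8.84, 9.05, 9.22, 9.47, 9.57, 9.63],
    0.001:  [5.96, 7.10, 7.58, 7.80, 8.30, 8.45, 8.69, 8.95, 9.09, 9.19],
    0.01:   [7.53, 7.88, 7.88, 8.02, 8.35, 8.46, 8.56, 8.78, 8.88, 9.07],
}

# For each r, keep the break with the fewest iterations and its value.
best = {}
for j, r in enumerate(R_VALUES):
    winner = min(ITERATIONS_K7, key=lambda cb: ITERATIONS_K7[cb][j])
    best[r] = (winner, ITERATIONS_K7[winner][j])

minimum_row = [best[r][1] for r in R_VALUES]
winning_breaks = [best[r][0] for r in R_VALUES]
```

For k = 7 the break 0.001 gives the fewest iterations up to r = 30 and 0.01 gives the fewest from r = 35 onwards, so even within one value of k no single coverage break is best for every r.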




Execution time to learn the problem


   Figure: average execution time to learn the problem, plotted against
   r (number of rules, 5 to 50) and k (number of terms in the rule, 2 to 10),
   one surface per coverage break (0.0001, 0.001, 0.01, 0.1).
   Vertical axis: execution time (s), 0 to 14000.





Execution time to learn the problem - Majority policy


   [Figure: 3D plot, "Average execution time to learn the problem". Axes: r - Number of rules (5 to 50); k - Number of terms in the rule (2 to 10); Execution time (s) (0 to 60000). Legend: 0.0001, 0.001, 0.01, 0.1.]





Execution time to learn the problem - Majority policy


   [Figure: 3D plot, "Average execution time to learn the problem", drawn from a different viewing angle. Axes: k - Number of terms in the rule (2 to 10); r - Number of rules (5 to 50); vertical axis 0 to 60000. Legend: 0.0001, 0.001, 0.01, 0.1.]





Summary




         The execution time and the iterations are proportional to:
                 Number of rules r
                 Number of specified attributes k
         Learning with the minority policy is closer to a
         real-life scenario.
         Choosing the wrong default class might lead to learning a
         more difficult problem.
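
   The role of the default class policy can be illustrated with a minimal Python sketch. It is not BioHEL code: the function name `default_class` and the policy strings "major" and "minor" are assumptions made for this illustration, with "major" echoing the "Default Class major" label used on the learning maps.

```python
from collections import Counter


def default_class(labels, policy):
    """Pick the default class from the training labels.

    Hypothetical helper for illustration only:
    policy "major" returns the most frequent class,
    policy "minor" returns the least frequent class.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common() sorts classes by frequency, highest first.
    ranked = Counter(labels).most_common()
    if policy == "major":
        return ranked[0][0]
    if policy == "minor":
        return ranked[-1][0]
    raise ValueError(f"unknown policy: {policy!r}")
```

   Whichever class is chosen as the default is covered by the default rule, so the explicit rules have to describe the remaining class. Under this reading, a poor choice of default class leaves the harder class to be learned by the rules.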





Learning and overgeneralisation

   Learning maps
   Show different colours depending on the percentage of
   runs that learned correctly, overgeneralised, or did not
   learn the correct set of rules.

          Blue: total learning ⇒ All the runs learned the right set of
          rules
          Cyan: between learning and overgeneralisation
          Purple: overgeneralisation ⇒ All the runs learned a set of
          rules with less than 100% accuracy.
          Orange: between overgeneralisation and no learning
          Red: no learning ⇒ All the runs used the default rule to
          cover all the examples. No rules were generated.
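
   The colour scheme above can be sketched as a small Python function. This is an illustrative reading of the legend, not the authors' plotting code: the name `map_colour`, the use of per-cell run counts, and the "unspecified" result for mixtures the legend does not mention are all assumptions. The original maps may also blend colours continuously rather than use five discrete values.

```python
def map_colour(learned, overgeneralised, not_learned):
    """Return a legend colour for one (k, r) cell of a learning map.

    Arguments are counts of runs with each outcome (hypothetical inputs).
    """
    total = learned + overgeneralised + not_learned
    if total == 0:
        raise ValueError("at least one run is required")
    if learned == total:
        return "blue"      # every run learned the right set of rules
    if overgeneralised == total:
        return "purple"    # every run reached less than 100% accuracy
    if not_learned == total:
        return "red"       # every run fell back to the default rule
    if not_learned == 0:
        return "cyan"      # mix of learning and overgeneralisation
    if learned == 0:
        return "orange"    # mix of overgeneralisation and no learning
    # The legend does not describe cells mixing learning and no learning.
    return "unspecified"
```

   For example, a cell where half of the runs learned the rules and half overgeneralised falls in the cyan band.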

Learning and overgeneralisation - Default class 0

   [Figure: five learning maps, "Map of cases", with default class 0. Axes in each panel: k - Attributes expressed (1 to 11); r - Number of terms or rules (0 to 55).]
   (a) Cov. Break 0,0001   (b) Cov. Break 0,001   (c) Cov. Break 0,01   (d) Cov. Break 0,1   (e) Cov. Break 0,5

   Blue: total learning, Cyan: between learning and overgeneralisation, Purple: overgeneralisation,
   Orange: between overgeneralisation and no learning, Red: no learning


Learning and overgeneralisation - Majority policy

   [Figure: learning maps, "Map of cases", with default class "major". Axes in each panel: k - Attributes expressed (1 to 11); r - Number of terms or rules (0 to 55).]
   (f) Cov. Break 0,0001   (g) Cov. Break 0,001   (h) Cov. Break 0,01
   Further panels: Cov. break 0,1 and Cov. break 0,5

                                                                                                                           15                                                                                                                                                                                      15

                                                                                                                           10                                                                                                                                                                                      10

                                                                                                                            5                                                                                                                                                                                      5

                                                                                                                            0                                                                                                                                                                                      0
                                                                                                                                1       2         3        4       5        6       7     8                                       9   10   11                                                                           1       2         3        4       5        6       7     8                                       9   10   11
                                                                                                                                                               k - Attributes expressed                                                                                                                                                                k - Attributes expressed




                                                                                                                           (i) Cov. Break 0,1                                                                                                                                                                      (j) Cov. Break 0,5

                                                  Blue: total learning, Cyan: between learning and overgeneralisation, Purple: overgeneralisation,
                                                                Orange: between overgeneralisation and no learning , Red: no learning


Summary



         The coverage break should be large enough to introduce
         generalisation pressure on the system but low enough to
         avoid overgeneral rules.
         The adequate coverage break depends on both k and r.
         Problems whose rules cover wider areas are more difficult
         to learn, even with the right coverage break.
         The difficulty of a k-DNF problem depends on the class
         imbalance and the rule overlap.
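
The trade-off in the first point can be illustrated with a small sketch of a piecewise-linear coverage reward. This is an assumption about the form of the coverage term, based on how it is usually described for BioHEL, not code taken from the system; the function name `coverage_term`, its parameter names, and the default `coverage_ratio` of 0.9 are hypothetical choices made for illustration.

```python
def coverage_term(raw_coverage, coverage_break, coverage_ratio=0.9):
    """Assumed piecewise-linear coverage reward in [0, 1].

    raw_coverage   : fraction of training examples the rule matches
    coverage_break : coverage at which the steep part of the reward ends
    coverage_ratio : reward obtained exactly at the coverage break
                     (0.9 is an illustrative default, not a confirmed one)
    """
    if not 0.0 < coverage_break < 1.0:
        raise ValueError("coverage_break must lie strictly between 0 and 1")
    if not 0.0 <= raw_coverage <= 1.0:
        raise ValueError("raw_coverage must lie between 0 and 1")

    if raw_coverage < coverage_break:
        # Below the break: reward grows quickly with coverage,
        # which pushes rules to generalise.
        return coverage_ratio * raw_coverage / coverage_break

    # Above the break: only the small remaining reward is shared out,
    # so extra coverage is barely rewarded.
    extra = (raw_coverage - coverage_break) / (1.0 - coverage_break)
    return coverage_ratio + (1.0 - coverage_ratio) * extra


if __name__ == "__main__":
    # Same rule (5% coverage) scored under the coverage breaks of panels (f) to (j).
    for cb in (0.0001, 0.001, 0.01, 0.1, 0.5):
        print(f"coverage break {cb}: reward {coverage_term(0.05, cb):.4f}")
```

Under this assumed form, a very small coverage break gives nearly the full reward to almost any rule, so there is little pressure to generalise further, while a large coverage break keeps rewarding growth in coverage well past the point where a rule may already be overgeneral.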





Conclusions



          No single coverage break works for all types of
          problems ⇒ No Free Lunch.
          An adequate coverage break facilitates learning,
          while a wrong coverage break makes it harder or even
          impossible.
  Open questions
  Would it be possible to adapt the coverage break automatically
  and reduce the cost of hand-tuning the parameters?





Further Work




          Incorporate a heuristic into BioHEL to determine a good
          coverage break for the problem and readapt it
          during the learning process.
          Analyse the learning maps of other evolutionary learning
          systems to determine their strengths and weaknesses.
          Encourage the use of the k-DNF family of problems as a
          common benchmark in the LCS community.
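
As a rough illustration of what a k-DNF benchmark instance could look like, the sketch below builds a random function with r terms of k literals each and estimates its class balance by sampling. The generator, the helper names (`random_kdnf`, `evaluate`, `positive_fraction`) and the choice of uniformly sampled attributes and literal values are assumptions made for illustration; the exact construction used in the experiments may differ.

```python
import random


def random_kdnf(n_attributes, k, r, rng):
    """Return r terms; each term maps k distinct attribute indices
    to the bit value that attribute must take for the term to fire."""
    terms = []
    for _ in range(r):
        attrs = rng.sample(range(n_attributes), k)
        terms.append({a: rng.randint(0, 1) for a in attrs})
    return terms


def evaluate(terms, example):
    """A k-DNF function is true when at least one term matches the example."""
    return any(
        all(example[a] == v for a, v in term.items())
        for term in terms
    )


def positive_fraction(terms, n_attributes, n_samples, rng):
    """Monte Carlo estimate of the share of positive examples,
    a simple proxy for the class imbalance of the problem."""
    hits = 0
    for _ in range(n_samples):
        example = [rng.randint(0, 1) for _ in range(n_attributes)]
        hits += evaluate(terms, example)
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(42)
    for k, r in [(2, 5), (5, 5), (5, 40), (10, 40)]:
        terms = random_kdnf(20, k, r, rng)
        frac = positive_fraction(terms, 20, 5000, rng)
        print(f"k={k:2d} r={r:2d} positive fraction ~ {frac:.3f}")
```

In this sketch, small k and large r tend to give mostly positive examples, while large k and small r give very few; this is one way to see why the class imbalance changes across the (k, r) map.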





      Bacardit, J. (2004).
      Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and
      run-time.
      PhD thesis, Ramon Llull University, Barcelona, Spain.

      Bacardit, J., Burke, E., and Krasnogor, N. (2009).
      Improving the scalability of rule-based evolutionary learning.
      Memetic Computing, 1(1):55–67.

      Stout, M., Bacardit, J., Hirst, J. D., and Krasnogor, N. (2008).
      Prediction of recursive convex hull class assignments for protein residues.
      Bioinformatics, 24(7):916–923.

      Venturini, G. (1993).
      SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts.
      In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine
      Learning, pages 280–296. Springer-Verlag.








                    Questions or comments?




 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Analysing BioHEL Using Challenging Boolean Functions

  • 1. Analysing BioHEL Using Challenging Boolean Functions
      María A. Franco, Natalio Krasnogor and Jaume Bacardit
      University of Nottingham, UK, ASAP Research Group, School of Computer Science
      {mxf,nxk,jqb}@cs.nott.ac.uk
      July 8, 2010
      M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham. Analysing BioHEL Using Boolean Functions. 1 / 27
  • 2. 1 BioHEL GBML System
          - Characteristics of the system
          - BioHEL fitness function
          - Open questions for BioHEL
      2 k-Disjunctive Normal functions
      3 Experiments
          - Experiment Setup
          - Iterations and execution time
          - Learning and overgeneralisation
      4 Conclusions and Further Work
  • 3. The BioHEL GBML System
      - BIOinformatics-oriented Hierarchical Evolutionary Learning: BioHEL [Bacardit et al., 2009]
      - BioHEL was designed to handle large-scale bioinformatics datasets [Stout et al., 2008]
      - BioHEL is a GBML (genetics-based machine learning) system that employs the Iterative Rule Learning (IRL) paradigm
          - First used in EC in Venturini's SIA system [Venturini, 1993]
          - Widely used for both fuzzy and non-fuzzy evolutionary learning
      - BioHEL inherits most of its components from GAssist [Bacardit, 2004], a Pittsburgh GBML system
  • 7. Iterative Rule Learning
      IRL has been used for many years in the ML community under the name separate-and-conquer.
      Algorithm 1.1: ITERATIVE RULE LEARNING(Examples)
          Theory ← ∅
          while Examples ≠ ∅ do
              Rule ← FindBestRule(Examples)
              Covered ← Cover(Rule, Examples)
              if RuleStoppingCriterion(Rule, Theory, Examples) then exit
              Examples ← Examples − Covered
              Theory ← Theory ∪ {Rule}
          return (Theory)
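The separate-and-conquer loop of Algorithm 1.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BioHEL's implementation: `find_best_rule` is a hypothetical exhaustive search over single-attribute rules standing in for the genetic algorithm, and the rule encoding `(attribute, value, label)`, the `score` function and the stopping test are all invented for the example.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]   # (binary attribute values, class label)
Rule = Tuple[int, int, int]             # hypothetical encoding: (attribute, value, label)


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example when the tested attribute has the required value."""
    attribute, value, _ = rule
    return example[0][attribute] == value


def score(rule: Rule, examples: Sequence[Example]) -> int:
    """Correctly classified minus misclassified examples among those covered."""
    return sum(1 if ex[1] == rule[2] else -1 for ex in examples if covers(rule, ex))


def find_best_rule(examples: Sequence[Example]) -> Rule:
    """Stand-in for the evolutionary search: try every single-attribute rule."""
    n_attributes = len(examples[0][0])
    candidates = [(a, v, c) for a in range(n_attributes) for v in (0, 1) for c in (0, 1)]
    return max(candidates, key=lambda rule: score(rule, examples))


def iterative_rule_learning(examples: Sequence[Example]) -> List[Rule]:
    theory: List[Rule] = []
    remaining = list(examples)
    while remaining:                                  # while Examples is not empty
        rule = find_best_rule(remaining)              # FindBestRule
        # Assumed stopping criterion: the best rule is no better than guessing.
        # A positive score implies at least one covered example, so the set of
        # remaining examples shrinks on every pass and the loop terminates.
        if score(rule, remaining) <= 0:
            break
        remaining = [ex for ex in remaining if not covers(rule, ex)]  # Examples - Covered
        theory.append(rule)                           # Theory union {Rule}
    return theory


# Toy data: the label simply copies the first attribute.
toy = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
learned = iterative_rule_learning(toy)   # [(0, 0, 0), (0, 1, 1)]
```

On the toy data the loop learns one rule per class value of the first attribute and stops when no examples remain.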
  • 8. Characteristics of BioHEL
      - A fitness function based on the Minimum Description Length (MDL) principle (Rissanen, 1978) that tries to:
          - Evolve accurate rules
          - Evolve high-coverage rules
          - Evolve rules with low complexity, as general as possible
      - The ILAS windowing scheme
          - Efficiency-enhancement method: not all training points are used for each fitness computation
  • 10. Characteristics of BioHEL
      - The Attribute List Knowledge representation
          - Representation designed to handle high-dimensionality domains
      - An explicit default rule mechanism
          - Generates more compact rule sets
      - Ensembles for consensus prediction
          - An easy way to boost robustness
  • 13. BioHEL fitness function
      - The coverage term penalises rules that do not cover a minimum percentage of examples
      - Choosing the coverage break changes the behaviour and performance of the entire system
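The slide describes the coverage term only in words. One piecewise-linear shape that matches that description is sketched below; it is an illustration, not the formula from the BioHEL paper. The function name, the parameter `reward_at_break` and its default value are hypothetical.

```python
def coverage_term(coverage: float, coverage_break: float, reward_at_break: float = 0.9) -> float:
    """Piecewise-linear reward in [0, 1] for the fraction of examples a rule covers.

    All names are hypothetical; reward_at_break = 0.9 is an arbitrary
    illustrative default, not a value taken from the slides.
    """
    if not 0.0 < coverage_break < 1.0:
        raise ValueError("coverage_break must lie strictly between 0 and 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    if coverage < coverage_break:
        # Below the break: the reward grows linearly from 0 to reward_at_break,
        # so rules covering too few examples are penalised.
        return reward_at_break * coverage / coverage_break
    # At or above the break: the remaining reward is spread over the rest of the range.
    return reward_at_break + (1.0 - reward_at_break) * (coverage - coverage_break) / (1.0 - coverage_break)


# The same rule, covering 5% of the examples, under two different coverage breaks.
with_small_break = coverage_term(0.05, 0.01)
with_large_break = coverage_term(0.05, 0.1)
```

Under this sketch a rule covering 5% of the examples keeps most of its reward when the break is 0.01 but loses half of it when the break is 0.1, which is the kind of sensitivity the second bullet refers to.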
  • 15. Open questions for BioHEL
      - Does a single coverage break work for the same family of problems?
      - How difficult is it to hand-tune the coverage break?
      - What is the performance impact of the coverage break when it is not properly adjusted?
      Motivation of the paper: to answer these questions. We used k-DNF problems to test the system exhaustively on problems of varying difficulty.
  • 16. k-Disjunctive Normal functions
      - r disjunctive terms
      - d possible attributes
      - k attributes represented in each term
      Example with d = 10, k = 3, r = 3:
          (¬x1 ∧ x5 ∧ x7) ∨ (x1 ∧ ¬x2 ∧ x8) ∨ (x4 ∧ ¬x5 ∧ ¬x9)
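As a sketch of how such a function can be represented and evaluated, the Python below encodes the example formula from the slide as a list of terms, each term being a list of (attribute, required value) literals. The data layout and the random generator are assumptions made for illustration; the slides do not say how the benchmark functions were generated.

```python
import random
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]   # (attribute index starting at 1, required truth value)
Term = List[Literal]


def evaluate_kdnf(terms: Sequence[Term], x: Sequence[bool]) -> bool:
    """True when at least one term has all of its literals satisfied by x."""
    return any(all(x[attr - 1] == value for attr, value in term) for term in terms)


def random_kdnf(d: int, k: int, r: int, rng: random.Random) -> List[Term]:
    """Hypothetical generator: r terms, each over k distinct attributes out of d,
    with every literal negated or not by a fair coin flip."""
    terms = []
    for _ in range(r):
        attrs = sorted(rng.sample(range(1, d + 1), k))
        terms.append([(attr, rng.random() < 0.5) for attr in attrs])
    return terms


# The example from the slide: d = 10, k = 3, r = 3.
SLIDE_EXAMPLE: List[Term] = [
    [(1, False), (5, True), (7, True)],    # not x1 and x5 and x7
    [(1, True), (2, False), (8, True)],    # x1 and not x2 and x8
    [(4, True), (5, False), (9, False)],   # x4 and not x5 and not x9
]

all_false = [False] * 10
first_term_input = [False] * 10
first_term_input[4] = True   # x5
first_term_input[6] = True   # x7
```

With these inputs, `evaluate_kdnf(SLIDE_EXAMPLE, all_false)` is `False` because every term needs at least one true attribute, while `evaluate_kdnf(SLIDE_EXAMPLE, first_term_input)` is `True` through the first term.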
  • 18. k-DNF class imbalance
      Probability of having a negative example: (1 − 2^(−k))^r
      [Figure: surface plot of this probability (0 to 1) against r, the number of terms (5 to 50), and k, the attributes expressed (2 to 10)]
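The formula can be read as follows: for a uniformly random input, a term with k literals is satisfied with probability 2^(−k), and an example is negative when none of the r terms is satisfied. Treating the terms as independent (an assumption of this sketch, since terms that share attributes are not strictly independent) gives (1 − 2^(−k))^r. A small Python helper, with a hypothetical function name, makes the imbalance concrete:

```python
def negative_probability(k: int, r: int) -> float:
    """Probability that a uniformly random example is negative,
    assuming the r terms are satisfied independently."""
    return (1.0 - 2.0 ** (-k)) ** r


# Two corners of the range plotted on the slide.
few_attributes = negative_probability(k=2, r=5)     # 0.75 ** 5 = 0.2373046875
many_attributes = negative_probability(k=10, r=5)   # close to 1: almost all examples are negative
```

Small k makes positive examples dominate, large k makes negative examples dominate, and increasing r always lowers the share of negative examples.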
  • 19. Experimental setup
      - 90 different k-DNF scenarios
          - d = 20
          - k ranging between 2 and 10
          - r ranging between 5 and 50
      - 5 different coverage breaks
      - We show results in terms of:
          - Iterations to learn an optimal k-DNF term
          - Number of cases where the system overgeneralised and learned
      - Using a fixed default class and the majority policy
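The count of 90 scenarios follows from 9 values of k times 10 values of r. The sketch below enumerates the grid in Python; the step of 5 for r and the five coverage-break values are read off the result tables and learning maps elsewhere in the deck, and the variable names are illustrative.

```python
from itertools import product

D = 20                                            # possible attributes
K_VALUES = range(2, 11)                           # k = 2, ..., 10
R_VALUES = range(5, 55, 5)                        # r = 5, 10, ..., 50
COVERAGE_BREAKS = (0.0001, 0.001, 0.01, 0.1, 0.5)

# One scenario per (k, r) pair, and one configuration per scenario and coverage break.
scenarios = [(D, k, r) for k, r in product(K_VALUES, R_VALUES)]
configurations = list(product(scenarios, COVERAGE_BREAKS))
```

This gives 90 scenarios and 450 scenario and coverage-break combinations, before counting repeated runs or the two default-class policies.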
  • 20. Iterations to learn an optimal k-DNF term
      [Figure: number of iterations to find a good rule, plotted against k, the number of terms in the rule (2 to 10), and r, the number of rules (5 to 50), for coverage breaks 0.0001, 0.001, 0.01 and 0.1]
      Model: z = a·k + b·r + c·r² + d, with a > b > c > d
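A model of this form is linear in its coefficients, so it can be fitted by ordinary least squares. The slides do not say how the authors fitted it; the pure-Python sketch below solves the normal equations and is checked on synthetic data only. The function names and the synthetic coefficients are assumptions, not results from the deck.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_iteration_model(points: Sequence[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Least-squares (a, b, c, d) for z = a*k + b*r + c*r**2 + d from (k, r, z) points."""
    rows = [[float(k), float(r), float(r * r), 1.0] for k, r, _ in points]
    targets = [float(z) for _, _, z in points]
    # Normal equations: (A^T A) coefficients = A^T z
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(4)] for i in range(4)]
    atz = [sum(row[i] * z for row, z in zip(rows, targets)) for i in range(4)]
    return tuple(solve_linear_system(ata, atz))


# Synthetic check: generate z from known (made-up) coefficients and recover them.
synthetic = [(k, r, 1.2 * k + 0.05 * r - 0.0005 * r * r - 2.0)
             for k in range(2, 11) for r in range(5, 55, 5)]
coefficients = fit_iteration_model(synthetic)
```

To fit the measured values, the (k, r, z) points would come from the iteration tables for a single coverage break, using only the cells that have a value.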
  • 21. Number of iterations to learn a good rule
      Rows: k. Columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Rows with fewer than ten values had empty cells in the original table; the column positions of the remaining values are not preserved in this transcript.
      Coverage break 0.0001
          k = 2:  0.62
          k = 3:  1.64  1.83  1.64
          k = 4:  3.25  3.55  3.65  3.71  3.89  3.94  3.92
          k = 5:  4.33  4.92  5.48  5.63  5.93  5.96  6.04  6.13  6.07  6.26
          k = 6:  5.60  6.39  7.06  7.38  7.41  7.67  7.71  7.95  7.96  8.07
          k = 7:  6.55  7.68  8.19  8.40  8.84  9.05  9.22  9.47  9.57  9.63
          k = 8:  7.58  8.76  9.37  9.80  9.94  10.21  10.45  10.67  10.82  10.95
          k = 9:  9.02  10.22  10.72  11.09  11.45  11.64  11.77  12.03  12.10  12.24
          k = 10: 10.65  11.57  12.15  12.64  12.76  12.87  13.11  13.12  13.30  13.42
      Coverage break 0.001
          k = 2:  0.65
          k = 3:  1.64  1.83  1.53
          k = 4:  3.16  3.51  3.60  3.65  3.83  3.91  3.91
          k = 5:  4.31  4.79  5.31  5.53  5.87  5.91  5.92  6.01  5.95  6.18
          k = 6:  5.27  6.07  6.73  7.12  7.11  7.35  7.49  7.70  7.73  7.81
          k = 7:  5.96  7.10  7.58  7.80  8.30  8.45  8.69  8.95  9.09  9.19
          k = 8:  7.04  8.07  8.65  8.97  9.12  9.41  9.67  9.90  10.02  10.21
          k = 9:  9.18  10.10  10.39  10.70  11.00  11.11  11.19  11.43  11.51  11.64
          k = 10: 10.11  11.22  11.71
  • 23. Number of iterations to learn a good rule
      Rows: k. Columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Rows with fewer than ten values had empty cells in the original table; the column positions of the remaining values are not preserved in this transcript.
      Coverage break 0.01
          k = 2:  0.61
          k = 3:  1.48  1.68  1.54
          k = 4:  2.73  3.09  3.19  3.23  3.48  3.59  3.53
          k = 5:  3.64  4.11  4.55  4.75  5.20  5.29  5.34  5.52  5.52  5.75
          k = 6:  4.95  5.40  5.96  6.25  6.30  6.68  6.76  6.96  7.04  7.30
          k = 7:  7.53  7.88  7.88  8.02  8.35  8.46  8.56  8.78  8.88  9.07
          k = 8, 9, 10: no values
      Coverage break 0.1
          k = 2:  0.50
          k = 3:  1.29  1.45  1.39
          k = 4:  3.21
          k = 5 to 10: no values
  • 24. Which one is the best configuration?
      Minimum values. Rows: k. Columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Rows with fewer than ten values had empty cells in the original table.
          k = 2:  0.50
          k = 3:  1.29  1.45  1.39
          k = 4:  2.73  3.09  3.19  3.23  3.48  3.59  3.53
          k = 5:  3.64  4.11  4.55  4.75  5.20  5.29  5.34  5.52  5.52  5.75
          k = 6:  4.95  5.40  5.96  6.25  6.30  6.68  6.76  6.96  7.04  7.30
          k = 7:  5.96  7.10  7.58  7.80  8.30  8.45  8.56  8.78  8.88  9.07
          k = 8:  7.04  8.07  8.65  8.97  9.12  9.41  9.67  9.90  10.02  10.21
          k = 9:  9.02  10.10  10.39  10.70  11.00  11.11  11.19  11.43  11.51  11.64
          k = 10: 10.11  11.22  12.15  11.71  12.76  12.87  13.11  13.12  13.30  13.42
      The adequate coverage break depends on the characteristics of the problem.
  • 25. Execution time to learn the problem
      [Figure: average execution time (s) to learn the problem, from 0 to 14000 s, plotted against r, the number of rules (5 to 50), and k, the number of terms in the rule (2 to 10), for coverage breaks 0.0001, 0.001, 0.01 and 0.1]
  • 26. Execution time to learn the problem - Majority policy
      [Figure: average execution time (s) to learn the problem, from 0 to 60000 s, plotted against r, the number of rules (5 to 50), and k, the number of terms in the rule (2 to 10), for coverage breaks 0.0001, 0.001, 0.01 and 0.1]
  • 27. Execution time to learn the problem - Majority policy
      [Figure: alternative view of the average execution time to learn the problem, from 0 to 60000, plotted against k, the number of terms in the rule (2 to 10), and r, the number of rules (5 to 50), for coverage breaks 0.0001, 0.001, 0.01 and 0.1]
  • 28. Summary
      - The execution time and the iterations are proportional to:
          - Number of rules r
          - Number of specified attributes k
      - Learning with the minority policy is more similar to a real-life scenario.
      - Choosing the wrong default class might lead to learning a more difficult problem.
  • 29. Learning and overgeneralisation
      Learning maps show different colours depending on the percentage of runs that learned correctly, overgeneralised, or did not learn the correct set of rules.
      - Blue: total learning ⇒ all the runs learned the right set of rules
      - Cyan: between learning and overgeneralisation
      - Purple: overgeneralisation ⇒ all the runs learned a set of rules with less than 100% accuracy
      - Orange: between overgeneralisation and no learning
      - Red: no learning ⇒ all the runs used the default rule to cover all the examples; no rules were generated
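The colour legend amounts to a classification of the outcomes of the repeated runs for one (k, r) cell. A minimal Python sketch of that mapping follows; the outcome labels and the function name are hypothetical, and the slide does not say how a cell mixing total learning with no learning would be coloured, so the sketch reports such cells as "unspecified".

```python
from typing import Sequence

LEARNED = "learned"                  # run found the right set of rules
OVERGENERAL = "overgeneralised"      # run found rules with less than 100% accuracy
NO_LEARNING = "no_learning"          # run produced only the default rule


def map_colour(outcomes: Sequence[str]) -> str:
    """Colour of one learning-map cell given the outcome of each run."""
    kinds = set(outcomes)
    if not kinds:
        raise ValueError("at least one run outcome is required")
    if kinds - {LEARNED, OVERGENERAL, NO_LEARNING}:
        raise ValueError("unknown outcome label")
    if kinds == {LEARNED}:
        return "blue"
    if kinds == {OVERGENERAL}:
        return "purple"
    if kinds == {NO_LEARNING}:
        return "red"
    if kinds == {LEARNED, OVERGENERAL}:
        return "cyan"
    if kinds == {OVERGENERAL, NO_LEARNING}:
        return "orange"
    # Mixes that include both total learning and no learning are not covered by the legend.
    return "unspecified"


mixed_cell = map_colour([LEARNED, LEARNED, OVERGENERAL])   # "cyan"
```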
  • 31. Learning and overgeneralisation - Default class 0
      [Figure: five maps of cases with default class 0, one per coverage break: (a) 0.0001, (b) 0.001, (c) 0.01, (d) 0.1, (e) 0.5. Horizontal axis: k, attributes expressed; vertical axis: r, number of terms or rules.]
      Blue: total learning; Cyan: between learning and overgeneralisation; Purple: overgeneralisation; Orange: between overgeneralisation and no learning; Red: no learning
  • 32. Learning and overgeneralisation - Majority policy
      [Figure: five maps of cases with the majority class as default, one per coverage break: (f) 0.0001, (g) 0.001, (h) 0.01, (i) 0.1, (j) 0.5. Horizontal axis: k, attributes expressed; vertical axis: r, number of terms or rules.]
      Blue: total learning; Cyan: between learning and overgeneralisation; Purple: overgeneralisation; Orange: between overgeneralisation and no learning; Red: no learning
  • 33. Summary
      - The coverage break should be large enough to introduce generalisation pressure on the system but low enough to avoid overgeneral rules.
      - The adequate coverage break depends on both k and r.
      - Problems whose rules cover wider areas are more difficult to learn, even with the right coverage break.
      - The difficulty of a k-DNF problem depends on the class imbalance and the rule overlap.
  • 37. Conclusions
      - No coverage break works for all types of problems ⇒ No Free Lunch
      - An adequate coverage break facilitates learning, while a wrong coverage break makes it harder or even impossible.
      Open questions
      - Would it be possible to adapt the coverage break automatically and reduce the cost of hand-tuning the parameters?
  • 40. Further Work
      - Incorporate a heuristic inside BioHEL to determine a good coverage break for the problem and readapt this coverage break during the learning process
      - Analyse the learning maps of other evolutionary learning systems to determine their strengths and weaknesses
      - Encourage the use of the k-DNF family of problems as a common benchmark in the LCS community
  • 43. Bacardit, J. (2004). Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Spain.
      Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.
      Stout, M., Bacardit, J., Hirst, J. D., and Krasnogor, N. (2008). Prediction of recursive convex hull class assignments for protein residues. Bioinformatics, 24(7):916–923.
      Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.
  • 44. Questions or comments?