SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain
Aim




                         Enhance the applicability of LCSs
                       to knowledge discovery from datasets

                                        Classification problems
                                         Real-world domains




Learning Classifier Systems for Class Imbalance Problems          Ester Bernadó-Mansilla
Framework




                                                                  model
                                                      LCS
              Dataset
                                                                    +
                                                               estimated
                                                              performance
       • Representativity of the target
                                                            • Evolutionary pressures
       concept

                                                            • Interpretability
       • Geometrical complexity

                                                            • Domain of applicability
       • Class imbalance

       • Noise
Learning Classifier Systems for Class Imbalance Problems                  Ester Bernadó-Mansilla
Class Imbalance


     When one class is represented by a small number of

     examples, compared to other class/es.

     Usually the class of that describes the circumscribed

     concept (positive class) is the minority class

     Where?

            Rare medical diagnoses
       
            Fraud detection
       
            Oil spills in satellite images
       




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Class Imbalance and Classifiers


     Is there a bias towards the majority class?




     Probably, because…

            Most classifier schemes are trained to minimize the global error
       



     As a result

            They classify accurately the examples from the majority class
       
            They tend to misclassify the examples of the minority class,
       
            which are often those representing the target concept.




Learning Classifier Systems for Class Imbalance Problems        Ester Bernadó-Mansilla
Measures of Performance


                                                       Confusion matrix

                                                               Prediction
                                                           A                   B
          Actual            A              true positive (TP)         false negative (FN)
                            B              false positive (FP)         true negative (TN)




                                Accuracy = (TP+TN)/(TP+FN+FP+TN)

                                TN rate = TN / (TN + FP)

                                TP rate = TP / (FN + TP)

                                ROC curves

Learning Classifier Systems for Class Imbalance Problems                             Ester Bernadó-Mansilla
The Higher Class Imbalance: the
                    Higher Bias?

                          Dataset 1                          Dataset 2

                       concept:      15                    concept:     15
                       counterpart: 150                    counterpart: 45
                       ratio:       10:1                   ratio:       3:1




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS

                                                                           XCS

                                                                                     class
                      input                                Set of
                                                           Rules

                                                                         update
                                      search

                                          Genetic                   Reinforcement
                                         Algorithms                   Learning


                                                                            reward

                                                           Environment




                                                             Dataset




Learning Classifier Systems for Class Imbalance Problems                                     Ester Bernadó-Mansilla
Our Approach with XCS


     Bounding XCS’s parameters for unbalanced datasets




     Online identification of small disjuncts




     Adaptation of parameters for the discovery of small

     disjuncts




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
XCS’s Behavior in Unbalanced
                     Datasets

                       Unbalanced 11-multiplexer problem




                                                    ir=16:1   ir=32:1   ir=64:1


Learning Classifier Systems for Class Imbalance Problems                          Ester Bernadó-Mansilla
XCS’s Population



                                        Most numerous rules, ir=128:1


            Classifier                    P                Error         F          Num
         ###########:0                  1000                0.12       0.98         385
                                       1.2 10-4
         ###########:1                                     0.074       0.98         366




                                        estimated          estimated                   too high
                                                                            high
                                        prediction:
          overgeneral                                        error:                   numerosity
                                                                          fitness
                                          992.24
          classifiers                                        15.38
                                           7.75

          Test examples are classified as belonging to the majority class



Learning Classifier Systems for Class Imbalance Problems                                  Ester Bernadó-Mansilla
How Imbalance Affects XCS


         Classifier’s error
   



         Stability of prediction and error estimates
   



         Occurrence-based reproduction
   




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Classifier’s Error in Unbalanced
                     Datasets

        Will an overgeneral classifier be detected as inaccurate if the
  
        imbalance ratio is high?

         Bound for inaccurate classifier:                       !quot;!0
          Given the estimated prediction and error:

                                          P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin
                                          quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl ))
          We derive:

                     # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0
          where                                                                                  !quot;!0
                               p =!C / C
          For
                        Rmax = 1000 !0 = 1
          we get maximum imbalance ratio:

                        irmax = 1998
Learning Classifier Systems for Class Imbalance Problems                                            Ester Bernadó-Mansilla
Prediction and Error Estimates and
                     Learning Rate
                 ir=128:1, ###########:0
                                                            Error
                                       Prediction
       β=0.2
       β=0.002




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Occurrence-based Reproduction


        Probability of occurrence (pocc)

        Given ir=maj/min:
                                                                                          0,6

           Classifier            poccB            poccI
                                                                                          0,5
                                  1/2             1/2
       ########### :0




                                                              probability of occurrence
                                  1/2             1/2
       ########### :1                                                                     0,4



                                                                                          0,3
       0000#######:0             1/32
                                                                                          0,2


       0001#######:1             1/32                                                     0,1



                                                                                           0
                                                                                                1    2    4   8    16   32   64   128   256

                                                                                                             imbalance ratio
                                               22ir
                                                     p occB                                     00001######:1             00000######:0
                                              ir + 1                                            ###########:0             ###########:1




Learning Classifier Systems for Class Imbalance Problems                                                                     Ester Bernadó-Mansilla
Occurrence-based Reproduction


      Probability of reproduction (pGA)
 

                                1
                      pGA =
                               TGA
                                                  if Tocc < % GA
                                  #% GA
                      where TGA $ quot;
                                  !Tocc           otherwise

      With θGA=20:
 
                                                                            GA
                                                                   Tocc
                                                                                 …
        T (# # # # # # # # # # #: 0) ! quot;
         GA                              GA
                                                                     θGA
                                                                                     Tocc
                                                                                                           GA
                                                                                       …
       T (0000# # # # # # #: 0) ! T 1
        GA                         occ
                                                                      θGA
1 Assuming   non-overlapping


 Learning Classifier Systems for Class Imbalance Problems                                   Ester Bernadó-Mansilla
Guidelines for Parameter Tuning


     Rmax and є0 determine the threshold between negligible noise and

     imbalance ratio

     β determines the size of the moving window. The window should be

     high enough to allow computing examples from both classes:
                                                                 f min
                                                       ! =k
                                                                 f maj
     θGA can counterbalance the reproduction opportunities of most frequent

     (majority) and least frequent niches (minority):

                                                                  1
                                                    ! GA = k '
                                                                 f min


Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS with Parameters Tuning


                                                                     XCS with parameter tuning
              XCS with standard settings




ir=16:1           ir=32:1               ir=64:1                ir=64:1        ir=256:1



    Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
XCS Tuning for Real-world Datasets


     How we can estimate the niche frequency?



            Estimate from the ratio of majority class instances and minority
       
            class instances

            Problem:
       


               • This may not be related to the distribution of niches in the feature
                 space


            Take the approach to the small disjuncts problem
       




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
Online Identification of Small Disjuncts



     We search for regions that promote

     overgeneral classifiers

     Estimate ircl based on the classifier’s

     experience on each class:

                                        exp max
                               ircl =
                                        exp min


     Adapt β and θGA according to ircl



                                                           ircl = 20 / 4




Learning Classifier Systems for Class Imbalance Problems                   Ester Bernadó-Mansilla
Online Parameter Adaptation


                                                           ir=256:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
What about UCS?


             Supervised XCS:
       
                   Needs less exploration
               


             Avoids XCS’s fitness dilemma
       


             More robust to parameter settings
       


             Overgeneral classifiers also tend to overcome the
       
             population
                   Their probability of occurrence depends on the imbalance ratio
               
                   Partially minimized with fitness sharing
               




Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
What about UCS?


                                                           ir=256:1
                                     ir=512:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
Are LCSs more error-prone to class imbalance
                        than other classifier schemes?
                                             C4.5                            SMO                      XCS
TP rate          Bal2c1              0,00%   ±             0,00%    0,00%     ±    0,00%     0,00%     ±         0,00%

                 Bal2c2             81,65%   ±             6,83%    93,72%    ±    4,64%     81,96%    ±         6,00%

                 Bal2c3             81,90%   ±             6,04%    93,77%    ±    5,59%     83,99%    ±         6,88%

                 bpa                42,95%   ±         14,09%       0,00%     ±    0,00%     61,38%    ±         9,10%

                 gls2c1             80,00%   ±         42,16%       0,00%     ±    0,00%     50,00%    ±        52,70%

                 gls2c2             35,00%   ±         47,43%       15,00%    ±    33,75%    55,00%    ±        49,72%

                 gls2c3             30,00%   ±         42,16%       0,00%     ±    0,00%     5,00%     ±        15,81%

                 gls2c4             75,00%   ±         32,63%       81,67%    ±    25,40%    81,67%    ±        25,40%

                 gls2c5             77,14%   ±         16,77%       10,00%    ±    9,64%     84,29%    ±        14,21%

                 gls2c6             59,82%   ±         15,13%       0,00%     ±    0,00%     81,79%    ±        13,95%

                 h-s                75,83%   ±         13,29%       80,00%    ±    7,03%     80,00%    ±         9,78%

                 pim                55,37%   ±         13,27%       53,38%    ±    6,42%     55,93%    ±         9,75%

                 tao                95,23%   ±             2,14%    84,11%    ±    6,17%     92,58%    ±         5,72%

                 thy2c1             90,00%   ±         16,10%       76,67%    ±    22,50%    90,00%    ±        16,10%

                 thy2c2             94,17%   ±         12,45%       54,17%    ±    24,92%    90,83%    ±        14,93%

                 thy2c3             90,95%   ±         10,34%       33,81%    ±    21,35%    90,71%    ±         8,05%

                 wav2c1             75,74%   ±             4,06%    88,51%    ±    3,20%     87,24%    ±         3,43%

                 wav2c2             72,34%   ±             3,89%    84,57%    ±    4,05%     78,72%    ±         2,57%

                 wab2c3             77,64%   ±             2,38%    89,97%    ±    3,48%     87,86%    ±         3,65%

                 wbdc               92,95%   ±             3,42%    95,42%    ±    5,36%     95,83%    ±         5,89%

                 wdbc               92,47%   ±             5,09%    94,81%    ±    2,71%     93,83%    ±         6,37%

                 wine2c1            89,00%   ±         16,63%      100,00%    ±    0,00%    100,00%    ±         0,00%

                 wine2c2            95,00%   ±             8,05%    98,33%    ±    5,27%     98,33%    ±         5,27%

                 wine2c3            90,18%   ±         11,70%       97,14%    ±    6,02%     98,57%    ±         4,52%
Learning Classifier Systems for Class Imbalance Problems                                               Ester Bernadó-Mansilla
                 wpbc               41,00%   ±         12,87%       9,50%     ±    17,07%    30,50%    ±        24,99%
How can we Minimize the Effects of
                     Small Disjuncts?


     Resampling the dataset:
                                                          Addresses small
                                                           disjuncts
            Classical methods:
       
               • Random oversampling
               • Random undersampling                          Assumes that
                                                               clusterization will
            Heuristic methods:
       
                                                               find small
               •   Tomek links
                                                               disjuncts and
               •   CNN                                         match classifier’s
               •   One-sided selection                         approximation
               •   Smote
            Cluster-based oversampling
                                                                 Could XCS
                                                                  benefit from the
                                                                  online
     Cost-sensitive classifiers

                                                                  identification of
                                                                  small disjuncts?
Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
Domains of Applicability


                   Should we use some counterbalancing scheme?
             


                   Which learning scheme should we use?
             


                   Is there a combination of counterbalancing
             
                   scheme+learner that beats all others?

                   How can we know the presence of small
             
                   disjuncts?

                   Are there other complexity factors mixed up with
             
                   the small disjuncts problem?
Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Domains of Applicability


                                                           Resampling/
                                              Learn it!    Classifier/
                                                           Resampling+classifier
                                                                                     Where are
                                                                                       LCSs
                                                                                      placed?

          Dataset                  Dataset
                                                                             Suggested
                                                            Prediction
                               characterization
                                                                             approach


                                                Type of dataset:
                                       Geometrical distribution of classes
                                      Possible presence of small disjuncts
                                           Other complexity factors


Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
Future Directions


                   Potential benefit of XCS to discover small disjuncts
             
                     …and learn from it online

                   Further analyze UCS
             


                   How do LCSs perform w.r.t. other classifiers for unbalanced
             
                   datasets?

                   Measures for small disjuncts identification
             
                     … and other possible complexity factors

                   What is noise and what is a small disjunct?
             


                   In which cases a LCS is applicable?
             


Learning Classifier Systems for Class Imbalance Problems            Ester Bernadó-Mansilla
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain

Mais conteúdo relacionado

Semelhante a Learning Classifier Systems for Class Imbalance Problems

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAclima
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsAlbert Orriols-Puig
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniquesGiorgos Bamparopoulos
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009bwhitcher
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentationKaiwen Qi
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsBrian Lange
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7Roger Barga
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...Chakkrit (Kla) Tantithamthavorn
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 

Semelhante a Learning Classifier Systems for Class Imbalance Problems (10)

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOA
 
5 5 10
5 5 105 5 10
5 5 10
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniques
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentation
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithms
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 

Mais de Xavier Llorà

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewXavier Llorà
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with MeandreXavier Llorà
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0Xavier Llorà
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningXavier Llorà
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...Xavier Llorà
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsXavier Llorà
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...Xavier Llorà
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...Xavier Llorà
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...Xavier Llorà
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challengesXavier Llorà
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionXavier Llorà
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier SystemsXavier Llorà
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?Xavier Llorà
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableXavier Llorà
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsXavier Llorà
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring LanguageXavier Llorà
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...Xavier Llorà
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Xavier Llorà
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Xavier Llorà
 

Mais de Xavier Llorà (20)

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha Preview
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with Meandre
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challenges
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly Detection
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier Systems
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?
 
NIGEL 2006 welcome
NIGEL 2006 welcomeNIGEL 2006 welcome
NIGEL 2006 welcome
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring Language
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
 

Último

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Learning Classifier Systems for Class Imbalance Problems

  • 1. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
  • 2. Aim Enhance the applicability of LCSs to knowledge discovery from datasets Classification problems Real-world domains Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 3. Framework model LCS Dataset + estimated performance • Representativity of the target • Evolutionary pressures concept • Interpretability • Geometrical complexity • Domain of applicability • Class imbalance • Noise Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 4. Class Imbalance When one class is represented by a small number of  examples, compared to other class/es. Usually the class of that describes the circumscribed  concept (positive class) is the minority class Where?  Rare medical diagnoses  Fraud detection  Oil spills in satellite images  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 5. Class Imbalance and Classifiers Is there a bias towards the majority class?  Probably, because…  Most classifier schemes are trained to minimize the global error  As a result  They classify accurately the examples from the majority class  They tend to misclassify the examples of the minority class,  which are often those representing the target concept. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 6. Measures of Performance Confusion matrix Prediction A B Actual A true positive (TP) false negative (FN) B false positive (FP) true negative (TN) Accuracy = (TP+TN)/(TP+FN+FP+TN) TN rate = TN / (TN + FP) TP rate = TP / (FN + TP) ROC curves Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 7. The Higher Class Imbalance: the Higher Bias? Dataset 1 Dataset 2 concept: 15 concept: 15 counterpart: 150 counterpart: 45 ratio: 10:1 ratio: 3:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 8. XCS XCS class input Set of Rules update search Genetic Reinforcement Algorithms Learning reward Environment Dataset Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 9. Our Approach with XCS Bounding XCS’s parameters for unbalanced datasets  Online identification of small disjuncts  Adaptation of parameters for the discovery of small  disjuncts Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 10. XCS’s Behavior in Unbalanced Datasets Unbalanced 11-multiplexer problem ir=16:1 ir=32:1 ir=64:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 11. XCS’s Population Most numerous rules, ir=128:1 Classifier P Error F Num ###########:0 1000 0.12 0.98 385 1.2 10-4 ###########:1 0.074 0.98 366 estimated estimated too high high prediction: overgeneral error: numerosity fitness 992.24 classifiers 15.38 7.75 Test examples are classified as belonging to the majority class Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 12. How Imbalance Affects XCS Classifier’s error  Stability of prediction and error estimates  Occurrence-based reproduction  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 13. Classifier’s Error in Unbalanced Datasets Will an overgeneral classifier be detected as inaccurate if the  imbalance ratio is high? Bound for inaccurate classifier: !quot;!0 Given the estimated prediction and error: P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl )) We derive: # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0 where !quot;!0 p =!C / C For Rmax = 1000 !0 = 1 we get maximum imbalance ratio: irmax = 1998 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 14. Prediction and Error Estimates and Learning Rate ir=128:1, ###########:0 Error Prediction β=0.2 β=0.002 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 15. Occurrence-based Reproduction Probability of occurrence (pocc) Given ir=maj/min: 0,6 Classifier poccB poccI 0,5 1/2 1/2 ########### :0 probability of occurrence 1/2 1/2 ########### :1 0,4 0,3 0000#######:0 1/32 0,2 0001#######:1 1/32 0,1 0 1 2 4 8 16 32 64 128 256 imbalance ratio 22ir p occB 00001######:1 00000######:0 ir + 1 ###########:0 ###########:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 16. Occurrence-based Reproduction Probability of reproduction (pGA)  1 pGA = TGA if Tocc < % GA #% GA where TGA $ quot; !Tocc otherwise With θGA=20:  GA Tocc … T (# # # # # # # # # # #: 0) ! quot; GA GA θGA Tocc GA … T (0000# # # # # # #: 0) ! T 1 GA occ θGA 1 Assuming non-overlapping Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 17. Guidelines for Parameter Tuning Rmax and є0 determine the threshold between negligible noise and  imbalance ratio β determines the size of the moving window. The window should be  high enough to allow computing examples from both classes: f min ! =k f maj θGA can counterbalance the reproduction opportunities of most frequent  (majority) and least frequent niches (minority): 1 ! GA = k ' f min Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 18. XCS with Parameters Tuning XCS with parameter tuning XCS with standard settings ir=16:1 ir=32:1 ir=64:1 ir=64:1 ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 19. XCS Tuning for Real-world Datasets How we can estimate the niche frequency?  Estimate from the ratio of majority class instances and minority  class instances Problem:  • This may not be related to the distribution of niches in the feature space Take the approach to the small disjuncts problem  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 20. Online Identification of Small Disjuncts We search for regions that promote  overgeneral classifiers Estimate ircl based on the classifier’s  experience on each class: exp max ircl = exp min Adapt β and θGA according to ircl  ircl = 20 / 4 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 21. Online Parameter Adaptation ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 22. What about UCS? Supervised XCS:  Needs less exploration  Avoids XCS’s fitness dilemma  More robust to parameter settings  Overgeneral classifiers also tend to overcome the  population Their probability of occurrence depends on the imbalance ratio  Partially minimized with fitness sharing  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 23. What about UCS? ir=256:1 ir=512:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 24. Are LCSs more error-prone to class imbalance than other classifier schemes? C4.5 SMO XCS TP rate Bal2c1 0,00% ± 0,00% 0,00% ± 0,00% 0,00% ± 0,00% Bal2c2 81,65% ± 6,83% 93,72% ± 4,64% 81,96% ± 6,00% Bal2c3 81,90% ± 6,04% 93,77% ± 5,59% 83,99% ± 6,88% bpa 42,95% ± 14,09% 0,00% ± 0,00% 61,38% ± 9,10% gls2c1 80,00% ± 42,16% 0,00% ± 0,00% 50,00% ± 52,70% gls2c2 35,00% ± 47,43% 15,00% ± 33,75% 55,00% ± 49,72% gls2c3 30,00% ± 42,16% 0,00% ± 0,00% 5,00% ± 15,81% gls2c4 75,00% ± 32,63% 81,67% ± 25,40% 81,67% ± 25,40% gls2c5 77,14% ± 16,77% 10,00% ± 9,64% 84,29% ± 14,21% gls2c6 59,82% ± 15,13% 0,00% ± 0,00% 81,79% ± 13,95% h-s 75,83% ± 13,29% 80,00% ± 7,03% 80,00% ± 9,78% pim 55,37% ± 13,27% 53,38% ± 6,42% 55,93% ± 9,75% tao 95,23% ± 2,14% 84,11% ± 6,17% 92,58% ± 5,72% thy2c1 90,00% ± 16,10% 76,67% ± 22,50% 90,00% ± 16,10% thy2c2 94,17% ± 12,45% 54,17% ± 24,92% 90,83% ± 14,93% thy2c3 90,95% ± 10,34% 33,81% ± 21,35% 90,71% ± 8,05% wav2c1 75,74% ± 4,06% 88,51% ± 3,20% 87,24% ± 3,43% wav2c2 72,34% ± 3,89% 84,57% ± 4,05% 78,72% ± 2,57% wab2c3 77,64% ± 2,38% 89,97% ± 3,48% 87,86% ± 3,65% wbdc 92,95% ± 3,42% 95,42% ± 5,36% 95,83% ± 5,89% wdbc 92,47% ± 5,09% 94,81% ± 2,71% 93,83% ± 6,37% wine2c1 89,00% ± 16,63% 100,00% ± 0,00% 100,00% ± 0,00% wine2c2 95,00% ± 8,05% 98,33% ± 5,27% 98,33% ± 5,27% wine2c3 90,18% ± 11,70% 97,14% ± 6,02% 98,57% ± 4,52% Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla wpbc 41,00% ± 12,87% 9,50% ± 17,07% 30,50% ± 24,99%
  • 25. How can we Minimize the Effects of Small Disjuncts? Resampling the dataset:  Addresses small disjuncts Classical methods:  • Random oversampling • Random undersampling Assumes that clusterization will Heuristic methods:  find small • Tomek links disjuncts and • CNN match classifier’s • One-sided selection approximation • Smote Cluster-based oversampling  Could XCS benefit from the online Cost-sensitive classifiers  identification of small disjuncts? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 26. Domains of Applicability Should we use some counterbalancing scheme?  Which learning scheme should we use?  Is there a combination of counterbalancing  scheme+learner that beats all others? How can we know the presence of small  disjuncts? Are there other complexity factors mixed up with  the small disjuncts problem? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 27. Domains of Applicability Resampling/ Learn it! Classifier/ Resampling+classifier Where are LCSs placed? Dataset Dataset Suggested Prediction characterization approach Type of dataset: Geometrical distribution of classes Possible presence of small disjuncts Other complexity factors Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 28. Future Directions Potential benefit of XCS to discover small disjuncts  …and learn from it online Further analyze UCS  How do LCSs perform w.r.t. other classifiers for unbalanced  datasets? Measures for small disjuncts identification  … and other possible complexity factors What is noise and what is a small disjunct?  In which cases a LCS is applicable?  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 29. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain