Workshop on Grand Challenges of Computational Intelligence (Cyprus, September 14, 2012)




     Scalability Improvement of
  Genetics-Based Machine Learning
         to Large Data Sets



                            Hisao Ishibuchi
              Osaka Prefecture University, Japan
Contents of This Presentation

 1. Introduction: Basic Idea of Evolutionary Computation
 2. Genetics-Based Machine Learning
 3. Parallel Distributed Implementation
 4. Computational Experiments
 5. Conclusion
Basic Idea of Evolutionary Computation

 Environment

  Population




Individual
Basic Idea of Evolutionary Computation

 Environment              Environment

  Population                Population


                  (1)



Individual               Good Individual

 (1) Natural selection in a tough environment.
Basic Idea of Evolutionary Computation

 Environment             Environment             Environment

  Population               Population              Population


                  (1)                     (2)



Individual              Good Individual         New Individual

 (1) Natural selection in a tough environment.
 (2) Reproduction of new individuals by crossover and mutation.
Basic Idea of Evolutionary Computation

Environment                                     Environment

 Population                                       Population




The generation update is iterated many times.
(1) Natural selection in a tough environment.
(2) Reproduction of new individuals by crossover and mutation.
Applications of Evolutionary Computation
Design of High Speed Trains
Environment
                      Population




              Individual = Design
Applications of Evolutionary Computation
Design of Stock Trading Algorithms
Environment
                    Population




     Individual = Trading Algorithm
Contents of This Presentation

 1. Introduction: Basic Idea of Evolutionary Computation
 2. Genetics-Based Machine Learning
 3. Parallel Distributed Implementation
 4. Computational Experiments
 5. Conclusion
Genetics-Based Machine Learning
Knowledge Extraction from Numerical Data


   Training data --> Evolutionary Computation --> a set of "If (condition) then (result)" rules

    Design of Rule-Based Systems
Design of Rule-Based Systems
Environment
   Population

   [Diagram: a population of rule-based systems; each individual is a set of If-Then rules.]

   Individual = Rule-Based System (a set of If-Then rules)
Design of Decision Trees
Environment
                   Population




        Individual = Decision Tree
Design of Neural Networks
Environment
                    Population




        Individual = Neural Network
Multi-Objective Evolution
Minimization of Errors and Complexity
[Plot: error (small to large) on the vertical axis vs. complexity (simple to complicated) on the horizontal axis; the initial population is scattered over the plane.]
Multi-Objective Evolution
Minimization of Errors and Complexity
[Plot: the same error vs. complexity plane after evolution toward the minimization of both objectives.]
Multi-Objective Evolution
A number of different neural networks
[Plot: the obtained trade-off solutions, i.e., a number of different neural networks ranging from simple to complicated.]
Multi-Objective Evolution
A number of different decision trees
[Plot: the obtained trade-off solutions, i.e., a number of different decision trees ranging from simple to complicated.]
Multi-Objective Evolution
A number of fuzzy rule-based systems
[Plot: the obtained trade-off solutions, i.e., a number of fuzzy rule-based systems ranging from simple to complicated.]
Contents of This Presentation

 1. Introduction: Basic Idea of Evolutionary Computation
 2. Genetics-Based Machine Learning
 3. Parallel Distributed Implementation
 4. Computational Experiments
 5. Conclusion
Difficulty in Applications to Large Data
Computation Load for Fitness Evaluation
Environment

Fitness Evaluation: evaluation of each individual (rule-based system) using the given training data.

[Diagram: every rule set in the population is evaluated against the whole training data; with a large data set, this fitness evaluation dominates the computation load.]

   Individual = Rule-Based System (a set of If-Then rules)
The Main Issue in This Presentation
How to Decrease the Computation Load
Environment

[Diagram: the same fitness-evaluation setting as above; the goal is to decrease its computation load on large training data.]

   Individual = Rule-Based System (a set of If-Then rules)
Fitness Calculation
Standard Non-Parallel Model
Environment

[Animated diagram: on a single CPU, the rule sets in the population are evaluated one group after another, each against the whole training data.]

   Individual = Rule-Based System (a set of If-Then rules)
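
For reference, a minimal sketch of this sequential scheme in Python. All names here, including the classification_error fitness routine and the rule sets' classify method, are illustrative assumptions, not code from the talk:

    # Standard non-parallel model (sketch): every rule set in the
    # population is evaluated, one after another, on the whole training data.
    def classification_error(rule_set, training_data):
        """Hypothetical fitness: fraction of misclassified training patterns."""
        wrong = sum(1 for pattern, label in training_data
                    if rule_set.classify(pattern) != label)
        return wrong / len(training_data)

    def evaluate_sequential(population, training_data):
        return [classification_error(rs, training_data) for rs in population]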
A Popular Approach for Speed-Up
Parallel Computation of Fitness Evaluation

[Diagram: the population is divided among CPU 1 to CPU 4; each CPU evaluates its share of the rule sets using the whole training data.]

 If we use n CPUs, the computation load of each CPU is reduced to 1/n of
 the single-CPU case (e.g., to 25% with four CPUs).
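
A minimal sketch of this parallel fitness evaluation, reusing the hypothetical classification_error routine from the earlier sketch and assuming the rule-set objects are picklable (illustrative only, not the talk's implementation):

    # Parallel fitness evaluation (sketch): n worker processes each evaluate
    # roughly 1/n of the rule sets, every one against the whole training data.
    from multiprocessing import Pool

    def evaluate_parallel(population, training_data, n_cpus=4):
        with Pool(n_cpus) as pool:
            return pool.starmap(classification_error,
                                [(rs, training_data) for rs in population])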
Another Approach for Speed-Up
Training Data Reduction

[Diagram: fitness evaluation uses a subset of the training data instead of the whole set.]

 If we use x% of the training data, the computation load can be
 reduced to x% in comparison with the use of all the training data.
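
A sketch of this training data reduction, again with illustrative names: fitness is computed on a random x% sample instead of the whole data set.

    # Training data reduction (sketch): evaluating on a random x% subset
    # cuts the fitness-evaluation cost to roughly x%.
    import random

    def sample_subset(training_data, percent):
        k = max(1, int(len(training_data) * percent / 100))
        return random.sample(training_data, k)

    # e.g., evaluate the whole population on 10% of the data:
    # subset = sample_subset(training_data, 10)
    # fitness = [classification_error(rs, subset) for rs in population]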
Another Approach for Speed-Up
Training Data Reduction

[Diagram: fitness evaluation on a fixed training data subset.]

 Difficulty: How to choose a training data subset
 The population will overfit to the selected training data subset.
Idea of Windowing (J. Bacardit et al.: "Speeding-up Pittsburgh Learning Classifier Systems: Modeling Time and Accuracy," PPSN 2004)

Training data are divided into data subsets.

[Animated diagram: at the 1st generation the population's fitness is evaluated on the first data subset, at the 2nd generation on the second, at the 3rd generation on the third, and so on.]

At each generation, a different data subset (i.e., a different window) is used.
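
A sketch of this windowing scheme as described on the slides; the step function standing in for selection, crossover, mutation, and generation update is a hypothetical placeholder, and classification_error is the routine from the earlier sketch:

    # Windowing (sketch): the training data are split once into fixed
    # subsets, and each generation uses the next subset (window) in turn.
    def split_into_subsets(training_data, n_subsets):
        return [training_data[i::n_subsets] for i in range(n_subsets)]

    def run_with_windowing(population, training_data, n_subsets,
                           n_generations, step):
        subsets = split_into_subsets(training_data, n_subsets)
        for gen in range(n_generations):
            window = subsets[gen % n_subsets]  # a different window each generation
            fitness = [classification_error(rs, window) for rs in population]
            population = step(population, fitness)
        return population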
Visual Image of Windowing
A population is moving around in the training data.

[Animated diagram: the training data form the environment; the population moves from one region (window) of the training data to another as the generations proceed.]

 After enough evolution with a moving window
  The population does not overfit to any particular training data subset.
  The population may have high generalization ability.
Our Idea: Parallel Distributed Implementation
H. Ishibuchi et al.: "Parallel Distributed Hybrid Fuzzy GBML Models with Rule Set Migration and Training Data Rotation," IEEE Trans. on Fuzzy Systems (in press)

Non-Parallel Non-Distributed Model: a single population on a single CPU, evaluated on the whole training data.

Our Parallel Distributed Model:
   (1) The population is divided into multiple subpopulations (as in an island model).
   (2) The training data are also divided into multiple subsets (as in the windowing method).
   (3) An evolutionary algorithm is locally performed at each CPU (as in an island model).
   (4) The training data subsets are periodically rotated (e.g., every 100 generations).
   (5) Migration of rule sets between subpopulations is also performed periodically.
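
A single-process sketch of steps (1)-(5); in the actual model each subpopulation runs on its own CPU, and the step and migrate helpers (like classification_error above) are hypothetical placeholders, not the authors' implementation:

    # Parallel distributed GBML (sketch of steps (1)-(5) above).
    def parallel_distributed_gbml(population, training_data, n_islands,
                                  n_generations, interval, step, migrate):
        subpops = [population[i::n_islands] for i in range(n_islands)]     # (1)
        subsets = [training_data[i::n_islands] for i in range(n_islands)]  # (2)
        for gen in range(1, n_generations + 1):
            # (3) one local generation per island, on that island's data subset
            for i in range(n_islands):
                fitness = [classification_error(rs, subsets[i])
                           for rs in subpops[i]]
                subpops[i] = step(subpops[i], fitness)
            if gen % interval == 0:
                subsets = subsets[-1:] + subsets[:-1]  # (4) rotate data subsets
                subpops = migrate(subpops)             # (5) rule set migration
        return [rs for sp in subpops for rs in sp]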
Computation Load of Four Models
(EC = Evolutionary Computation: selection, crossover, mutation, generation update)

Standard Non-Parallel Model (single CPU):
   Computation load = EC part + fitness evaluation part.

Standard Parallel Model (parallel fitness evaluation, seven CPUs):
   Computation load per CPU = EC part + fitness evaluation part (1/7).

Windowing Model (reduced training data set, single CPU):
   Computation load = EC part + fitness evaluation part (1/7).

Parallel Distributed Model (divided population and data set, seven CPUs):
   Computation load per CPU = EC part (1/7) + fitness evaluation part (1/7 x 1/7 = 1/49).
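
The per-CPU loads above can be checked with a little arithmetic, assuming fitness evaluation scales linearly with both the number of rule sets and the number of training patterns (a sketch of the bookkeeping, not a measurement):

    # Relative per-CPU fitness-evaluation load with seven CPUs / seven subsets.
    n = 7
    load = {
        "standard non-parallel": 1.0,            # whole population, whole data
        "standard parallel":     1.0 / n,        # 1/7 of population, whole data
        "windowing":             1.0 / n,        # whole population, 1/7 of data
        "parallel distributed":  1.0 / (n * n),  # 1/7 of population, 1/7 of data
    }
    for name, x in load.items():
        print(f"{name:>22s}: {x:.3f}")  # parallel distributed: 0.020 (about 2%)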
Contents of This Presentation

 1. Introduction: Basic Idea of Evolutionary Computation
 2. Genetics-Based Machine Learning
 3. Parallel Distributed Implementation
 4. Computational Experiments
 5. Conclusion
Our Model in Computational Experiments
with Seven Subpopulations and Seven Data Subsets

   Seven CPUs
   Seven subpopulations of size 30 (population size = 30 x 7 = 210)
   Seven data subsets
Standard Non-Parallel Non-Distributed Model
with a Single Population and a Single Data Set

   Single CPU
   Single population of size 210
   Whole data set

 Termination condition: 50,000 generations
 Computation load: 210 x 50,000 = 10,500,000 evaluations
                   (more than ten million evaluations)
Comparison of Computation Load
Computation Load on a Single CPU per Generation

Standard Model: evaluation of 210 rule sets using all the training data.
Parallel Distributed Model: evaluation of 30 rule sets (1/7 of the population) using one of the seven data subsets (1/7 of the data).

Computation load ==> 1/7 x 1/7 = 1/49 (about 2%)
Data Sets in Computational Experiments
Nine Pattern Classification Problems
  Name of      Number of   Number of    Number of
  Data Set      Patterns   Attributes    Classes
  Segment        2,310        19            7
  Phoneme        5,404          5           2
 Page-blocks     5,472        10            5
   Texture       5,500        40           11
  Satimage       6,435        36            6
  Twonorm        7,400        20            2
    Ring         7,400        20            2
 PenBased       10,992        16           10
   Magic        19,020        10            2
Computation Time for 50,000 Generations
Computation time was decreased to about 2%
  Data Set      Standard: A (min)   Our Model: B (min)   B/A (%)
  Segment               203.66                4.69         2.30%
  Phoneme               439.18               13.19         3.00%
  Page-blocks           204.63                4.74         2.32%
  Texture               766.61               15.72         2.05%
  Satimage              658.89               15.38         2.33%
  Twonorm               856.58                7.84         0.92%
  Ring                 1015.04               22.52         2.22%
  PenBased             1520.54               35.56         2.34%
  Magic                 771.05               22.58         2.93%
Why? Because the population and the training data were divided into seven subsets:
1/7 x 1/7 = 1/49 (about 2%)
Test Data Error Rates (Results of 3x10CV)
Test data accuracy was improved for six data sets

  Data Set      Standard: A (%)   Our Model: B (%)   Improvement: (A - B)%
  Segment              5.99              5.90                0.09
  Phoneme             15.43             15.96               -0.53
  Page-blocks          3.81              3.62                0.19
  Texture              4.64              4.77               -0.13
  Satimage            15.54             12.96                2.58
  Twonorm              7.36              3.39                3.97
  Ring                 6.73              5.25                1.48
  PenBased             3.07              3.30               -0.23
  Magic               15.42             14.89                0.53
Q. Why did our model improve the test data accuracy ?
A. Because our model improved the search ability.
  Data Set: Satimage    Standard: 15.54%    Our Model: 12.96%    Improvement: 2.58%

[Plot: training data error rate vs. number of generations on Satimage; our parallel distributed model reaches a lower training error than the non-parallel non-distributed model.]
Q. Why did our model improve the search ability ?
A. Because our model maintained the diversity.
  Data Set: Satimage    Standard: 15.54%    Our Model: 12.96%    Improvement: 2.58%

[Plot: the best and worst training data error rates in a particular subpopulation at each generation of a single run. Non-parallel non-distributed model: best = worst (no diversity). Parallel distributed model: a gap between the best and worst is maintained (training data rotation every 100 generations, rule set migration every 100 generations).]
Effects of Rotation and Migration Intervals




      Training Data Rotation: Every 100 Generations
      Rule Set Migration: Every 100 Generations
Effects of Rotation and Migration Intervals
(Rotations in the opposite directions)

[Diagram: seven CPUs; rule sets migrate in one direction around the ring of subpopulations while the training data subsets rotate in the opposite direction.]

[3-D bar plot: error rates over rotation and migration intervals of 20, 50, 100, 200, 500 generations, and no rotation / no migration; the region of good results is indicated.]
Effects of Rotation and Migration Intervals
(Rotations in the same direction)

[Diagram: as above, but the rule set migration and the training data rotation move in the same direction around the ring.]

[3-D bar plot: error rates over the same rotation and migration intervals; good results appear in a different region of interval combinations.]
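
A sketch of the two rotation schemes, continuing the illustrative names from the earlier sketch (cyclic shifts over the seven islands; not the authors' code):

    def rotate(items, direction):
        """Cyclic shift: direction=+1 moves item i to position i+1, -1 the reverse."""
        return items[-direction:] + items[:-direction]

    # Opposite directions: data subsets and rule sets move against each other.
    # subsets = rotate(subsets, +1); subpops = rotate(subpops, -1)

    # Same direction: both shift the same way around the ring.
    # subsets = rotate(subsets, +1); subpops = rotate(subpops, +1)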
Effects of Rotation and Migration Intervals

[Two 3-D bar plots side by side: Opposite Directions (left) and Same Direction (right), over rotation and migration intervals of 20, 50, 100, 200, 500 generations and no rotation / no migration. The interaction between the training data rotation and the rule set migration has negative effects for some interval combinations.]
Contents of This Presentation

 1. Introduction: Basic Idea of Evolutionary Computation
 2. Genetics-Based Machine Learning
 3. Parallel Distributed Implementation
 4. Computational Experiments
 5. Conclusion
Conclusion
1. We explained our parallel distributed model.
2. It was shown that the computation time was decreased to about 2%.
3. It was shown that the test data accuracy was improved.
4. We explained negative effects of the interaction between the
   training data rotation and the rule set migration.
Conclusion
5. A slightly different model may also be possible for learning from
   locally located databases.

[Diagram: four CPUs at University A, University B, University C, and University D, each holding its own local data; the subpopulations rotate among the sites.]
Conclusion
6. This model can also be used for learning from changing databases.

[Animated diagram: the same four-university setting; the local data change over time while the subpopulations rotate among the sites.]
Conclusion




   Thank you very much!

Mais conteúdo relacionado

Semelhante a Hisao Ishibuchi: "Scalability Improvement of Genetics-Based Machine Learning to Large Data Sets"

Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowYi-Fan Liou
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsBrian Lange
 
Course 10 : Introduction to machine learning by Christoph Evers
Course 10 :  Introduction to machine learning by Christoph EversCourse 10 :  Introduction to machine learning by Christoph Evers
Course 10 : Introduction to machine learning by Christoph EversBetacowork
 
Machine Learning and Inductive Inference
Machine Learning and Inductive InferenceMachine Learning and Inductive Inference
Machine Learning and Inductive Inferencebutest
 
06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptxSaharA84
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItHolberton School
 
DL Classe 0 - You can do it
DL Classe 0 - You can do itDL Classe 0 - You can do it
DL Classe 0 - You can do itGregory Renard
 
lecture13-NN-basics.pptx
lecture13-NN-basics.pptxlecture13-NN-basics.pptx
lecture13-NN-basics.pptxAbijahRoseline1
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random ForestsCloudxLab
 
GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentElectronic Arts / DICE
 
Computational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding RegionsComputational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding Regionsbutest
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsVidya sagar Sharma
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfhemangppatel
 

Semelhante a Hisao Ishibuchi: "Scalability Improvement of Genetics-Based Machine Learning to Large Data Sets" (16)

Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with Tensorflow
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithms
 
Course 10 : Introduction to machine learning by Christoph Evers
Course 10 :  Introduction to machine learning by Christoph EversCourse 10 :  Introduction to machine learning by Christoph Evers
Course 10 : Introduction to machine learning by Christoph Evers
 
Machine Learning and Inductive Inference
Machine Learning and Inductive InferenceMachine Learning and Inductive Inference
Machine Learning and Inductive Inference
 
06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do It
 
DL Classe 0 - You can do it
DL Classe 0 - You can do itDL Classe 0 - You can do it
DL Classe 0 - You can do it
 
lecture13-NN-basics.pptx
lecture13-NN-basics.pptxlecture13-NN-basics.pptx
lecture13-NN-basics.pptx
 
Data Science.pptx
Data Science.pptxData Science.pptx
Data Science.pptx
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
 
Machine_Learning_Co__
Machine_Learning_Co__Machine_Learning_Co__
Machine_Learning_Co__
 
GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game Development
 
Computational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding RegionsComputational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding Regions
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdf
 
AI Presentation 1
AI Presentation 1AI Presentation 1
AI Presentation 1
 

Hisao Ishibuchi: "Scalability Improvement of Genetics-Based Machine Learning to Large Data Sets"

  • 1. Workshop on Grand Challenges of Computational Intelligence (Cyprus, September 14, 2012) Scalability Improvement of Genetics-Based Machine Learning to Large Data Sets Hisao Ishibuchi Osaka Prefecture University, Japan
  • 2. Contents of This Presentation 1. Basic Idea of Evolutionary Computation Introduction Machine Learning 2. Genetics-Based 3. Parallel Distributed Implementation 4. Computation Experiments 5. Conclusion
  • 3. Contents of This Presentation 1. Basic Idea of Evolutionary Computation Introduction Machine Learning 2. Genetics-Based 3. Parallel Distributed Implementation 4. Computation Experiments 5. Conclusion
  • 4. Basic Idea of Evolutionary Computation Environment Population Individual
  • 5. Basic Idea of Evolutionary Computation Environment Environment Population Population (1) Individual Good Individual (1) Natural selection in a tough environment.
  • 6. Basic Idea of Evolutionary Computation Environment Environment Environment Population Population Population (1) (2) Individual Good Individual New Individual (1) Natural selection in a tough environment. (2) Reproduction of new individuals by crossover and mutation.
  • 7. Basic Idea of Evolutionary Computation Environment Environment Population Population Iteration of the generation update many times (1) Natural selection in a tough environment. (2) Reproduction of new individuals by crossover and mutation.
  • 8. Applications of Evolutionary Computation Design of High Speed Trains Environment Population Individual = Design ( )
  • 9. Applications of Evolutionary Computation Design of Stock Trading Algorithms Environment Population Individual = Trading Algorithm ( )
  • 10. Contents of This Presentation 1. Basic Idea of Evolutionary Computation Introduction Machine Learning 2. Genetics-Based 3. Parallel Distributed Implementation 4. Computation Experiments 5. Conclusion
  • 11. Genetics-Based Machine Learning Knowledge Extraction from Numerical Data If (condition) then (result) If (condition) then (result) Evolutionary If (condition) then (result) Training data Computation If (condition) then (result) If (condition) then (result) Design of Rule-Based Systems
  • 12. Design of Rule-Based Systems Environment Population If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Individual = Rule-Based System ( If ... Then ... If ... Then ... ) If ... Then ...
  • 13. Design of Decision Trees Environment Population Individual = Decision Tree ( )
  • 14. Design of Neural Networks Environment Population Individual = Neural Network ( )
  • 15. Multi-Objective Evolution Minimization of Errors and Complexity Large Error Initial Population Small Simple Complexity Complicated
  • 16. Multi-Objective Evolution Minimization of Errors and Complexity Large Error Small Simple Complexity Complicated
  • 17. Multi-Objective Evolution A number of different neural networks Large Error Small Simple Complexity Complicated
  • 18. Multi-Objective Evolution A number of different decision trees Large Error Small Simple Complexity Complicated
  • 19. Multi-Objective Evolution A number of fuzzy rule-based systems Large Error Small Simple Complexity Complicated
  • 20. Contents of This Presentation 1. Basic Idea of Evolutionary Computation Introduction Machine Learning 2. Genetics-Based 3. Parallel Distributed Implementation 4. Computation Experiments 5. Conclusion
  • 21. Difficulty in Applications to Large Data Computation Load for Fitness Evaluation Environment Fitness Evaluation: Population Evaluation of each individual If ... Then ... If ... Then ... using the given training data If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Training data If ... Then ... If ... Then ... If ... Then ... Individual = Rule-Based System ( If ... Then ... If ... Then ... ) If ... Then ...
  • 22. Difficulty in Applications to Large Data Computation Load for Fitness Evaluation Environment Fitness Evaluation: Population Evaluation of each individual If ... Then ... If ... Then ... using the given training data If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Training data If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Individual = Rule-Based System ( If ... Then ... If ... Then ... ) If ... Then ...
  • 23. Difficulty in Applications to Large Data Computation Load for Fitness Evaluation Environment Population If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Training data Fitness Evaluation If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Individual = Rule-Based System ( If ... Then ... If ... Then ... ) If ... Then ...
  • 24. The Main Issue in This Presentation How to Decrease the Computation Load Environment Population If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Training data Fitness Evaluation If ... Then ... If ... Then ... If ... Then ... If ... Then ... If ... Then ... Individual = Rule-Based System ( If ... Then ... If ... Then ... ) If ... Then ...
• 25. Fitness Calculation: Standard Non-Parallel Model. On a single CPU, the individuals in the population are evaluated one after another, each using the whole training data set. [Figure: the fitness evaluation step sweeping through the population]
• 29. A Popular Approach for Speed-Up: Parallel Computation of Fitness Evaluation. The population is divided among the CPUs for evaluation. If we use n CPUs, the computation load for each CPU can be 1/n in comparison with the case of a single CPU (e.g., 25% with four CPUs). [Figure: the population split among CPU 1 to CPU 4, each part evaluated on the training data]
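As a rough illustration of this master-slave scheme (a minimal sketch, not taken from the slides; the rule-set representation, the classify() helper, and the (features, label) pattern format are all assumptions):

```python
from multiprocessing import Pool

def classify(rule_set, features):
    # Hypothetical placeholder: predict with the first matching rule.
    # A rule is assumed to be a (condition_function, class_label) pair.
    for condition, label in rule_set:
        if condition(features):
            return label
    return None

def evaluate_rule_set(args):
    # Fitness of one individual = its error rate on the training data.
    rule_set, training_data = args
    errors = sum(1 for features, label in training_data
                 if classify(rule_set, features) != label)
    return errors / len(training_data)

def parallel_fitness(population, training_data, n_cpus=4):
    # Each of the n CPUs evaluates about 1/n of the population, so the
    # load per CPU is roughly 1/n of the single-CPU case (25% for n = 4).
    with Pool(processes=n_cpus) as pool:
        return pool.map(evaluate_rule_set,
                        [(ind, training_data) for ind in population])
```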
• 30. Another Approach for Speed-Up: Training Data Reduction. If we use x% of the training data, the computation load can be reduced to x% in comparison with the use of all the training data. [Figure: a subset is selected from the training data and used for fitness evaluation]
• 34. Difficulty: How to choose a training data subset. With a fixed subset, the population will overfit to the selected training data subset.
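A minimal sketch of this idea (illustrative only): fitness is computed on a fixed random sample of the data, which cuts the cost to the chosen fraction but invites exactly the overfitting noted above.

```python
import random

def reduce_training_data(training_data, fraction=0.2, seed=0):
    # Keep x% of the patterns; fitness evaluation then costs about x% of
    # evaluating on the full data. Because the subset never changes, the
    # population can overfit to it over many generations.
    rng = random.Random(seed)
    k = max(1, int(fraction * len(training_data)))
    return rng.sample(training_data, k)
```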
• 35. Idea of Windowing in J. Bacardit et al.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. PPSN 2004. The training data are divided into data subsets. [Figure: the training data partitioned into subsets]
• 36.-38. Idea of Windowing (Bacardit et al., PPSN 2004). At each generation, a different data subset (i.e., a different window) is used for fitness evaluation: one subset at the 1st generation, the next subset at the 2nd generation, the next at the 3rd, and so on. [Figure: the fitness evaluation window moving to a different data subset each generation]
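A minimal sketch of windowing, reusing evaluate_rule_set from the parallel-evaluation sketch above; the cycling order (subset index = generation mod number of subsets) is an assumed detail:

```python
def windowed_fitness(population, data_subsets, generation):
    # Windowing: generation t is evaluated on subset t mod n, so every
    # subset is used in turn and no single subset is seen for long
    # enough for the population to overfit to it.
    window = data_subsets[generation % len(data_subsets)]
    return [evaluate_rule_set((individual, window))
            for individual in population]
```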
• 39.-43. Visual Image of Windowing: the population is moving around in the training data (Training Data = Environment). [Figure: the population visiting a different region of the training data at each generation]
• 44. After enough evolution with a moving window, the population does not overfit to any particular training data subset and may have high generalization ability.
• 45.-50. Our Idea: Parallel Distributed Implementation. H. Ishibuchi et al.: Parallel Distributed Hybrid Fuzzy GBML Models with Rule Set Migration and Training Data Rotation. TFS (in press). In contrast to the non-parallel, non-distributed model (a single population and the whole training data on a single CPU), our parallel distributed model works as follows (see the sketch after this list):
(1) The population is divided into multiple subpopulations (as in an island model).
(2) The training data are also divided into multiple subsets (as in the windowing method).
(3) An evolutionary algorithm is locally performed at each CPU (as in an island model).
(4) The training data subsets are periodically rotated among the CPUs (e.g., every 100 generations).
(5) Migration between subpopulations is also periodically performed.
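A structural sketch of this model under stated assumptions: random_rule_set() and local_generation() stand in for the underlying GBML initialization and variation operators, the rotation direction and the best-replaces-worst migration policy are illustrative choices, and error_rate() reuses evaluate_rule_set from the earlier sketch. This is an illustration of the scheme, not the authors' implementation.

```python
def error_rate(rule_set, data):
    # Reuses evaluate_rule_set from the parallel-evaluation sketch above.
    return evaluate_rule_set((rule_set, data))

def parallel_distributed_gbml(data_subsets, rng, random_rule_set,
                              local_generation, n_islands=7, subpop_size=30,
                              generations=50000, rotation_interval=100,
                              migration_interval=100):
    # (1) One subpopulation per island; (2) one data subset per island.
    islands = [[random_rule_set(rng) for _ in range(subpop_size)]
               for _ in range(n_islands)]
    assign = list(range(n_islands))  # island i uses subset assign[i]

    for gen in range(1, generations + 1):
        # (3) One local EC generation per island (each island runs on its
        # own CPU in the real model; written sequentially here for clarity).
        for i in range(n_islands):
            islands[i] = local_generation(islands[i],
                                          data_subsets[assign[i]], rng)

        # (4) Periodic training data rotation (e.g., every 100 generations).
        if gen % rotation_interval == 0:
            assign = assign[1:] + assign[:1]

        # (5) Periodic rule set migration: copy each island's best rule set
        # over the worst rule set of the next island around the ring.
        if gen % migration_interval == 0:
            bests = []
            for i in range(n_islands):
                data = data_subsets[assign[i]]
                bests.append(min(islands[i],
                                 key=lambda r: error_rate(r, data)))
            for i in range(n_islands):
                data = data_subsets[assign[i]]
                worst = max(range(subpop_size),
                            key=lambda j: error_rate(islands[i][j], data))
                islands[i][worst] = bests[(i - 1) % n_islands]
    return islands
```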
• 51.-52. Computation Load of Four Models. EC = Evolutionary Computation = {selection, crossover, mutation, generation update}. Computation Load = EC Part + Fitness Evaluation Part. [Figure: the standard non-parallel model, with EC and fitness evaluation on a single CPU]
• 53. Standard Parallel Model (parallel fitness evaluation): the fitness evaluation part is shared by seven CPUs (1/7 per CPU), while the EC part remains on a single CPU.
• 54. Windowing Model (reduced training data set): a single CPU, with fitness evaluation on one data subset at a time.
• 55.-56. Parallel Distributed Model (divided population and data set): per CPU, the EC part is 1/7 (a subpopulation of 1/7 size), and the fitness evaluation part is 1/7 of the rule sets evaluated on 1/7 of the data.
• 57. Contents of This Presentation: 1. Basic Idea of Evolutionary Computation, 2. Genetics-Based Machine Learning, 3. Parallel Distributed Implementation, 4. Computational Experiments, 5. Conclusion
• 58.-59. Our Model in Computational Experiments, with Seven Subpopulations and Seven Data Subsets: seven CPUs, seven subpopulations of size 30 each (population size = 30 x 7 = 210), and seven data subsets.
• 60.-62. Standard Non-Parallel Non-Distributed Model, with a Single Population and a Single Data Set: a single CPU, a single population of size 210, and the whole data set. Termination condition: 50,000 generations. Computation load: 210 x 50,000 = 10,500,000 evaluations (more than ten million rule set evaluations).
• 63.-65. Comparison of Computation Load on a Single CPU per Generation. Standard model: evaluation of 210 rule sets using all the training data. Parallel distributed model: evaluation of 30 rule sets (1/7 of the population) using one of the seven data subsets (1/7 of the data). Computation load per CPU: 1/7 x 1/7 = 1/49 (about 2%). See the worked arithmetic below.
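Spelling out the slide's per-CPU, per-generation arithmetic:

\[
\frac{\text{load}_{\text{parallel distributed}}}{\text{load}_{\text{standard}}}
= \underbrace{\frac{30}{210}}_{\text{rule sets per CPU}}
\times \underbrace{\frac{1}{7}}_{\text{training data per CPU}}
= \frac{1}{7} \times \frac{1}{7}
= \frac{1}{49} \approx 2\%
\]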
• 66. Data Sets in Computational Experiments: Nine Pattern Classification Problems

Data Set | Patterns | Attributes | Classes
Segment | 2,310 | 19 | 7
Phoneme | 5,404 | 5 | 2
Page-blocks | 5,472 | 10 | 5
Texture | 5,500 | 40 | 11
Satimage | 6,435 | 36 | 6
Twonorm | 7,400 | 20 | 2
Ring | 7,400 | 20 | 2
PenBased | 10,992 | 16 | 10
Magic | 19,020 | 10 | 2
• 67.-69. Computation Time for 50,000 Generations. Computation time was decreased to about 2%. Why? Because both the population and the training data were divided into seven subsets: 1/7 x 1/7 = 1/49 (about 2%).

Data Set | Standard: A (minutes) | Our Model: B (minutes) | B/A (%)
Segment | 203.66 | 4.69 | 2.30
Phoneme | 439.18 | 13.19 | 3.00
Page-blocks | 204.63 | 4.74 | 2.32
Texture | 766.61 | 15.72 | 2.05
Satimage | 658.89 | 15.38 | 2.33
Twonorm | 856.58 | 7.84 | 0.92
Ring | 1015.04 | 22.52 | 2.22
PenBased | 1520.54 | 35.56 | 2.34
Magic | 771.05 | 22.58 | 2.93
• 70.-71. Test Data Error Rates (Results of 3x10CV). Test data accuracy was improved for six of the nine data sets.

Data Set | Standard: A (%) | Our Model: B (%) | Improvement: A - B (%)
Segment | 5.99 | 5.90 | 0.09
Phoneme | 15.43 | 15.96 | -0.53
Page-blocks | 3.81 | 3.62 | 0.19
Texture | 4.64 | 4.77 | -0.13
Satimage | 15.54 | 12.96 | 2.58
Twonorm | 7.36 | 3.39 | 3.97
Ring | 6.73 | 5.25 | 1.48
PenBased | 3.07 | 3.30 | -0.23
Magic | 15.42 | 14.89 | 0.53
• 72. Q. Why did our model improve the test data accuracy? A. Because our model improved the search ability. Example (Satimage): Standard 15.54%, Our Model 12.96%, improvement 2.58%. [Figure: training data error rate vs. number of generations; the parallel distributed model reaches a lower error rate than the non-parallel non-distributed model]
• 73. Q. Why did our model improve the search ability? A. Because our model maintained the diversity. [Figure: the best and worst training data error rates in a particular subpopulation at each generation of a single run (Satimage); in the non-parallel non-distributed model, best = worst (no diversity), while in the parallel distributed model (training data rotation and rule set migration every 100 generations) a best-worst gap remains]
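The best-worst gap plotted in this figure can be computed directly; a minimal sketch, reusing the hypothetical error_rate() helper from the model sketch above:

```python
def diversity_gap(subpopulation, data):
    # Best-worst gap in training error within one subpopulation.
    # A gap of zero (best = worst) indicates no remaining diversity.
    errors = [error_rate(rule_set, data) for rule_set in subpopulation]
    return max(errors) - min(errors)
```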
• 74. Effects of Rotation and Migration Intervals. Baseline in the previous figure: training data rotation every 100 generations, rule set migration every 100 generations. The following slides vary both intervals.
• 75.-76. Effects of Rotation and Migration Intervals (training data rotation and rule set migration in opposite directions around the ring of seven CPUs). [Figure: test data error rate (vertical axis, about 10 to 18%) over rotation intervals {20, 50, 100, 200, 500, ∞ (none)} and migration intervals {20, 50, 100, 200, 500, ∞ (none)}; the region of good results is marked]
• 77.-78. Effects of Rotation and Migration Intervals (rotation and migration in the same direction). [Figure: the same interval grid for the same-direction setting; the region of good results is marked]
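A minimal sketch of the two direction settings being compared (the ring indexing is an assumed detail; the "chasing" reading of the same-direction case is our interpretation of the negative effects reported on the next slides):

```python
def rotate(ring, step):
    # Shift a ring of items by `step` positions (+1 or -1 around the ring).
    step %= len(ring)
    return ring[step:] + ring[:step]

# assign[i] = index of the data subset currently used by island i.
assign = list(range(7))

# Opposite directions: data subsets rotate one way around the ring,
# migrated rule sets travel the other way.
assign = rotate(assign, +1)                   # every rotation interval
migrate_to = [(i - 1) % 7 for i in range(7)]  # every migration interval

# Same direction: with migrate_to = [(i + 1) % 7 for i in range(7)],
# a migrated rule set can follow its own training subset around the
# ring and keep being evaluated on the same data, which weakens the
# anti-overfitting effect of the rotation.
```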
• 79.-81. Effects of Rotation and Migration Intervals: interaction between rotation and migration. [Figure: the opposite-direction and same-direction interval grids side by side; the same-direction setting shows negative effects from the interaction between the training data rotation and the rule set migration]
• 82. Contents of This Presentation: 1. Basic Idea of Evolutionary Computation, 2. Genetics-Based Machine Learning, 3. Parallel Distributed Implementation, 4. Computational Experiments, 5. Conclusion
• 83.-86. Conclusion: 1. We explained our parallel distributed model. 2. The computation time was decreased to about 2%. 3. The test data accuracy was improved. 4. We explained the negative effects of the interaction between the training data rotation and the rule set migration.
• 87. Conclusion 5. A slightly different model may also be possible for learning from locally located databases: the data stay at their own sites, and the subpopulations are rotated among the local CPUs instead of the data. [Figure: subpopulation rotation among CPUs at Universities A, B, C, and D]
• 88.-92. Conclusion 6. This model can also be used for learning from changing databases: the subpopulations keep rotating while the local data change over time. [Figure: subpopulation rotation among CPUs at Universities A, B, C, and D as their databases change]
  • 93. Conclusion Thank you very much!