This paper presents a learning methodology based on a substructural classification model to solve decomposable classification problems. The proposed method consists of three important components: (1) a structural model that represents salient interactions between attributes for given data, (2) a surrogate model that provides a functional approximation of the output as a function of the attributes, and (3) a classification model that predicts the class of new inputs. The structural model is used to infer the functional form of the surrogate, and its coefficients are estimated using linear regression methods. The classification model uses a maximally accurate, least complex surrogate to predict the output for given inputs. The structural model that yields an optimal classification model is searched with an iterative greedy search heuristic. Results show that the proposed method successfully detects the interacting variables of hierarchical problems, groups them into linkage groups, and builds maximally accurate classification models. The initial results on non-trivial hierarchical test problems indicate that the proposed method holds promise, and they have also shed light on several improvements that could enhance its capabilities.
Substructural surrogates for learning decomposable classification problems: implementation and first results
1. Substructural Surrogates for Learning
Decomposable Classification Problems:
Implementation and First Results
Albert Orriols-Puig1,2 Kumara Sastry2
David E. Goldberg2 Ester Bernadó-Mansilla1
1Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle, Ramon Llull University
2Illinois Genetic Algorithms Laboratory
Department of Industrial and Enterprise Systems Engineering
University of Illinois at Urbana-Champaign
2. Motivation
Nearly decomposable functions in engineering systems
• (Simon, 69; Gibson, 79; Goldberg, 02)
Design decomposition principle in:
– Genetic Algorithms (GAs)
• (Goldberg, 02; Pelikan, 05; Pelikan, Sastry & Cantú-Paz, 06)
– Genetic Programming (GP)
• (Sastry & Goldberg, 03)
– Learning Classifier Systems (LCS)
• (Butz, Pelikan, Llorà & Goldberg, 06)
• (Llorà, Sastry, Goldberg & de la Ossa, 06)
3. Aim
Our approach: build surrogates for classification.
– Probabilistic models → induce the form of a surrogate
– Regression → estimate the coefficients of the surrogate
– Use the surrogate to classify new examples
Surrogates used for efficiency enhancement of:
– GAs:
• (Sastry, Pelikan & Goldberg, 2004)
• (Pelikan & Sastry, 2004)
• (Sastry, Lima & Goldberg, 2006)
– LCSs:
• (Llorà, Sastry, Yu & Goldberg, 2007)
4. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
5. Overview of the Methodology
Structural Model Layer
• Extracts dependencies between attributes (example: x-or)
• Expressed as: linkage groups, matrices (Sastry, Lima & Goldberg, 2006), or Bayesian networks
Surrogate Model Layer
• Infers the functional form of the surrogate and its coefficients based on the structural model
Classification Model Layer
• Uses the surrogate function to predict the class of new examples
6. Surrogate Model Layer
Input: Database of labeled examples with nominal attributes
Substructure identified by the structural model layer
– Example: [0 2] [1]
Surrogate form induced from the structural model:
f(x0, x1, x2) = cI + c0x0 + c1x1 + c2x2 + c0,2x0x2
Coefficients of the surrogate are estimated using linear regression methods
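The following minimal sketch (hypothetical helper, not part of the original implementation) illustrates how a set of linkage groups induces the terms of such a surrogate: an intercept, plus one term for every non-empty subset of each linkage group, consistent with the example above.

```python
from itertools import combinations

def surrogate_terms(linkage_groups):
    """Enumerate the terms of the surrogate induced by a structural model.

    Each linkage group contributes one term per non-empty subset of its
    variables; a constant (intercept) term is always included.
    """
    terms = [()]  # the empty tuple stands for the intercept c_I
    for group in linkage_groups:
        for size in range(1, len(group) + 1):
            terms.extend(combinations(sorted(group), size))
    return terms

# Structural model [x0 x2] [x1] from the slide:
print(surrogate_terms([[0, 2], [1]]))
# -> [(), (0,), (2,), (0, 2), (1,)]
# i.e. f = c_I + c0*x0 + c2*x2 + c0,2*x0*x2 + c1*x1
```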
7. Surrogate Model Layer
Map input examples to a schema-index matrix
– Consider all the schemas defined by the structural model [0 2] [1]:
{0*0, 0*1, 1*0, 1*1, *0*, *1*}
– In general, the number of schemas to be considered is
m_sch = Σ_{i=1}^{m} Π_{j=1}^{k_i} c_ij
where m is the number of linkage groups, k_i is the size of the ith linkage group, and c_ij is the cardinality of the jth variable of the ith linkage group.
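A minimal sketch of this schema enumeration (hypothetical helper names; binary attributes assumed, with an optional cardinality parameter for the general case):

```python
from itertools import product

def enumerate_schemas(linkage_groups, length, cardinality=2):
    """List all schemas defined by the linkage groups of a structural model.

    A schema fixes the variables of one linkage group to concrete values and
    leaves every other position as the wildcard '*'.
    """
    schemas = []
    for group in linkage_groups:
        for values in product(range(cardinality), repeat=len(group)):
            schema = ['*'] * length
            for var, val in zip(group, values):
                schema[var] = str(val)
            schemas.append(''.join(schema))
    return schemas

schemas = enumerate_schemas([[0, 2], [1]], length=3)
print(schemas)        # ['0*0', '0*1', '1*0', '1*1', '*0*', '*1*']
print(len(schemas))   # m_sch = 6
```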
8. Surrogate Model Layer
Map each example to a binary vector of size m_sch
– For each building block, the entry corresponding to the schema contained in the example is one; all the others are zero.
– Example, for the structural model [0 2] [1]:

        0*0  0*1  1*0  1*1  *0*  *1*
  010    1    0    0    0    0    1
  100    0    0    1    0    1    0
  111    0    0    0    1    0    1
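A small sketch of this mapping (hypothetical helper names; schema order as on the slide):

```python
SCHEMAS = ['0*0', '0*1', '1*0', '1*1', '*0*', '*1*']  # structural model [0 2] [1]

def matches(schema, example):
    """True if the example instantiates the schema ('*' matches any value)."""
    return all(s == '*' or s == e for s, e in zip(schema, example))

def to_schema_vector(example):
    """Binary vector of length m_sch with a 1 for every schema the example belongs to."""
    return [1 if matches(s, example) else 0 for s in SCHEMAS]

print(to_schema_vector('010'))  # [1, 0, 0, 0, 0, 1]
print(to_schema_vector('100'))  # [0, 0, 1, 0, 1, 0]
print(to_schema_vector('111'))  # [0, 0, 0, 1, 0, 1]
```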
9. Surrogate Model Layer
Stacking these vectors for the n input examples results in an n × m_sch binary matrix. Each class is mapped to an integer.
10. Surrogate Model Layer
Solve the following linear system to estimate the coefficients of the surrogate model:
A · x = y
where A is the n × m_sch matrix of input instances mapped to the schemas, x is the vector of coefficients of the surrogate model, and y holds the class (label) of each input instance.
To solve it, we use a multi-dimensional least-squares fitting approach:
– Minimize ||A·x - y||^2
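A minimal numerical sketch of this step with NumPy; the three rows reproduce the schema matrix from the earlier slide, while the integer-coded class labels are purely illustrative, not taken from the paper's experiments:

```python
import numpy as np

# Schema matrix A for the three examples above
# (structural model [0 2] [1], schemas 0*0 0*1 1*0 1*1 *0* *1*)
A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0])  # hypothetical integer-coded class labels

# Least-squares estimate of the surrogate coefficients: minimize ||A x - y||^2
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)
```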
11. Overview of the Methodology
Structural Model Layer
• Extracts dependencies between attributes (example: x-or)
• Expressed as: linkage groups, matrices (Sastry, Lima & Goldberg, 2006), or Bayesian networks
Surrogate Model Layer
• Infers the functional form of the surrogate and its coefficients based on the structural model
Classification Model Layer
• Uses the surrogate function to predict the class of new examples
12. Classification Model Layer
Use the surrogate to predict the class of a new input instance:
– Map the new input example to a vector e of length m_sch according to the schemas
– Output = round(x·e), where x is the vector of coefficients of the surrogate model
– Example, with schemas 0*0 0*1 1*0 1*1 *0* *1*: the input 001 is mapped to e = (0 0 1 0 1 0), so Output = round(x·(001010)')
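A minimal sketch of this prediction step; the coefficient values are purely illustrative, and the input 100 is taken from the mapping table shown earlier:

```python
import numpy as np

SCHEMAS = ['0*0', '0*1', '1*0', '1*1', '*0*', '*1*']   # structural model [0 2] [1]

def predict(example, x):
    """Classification model layer: map the example to its schema vector e and
    round the surrogate output x·e to the nearest integer-coded class."""
    e = np.array([1.0 if all(s == '*' or s == c for s, c in zip(schema, example))
                  else 0.0 for schema in SCHEMAS])
    return int(round(float(np.dot(x, e))))

# x is the coefficient vector estimated in the surrogate model layer;
# the values below are hypothetical, just to make the sketch runnable.
x = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
print(predict('100', x))   # -> 1
```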
13. Overview of the Methodology
Structural Model Layer
• Extracts dependencies between attributes (example: x-or)
• Expressed as: linkage groups, matrices (Sastry, Lima & Goldberg, 2006), or Bayesian networks
Surrogate Model Layer
• Infers the functional form of the surrogate and its coefficients based on the structural model
Classification Model Layer
• Uses the surrogate function to predict the class of new examples
14. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
15. How do we obtain the structural model?
Greedy extraction of the structural model for classification (gESMC): greedy search à la eCGA (Harik, 98). A sketch of the loop is given after the listing.
1. Start by considering all variables independent: sModel = [1] [2] … [n]
2. Divide the dataset into ten-fold cross-validation sets: train and test
3. Build the surrogate with the train sets and evaluate it with the test sets
4. While there is improvement:
   4.1. Create all candidate models obtained by merging two subsets,
        e.g. { [1,2] … [n]; [1,n] … [2]; [1] … [2,n] }
   4.2. Build the surrogate for every candidate and evaluate its quality with the test sets
   4.3. Select the candidate with minimum error
   4.4. If the quality of that candidate is better than its parent's, accept the model (update sModel) and go back to 4.1
   4.5. Otherwise, stop: sModel is the final structural model
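A compact sketch of this greedy loop. The `evaluate(model, data)` callback, which would build the surrogate for a candidate structural model and return its cross-validated classification error, is assumed; all names are hypothetical.

```python
from itertools import combinations

def gESMC(n_vars, data, evaluate):
    """Greedy extraction of the structural model: repeatedly merge the pair of
    linkage groups that most reduces the cross-validated error of the surrogate."""
    model = [[i] for i in range(n_vars)]          # start with all variables independent
    best_error = evaluate(model, data)
    while True:
        candidates = []
        for a, b in combinations(range(len(model)), 2):
            merged = [g for k, g in enumerate(model) if k not in (a, b)]
            merged.append(sorted(model[a] + model[b]))
            candidates.append(merged)
        errors = [evaluate(c, data) for c in candidates]
        if not candidates or min(errors) >= best_error:
            return model, best_error              # no improvement: stop
        best_error = min(errors)
        model = candidates[errors.index(best_error)]
```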
16. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
17. Test Problems
Two-level hierarchical problems (Butz, Pelikan, Llorà & Goldberg, 2006)
18. Test Problems
Problems in the lower level (condition length l):
– Position problem: position of the left-most one-valued bit. Example: 000110 → 3
– Parity problem: number of ones mod 2. Example: 01001010 → 1
19. Test Problems
Problems in the higher level (condition length l):
– Decoder problem: integer value of the input. Example: 000110 → 5
– Count-ones problem: number of ones of the input. Example: 000110 → 2
Combinations of both levels (the base problems are sketched below):
– Parity + decoder (hParDec)       – Parity + count-ones (hParCount)
– Position + decoder (hPosDec)     – Position + count-ones (hPosCount)
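As a concrete reference for these definitions, here is a small sketch of the position, parity and count-ones functions; the decoder is omitted because its exact indexing is not spelled out on the slides. The position convention below (index counted from the right) reproduces the slide's example, but treat it as an assumption.

```python
def position(block: str) -> int:
    """Position problem: index (counted from the right, starting at 1) of the
    left-most one-valued bit; 0 if the block contains no ones."""
    i = block.find('1')
    return 0 if i == -1 else len(block) - i

def parity(block: str) -> int:
    """Parity problem: number of ones modulo 2."""
    return block.count('1') % 2

def count_ones(block: str) -> int:
    """Count-ones problem: number of ones in the input."""
    return block.count('1')

assert position('000110') == 3    # example from the slide
assert parity('01001010') == 1    # example from the slide
assert count_ones('000110') == 2  # example from the slide
```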
20. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
21. Results with 2-bit lower level blocks
First experiment:
– hParDec, hPosDec, hParCount, hPosCount
– Binary inputs of length 17; the first 6 bits are relevant, the others are irrelevant
– 3 blocks of two-bit low-order problems; interacting variables: [x0 x1] [x2 x3] [x4 x5]
– E.g., hPosDec on the input 10 01 00 00101010101 (the last 11 bits are irrelevant): the lower-level position outputs are 2, 1, 0, and the overall output is 21
22. Results with 2-bit lower level blocks
Performance of gESMC compared to SMO and C4.5:
23. Results with 2-bit lower level blocks
Form and coefficients of the surrogate models
– Interacting variables [x0 x1] [x2 x3] [x4 x5]
– Linkages between variables are detected
– Easy visualization of the salient variable interactions
– All models consist of only the six relevant variables
24. Results with 2-bit lower level blocks
C4.5 created trees:
– hPosDec and hPosCount
  • Trees of 53 nodes and 27 leaves on average
  • Only the six relevant variables appear in the decision nodes
– hParDec
  • Trees of 270 nodes and 136 leaves on average
  • Irrelevant attributes appear in the decision nodes
– hParCount
  • Trees of 283 nodes and 142 leaves on average
  • Irrelevant attributes appear in the decision nodes
SMO built support vector machines:
– 351, 6, 6, and 28 machines, each with 17 weights, were created for hPosDec, hPosCount, hParDec, and hParCount respectively
25. Results with 3-bit lower level blocks
Second experiment:
– hParDec, hPosDec, hParCount, hPosCount
– Binary inputs of length 17; the first 6 bits are relevant, the others are irrelevant
– 2 blocks of three-bit low-order problems; interacting variables: [x0 x1 x2] [x3 x4 x5]
– E.g., hPosDec on the input 000 100 00101010101 (the last 11 bits are irrelevant): the lower-level position outputs are 0 and 3, and the overall output is 3
26. Results with 3-bit lower level blocks
Second experiment: 2 blocks of three-bit low-order problems.
Sequence of structural models obtained during the search:
– M1: [0][1][2][3][4][5]
– M2: [0 1][2][3][4][5]
– M3: [0 1 2][3][4][5]
Interesting observations:
• As expected, the method stops the search when no improvement is found.
• In hParDec, half of the time the system gets a model that represents the salient interacting variables due to the stochasticity of the 10-fold CV estimation, e.g. [0 1 2 8] [3 4 9 5 6].
27. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
28. Discussion
Lack of guidance from low-order substructures
– Increase the order of substructural merges
– Select randomly one of the new structural models
• Follow schemes such as simulated annealing (Korst & Aarts, 1997)
29. Discussion
Creating substructural models with overlapping structures
– Problems with overlapping linkages
– E.g., multiplexer problems (see the sketch below)
– gESMC on the 6-bit mux results in: [0 1 2 3 4 5]
– What we would like to obtain is: [0 1 2] [0 1 3] [0 1 4] [0 1 5]
– Enhance the methodology to permit representing overlapping linkages, using methods such as the design structure matrix genetic algorithm (DSMGA) (Yu, 2006)
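For reference, a sketch of the standard 6-bit multiplexer as commonly used in the LCS literature (the deck does not define it explicitly, so treat this as an assumption). It shows why the linkage groups overlap: the two address bits interact with every data bit.

```python
def mux6(bits: str) -> int:
    """Standard 6-bit multiplexer: the first 2 bits address one of the 4 data bits.
    Every data bit interacts with both address bits, hence the overlapping
    linkage groups [0 1 2] [0 1 3] [0 1 4] [0 1 5]."""
    address = int(bits[:2], 2)
    return int(bits[2 + address])

assert mux6('100010') == 1   # address 10 selects data bit x4, which is 1
```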
30. Outline
1. Methodology
2. Implementation
3. Test Problems
4. Results
5. Discussion
6. Conclusions
31. Conclusions
Methodology for learning from decomposable problems
– Structural model
– Surrogate model
– Classification model
First implementation approach
– Greedy search for the best structural model (gESMC)
Main advantage of gESMC over other learners
– It provides the structural model jointly with the classification model
Some limitations of gESMC:
– It depends on the guidance provided by the low-order substructures
– Several approaches are outlined as further work
32. Substructural Surrogates for Learning
Decomposable Classification Problems:
Implementation and First Results
Albert Orriols-Puig1,2 Kumara Sastry2
David E. Goldberg2 Ester Bernadó-Mansilla1
1Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle, Ramon Llull University
2Illinois Genetic Algorithms Laboratory
Department of Industrial and Enterprise Systems Engineering
University of Illinois at Urbana-Champaign