1. Experimental comparison of ranking techniques
Existing research in the field of MCDA ranking problems has mainly focused on the development of appropriate methodologies for supporting the decision-making process in multicriteria ranking problems. At the practical level, the use of MCDA ranking techniques in real-world ranking problems has demonstrated the capabilities that this approach offers to decision makers.
Nevertheless, the practical implementation of any scientific development is always the last stage of research. Before this stage, experiments need to be performed in a laboratory environment, under controlled data conditions, in order to investigate the basic features of the scientific development under consideration. Such an investigation and the corresponding experimental analysis enable the derivation of useful conclusions on the potential that the proposed research has in practice and the possible problems that may be encountered during its practical implementation (Doumpos and Zopounidis, 2002).
Within the field of MCDA, experimental studies are rather limited. Some MCDA researchers have conducted experiments to investigate the features and peculiarities of MCDA ranking and choice methodologies (Stewart, 1993, 1996; Carmone et al., 1997; Zanakis et al., 1998).
Comparative studies involving MCDA ranking techniques have been heavily oriented towards the
AHP technique (Triantaphyllou, 2000).
The present paper follows this line of research to investigate the ranking performance of the MOEA procedure presented in section X, as compared to another widely used ranking method, namely the NFR, which will be presented in section X. The investigation is based on an extensive simulation experiment.
The considered methods
Any study investigating the ranking performance of a new methodology relative to other techniques should consider techniques that are well established among researchers and that rely on different underlying assumptions and functionality. On the basis of these remarks, the experimental investigation of the ranking performance of the MOEA procedure considers the NFR ranking method.
The NFR is among the most widely used ranking methods. Despite its shortcomings, even today it is still almost always used in the exploitation phase of the PROMETHEE II and ELECTRE III methods. The MOEA procedure has been developed as an alternative to the NFR, following an evolutionary algorithms approach.
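As a concrete illustration, the NFR ranks alternatives by their net flow scores computed from a valued outranking relation. The following sketch assumes a simple matrix representation; the function and variable names are illustrative, not part of the methods described in this paper:

```python
import numpy as np

def net_flow_ranking(S):
    """Rank alternatives by the net flow rule (NFR).

    S is an n x n matrix where S[a, b] is the credibility that
    alternative a outranks alternative b (the diagonal is ignored).
    The net flow of a is sum_b S[a, b] - sum_b S[b, a]; alternatives
    are ranked by decreasing net flow.
    """
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, 0.0)  # self-comparisons carry no information
    net_flow = S.sum(axis=1) - S.sum(axis=0)
    return [int(i) for i in np.argsort(-net_flow)]  # best alternative first

# Tiny example: alternative 0 clearly outranks the others
S = [[0.0, 0.9, 0.8],
     [0.1, 0.0, 0.6],
     [0.2, 0.4, 0.0]]
print(net_flow_ranking(S))  # [0, 1, 2]
```

Ties in the net flow scores are broken arbitrarily by the sort; in an actual implementation a tie-handling rule would have to be specified.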
2. Experimental design
The factors
The comparison of the MOEA procedure to the method NFR is performed through an extensive
simulation. The simulation approach provides a framework to conduct the comparison under
several data conditions and derive useful conclusions on the relative performance of the
considered methods given the features and properties of the data. The term performance refers solely to the ranking accuracy of the methods.
The experiment presented in this paper is concerned only with the investigation of the ranking accuracy of ranking methods under experimental data conditions. In particular, the conducted experimental study investigates the performance of the methods on the basis of the following two factors:
F1: Ranking procedures
F2: Size of the ranking problems (cardinality of the set of decision alternatives)
Table XXX presents the levels considered for each factor in the simulation experiment.
Factor                                          Levels
F1: Ranking procedures                          1. NFR
                                                2. MOEA procedure
F2: Size of the multicriteria ranking problems  1. 6
                                                2. 8
                                                3. 10
                                                4. 12
                                                5. 18
Table XXX. Factors investigated in the experimental design
The methods defined by factor F1 are compared (in terms of their ranking accuracy) under the different data conditions defined by factor F2. Factor F2 defines the size of the reference set (training sample), i.e. the number of decision alternatives that it includes. The factor has five levels, corresponding to 6, 8, 10, 12, and 18 alternatives. Generally, small training samples contain limited information about the ranking problem being examined, but the corresponding complexity of the problem is also limited. On the other hand, larger samples provide richer information, but they also lead to increased complexity of the problem. Thus, the examination of five levels for this factor enables the investigation of the performance of the ranking procedures under all these cases. This specification enables the derivation of useful conclusions on the performance of the methods in a wide range of situations often met in practice (many real-world ranking problems involve this number of decision alternatives).
Data generation procedure
An important aspect of the experimental comparison is the generation of data having the required properties defined by the factors described in the previous subsection.
In this study we propose a methodology for generating the data. The general outline of this methodology is presented in Appendix A. The outcome of this methodology is a matrix and a vector, consisting of a value outranking relation and the associated ranking of alternatives, which is consistent with the value outranking relation in terms of the test criterion of section 3.
This experiment is repeated 5,000 times for each level of factor F2 (5 levels). Overall, 25,000 reference sets (value outranking relation, ranking) are considered. Each reference set is used to develop a ranking through the methods specified by factor F1 (cf. Table XXX). This ranking is then compared with the corresponding ranking of the reference set to test the generalizing ranking performance of each method.
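In outline, the simulation loop can be sketched as follows. The actual data generation procedure is the one given in Appendix A; the generator below is a hypothetical stand-in that draws a random "correct" ranking and builds a value outranking relation consistent with it, and only the NFR is applied (the MOEA procedure would plug in at the same point):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_reference_set(n):
    """Hypothetical stand-in for the Appendix A procedure: draw a random
    'correct' ranking and a value outranking relation consistent with it
    (credibility above 0.5 when a is ranked before b, below otherwise)."""
    ranking = [int(a) for a in rng.permutation(n)]
    position = {a: i for i, a in enumerate(ranking)}
    S = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                better = position[a] < position[b]
                S[a, b] = rng.uniform(0.5, 1.0) if better else rng.uniform(0.0, 0.5)
    return S, ranking

def net_flow_ranking(S):
    """Net flow rule: rank by row sums minus column sums of S."""
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, 0.0)
    return [int(i) for i in np.argsort(-(S.sum(axis=1) - S.sum(axis=0)))]

def error_rates(sizes=(6, 8, 10, 12, 18), repetitions=100):
    """Fraction of reference sets whose NFR ranking differs from the
    'correct' ranking, per problem size (repetitions kept small here;
    the experiment in the paper uses 5,000 per level)."""
    rates = {}
    for n in sizes:
        errors = 0
        for _ in range(repetitions):
            S, correct = generate_reference_set(n)
            if net_flow_ranking(S) != correct:
                errors += 1
        rates[n] = errors / repetitions
    return rates

print(error_rates(sizes=(6, 8), repetitions=50))
```

The observed error rates depend entirely on the stand-in generator, so they should not be read as reproducing the paper's results; the sketch only shows the shape of the loop (generate, rank, compare, aggregate per level of F2).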
The simulation was conducted on a PC with processor Intel® Corel™ 2 Duo (2.20 GHz). Some
computer programs were written in Visual.Net programming environment in order to generate
simulated value outranking relations and rankings.
Analysis of results
The results obtained from the simulation experiment concern the ranking error rates of the methods on the reference sets. The analysis that follows focuses on the ranking performance of the methods. The error rates obtained using the reference sets estimate the generalizing performance of the methods, i.e. their ability to provide correct recommendations on the ranking of alternatives.
A first important note on the obtained results is that the main effects regarding the factors F1 and
F2 are all significant. This clearly shows that each of these factors has a major impact on the
ranking performance of the methods.
The MOEA procedure provides the best results for both (i) the percentage of times the two approaches (NFR and MOEA) yielded a different indication of the best two and three alternatives, and (ii) the number of times the two rankings derived from the NFR and the MOEA differed from the "correct" ranking; that is, the MOEA procedure yields significantly lower error rates. When ranking discrepancies are expressed as the number of times one method is better than the other, the MOEA procedure again provides considerably better results than the NFR.
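The two discrepancy measures above can be made concrete with a small sketch (the function names are illustrative, and treating "indication of the best k alternatives" as a set comparison is an assumption, since the paper does not specify whether order within the top k matters):

```python
def same_top_k(ranking_a, ranking_b, k):
    """True if two rankings indicate the same best-k alternatives
    (compared as a set, regardless of order within the top k)."""
    return set(ranking_a[:k]) == set(ranking_b[:k])

def differs_from_correct(ranking, correct):
    """True if a derived ranking differs from the 'correct' ranking."""
    return ranking != correct

# Example: the two methods agree on the top 2 but produce different rankings
nfr_ranking  = [2, 0, 1, 3]
moea_ranking = [0, 2, 1, 3]
print(same_top_k(nfr_ranking, moea_ranking, 2))         # True
print(differs_from_correct(nfr_ranking, moea_ranking))  # True
```

Aggregating these booleans over the 5,000 repetitions per level of F2 yields the percentages reported in the analysis.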
The interaction found significant in this experiment for explaining the differences in the performance of the methods involves the size of the reference set. The results of Table XXX show that increasing the size of the reference set (number of alternatives) reduces the performance of both methods. This is an expected result, since in this experiment larger reference sets are associated with an increased complexity of the ranking problem. The method most sensitive to the size of the reference set is the NFR. Nevertheless, it should be noted that, irrespective of the reference set size, the considered MOEA procedure always performs better than the NFR method.
Summary of Major Findings
The experiment presented in this paper provides useful results regarding the efficiency of an MCDA ranking method compared to another well-established MCDA ranking method. Additionally, it facilitated the investigation of the relative performance of the two MCDA ranking methods. The conducted extensive experiment helped in assessing the relative ranking performance of these methods for a variety of data sizes.
Overall, the main findings of the experimental analysis presented in this paper can be summarized in the following points:
1. The considered MCDA ranking method, the MOEA procedure, can be considered an efficient alternative to the widely used NFR, at least in cases where the assumptions of these techniques are not met in the data under consideration. Furthermore, the MOEA procedure appears to be quite effective compared to other ranking methods. Of course, in this analysis only the NFR method was considered. Therefore, the obtained results regarding the comparison of the MOEA procedure with other MCDA ranking methods should be further extended by considering a wider range of methods, such as Min in favour (Bouyssou), the extension of the prudence principle (working paper, Dias and Lamboray), etc. The results of Table XXX show that the MOEA procedure outperforms the NFR method in all cases. The high efficiency of the considered MCDA ranking method is also illustrated by the results presented in Table YYY. The analysis of Table XXX shows that the MOEA procedure provides the lowest error rate in all cases. The results of Tables XXX and YYY lead to the conclusion that the modeling framework of the MCDA ranking method MOEA procedure is more efficient in addressing ranking problems than the NFR.
2. The test criterion proposed for evaluating ranking procedures, and the procedure proposed for generating value outranking relations and the associated rankings in accordance with the test criterion, seem to be well suited to the study of ranking problems. Extending this procedure to the generation of more general value outranking relations will contribute to a more complete analysis of a ranking method. This will enable the modeling of incomparability and intransitivities among pairs of alternatives. Modeling such cases within an experimental study would be an interesting further extension of this analysis, in order to form a better view of the impact of the test criterion on the ranking methods.
The experimental analysis presented in this paper did not address this issue. Instead, the focal
point of interest was the investigation of the ranking performance of the NFR and the MOEA
procedure. The obtained results can be considered as encouraging for the MOEA procedure.
Moreover, they provide the basis for further analysis along the lines of the above remarks.