ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE School Of
Architecture, Civil and Environmental Engineering
Semester Project in Civil Engineering
Enhancing the Serial Estimation of Discrete
Choice Models Sequences
by
Youssef Kitane
Under the direction of Prof. Michel Bierlaire and supervision of Nicola
Ortelli and Gael Lederrey in the Transport and Mobility Laboratory
Lausanne, June 2020
Contents

1 Introduction
2 Literature Review
3 Methodology
   3.1 Standardization
   3.2 Warm Start
   3.3 Early Stopping
4 Case Study
   4.1 Dataset
   4.2 Sequence of Discrete Choice Models
5 Results
6 Conclusion
1 Introduction
Discrete Choice Models (DCMs) have played an essential role in transportation modeling
for the last 25 years [1]. Discrete choice modeling is a field designed to capture in detail the
underlying behavioral mechanisms at the foundation of the decision-making process that
drives consumers [2]. Because they must be behaviorally realistic while properly fitting
the data, appropriate utility specifications for discrete choice models are hard to develop.
In particular, modelers usually start by including a number of variables that are seen as
"essential" in the specification; these originate from their context knowledge or intuition.
Then, small changes are tested sequentially so as to improve the goodness of fit of the
model while ensuring its behavioral realism. As a result, many model specifications are
usually tested before the modeler is satisfied with the outcome. This leads to extensive
computation time, since each model has to be estimated separately. A faster optimization
procedure would allow researchers to test many more specifications in the same amount of time.
In this project, the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
is used to estimate the parameters of each DCM. Three techniques are implemented to
accelerate the process of estimating a sequence of DCMs:
• Standardization (ST) of the variables: the goal is to bring the values of the numeric
columns in the dataset to a common scale, without distorting differences in the ranges
of values.
• Warm Start (WS): this technique uses the knowledge acquired from the preceding model
to initialize the values of the parameters for the estimation of the next model.
• Early Stopping (ES): this consists in stopping the estimation of a model earlier than
expected, based on how promising the improvement in log likelihood looks in the last
iterations of the optimization algorithm.
The next section reviews existing methods that speed up an optimization process. Then,
in Section 3, the three techniques are presented in detail for a sequence of DCMs.
Section 4 presents the data considered in this project, as well as the sequences of models
that we use to measure the effectiveness of the three techniques. Section 5 gathers the
results obtained by the implemented methods. The last section summarizes the findings of
this project and highlights possible improvements and directions for future research.
2 Literature Review
In large-scale convex optimization, first-order methods are the methods of choice due to
their cheap iteration cost [3]. While second-order methods, such as Newton's method, make
use of curvature information, the cost of computing the Hessian can become prohibitive.
Quasi-Newton methods are thus a good compromise between curvature information and low
computation time: they use an approximation of the Hessian instead of its exact
computation. The BFGS algorithm, named after its inventors Broyden, Fletcher, Goldfarb and
Shanno [4], is one of the most well-known quasi-Newton methods. A new method for solving
linear systems is proposed in [5]. The algorithm is specialized to invert positive definite
matrices in such a way that all iterates (approximate solutions) generated by the algorithm
are positive definite matrices themselves, which opens the way for many applications in the
field of optimization. The accelerated matrix inversion algorithm was then incorporated
into an optimization framework to develop both accelerated stochastic and deterministic
BFGS which, to the best of the authors' knowledge, are the first accelerated quasi-Newton
updates. Under a careful choice of the parameters of the method, and depending on the
problem structure and conditioning, acceleration can result in significant speedups both
for the matrix inversion problem and for the stochastic BFGS algorithm. It is confirmed
experimentally that these accelerated methods can lead to speedups compared to the
classical BFGS algorithm, but no convergence analysis has yet been provided.
The increase in the size of choice modeling datasets in recent years has led to a growing
research interest in accelerating the estimation of DCMs. Researchers have used techniques
based on Machine Learning (ML) to speed up the estimation of a single DCM [6]. This is
achieved by proposing new efficient stochastic optimization algorithms and extensively
testing them alongside existing approaches. These algorithms are built on three main
contributions: the use of a stochastic Hessian, the modification of the batch size, and a
change of optimization algorithm depending on the batch size. The paper shows that the use
of a second-order method and a small batch size is a good starting point for DCM
optimization, and that BFGS works particularly well once such a starting point has been found.
The problem of initializing the parameters of a model is central in ML. One particularly
common scenario is when an ML algorithm must be constantly updated with new data. This
situation typically occurs in finance, online advertising, recommendation systems, fraud
detection, and many other domains where machine learning systems are used for prediction
and decision making in the real world [7]. When new data arrive, the model needs to be
updated so that it remains as accurate as possible. While the majority of existing methods
start the configuration process of an algorithm from scratch by initializing the
parameters randomly, it is possible to exploit previously learned information in order to
"warm start" the configuration on the new task.
In most common optimization algorithms, and more precisely in ML, the modeler often decides
to stop the optimization procedure before reaching the required tolerance on the solution [8].
Stopping an optimization process early is a trick used to control the generalization
performance of the model during the training phase and to avoid over-fitting in the test
phase. In discrete choice modeling, however, the main objective is not to achieve the
highest accuracy but to obtain parameters that are behaviorally realistic.
3 Methodology
This section briefly introduces the principles underlying the BFGS algorithm used to
estimate a sequence of DCMs, before presenting the techniques used to speed up the
estimation of such a sequence.

As a reminder, the iterates $\{x_j\}$ of a line search optimization method following a
descent direction $d_j$ with a step size $\alpha_j$ are defined as follows:

$$x_{j+1} = x_j + \alpha_j d_j \qquad (1)$$

where the direction of descent is obtained by preconditioning the gradient and is defined as:

$$d_j = -D_j \nabla f(x_j) \qquad (2)$$

assuming that the matrix $D_j$ at the iterate $x_j$ is positive semi-definite.
For quasi-Newton methods, $D_j$ is an approximation of the Hessian. A slightly different
version of BFGS consists in approximating the inverse of the Hessian. The BFGS$^{-1}$
algorithm uses the following approximation [9]:

$$D_{j+1}^{-1} = D_j^{-1} + \frac{\left(s_j^T y_j + y_j^T D_j^{-1} y_j\right) s_j s_j^T}{\left(s_j^T y_j\right)^2} - \frac{D_j^{-1} y_j s_j^T + s_j y_j^T D_j^{-1}}{s_j^T y_j} \qquad (3)$$

where $s_j = x_{j+1} - x_j$ and $y_j = \nabla f(x_{j+1}) - \nabla f(x_j)$.
The step size is calculated with an inexact line search method, based on the two Wolfe
conditions (Wolfe, 1969, 1971). The first condition, also known as the Armijo rule,
guarantees that the step yields a sufficient decrease in the objective function. The second
condition, known as the curvature condition, prevents the step length from being too short.
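To make the update concrete, below is a minimal NumPy sketch of Equation (3); the function
and variable names are illustrative, not part of any existing library.

```python
import numpy as np

def bfgs_inverse_update(D_inv, s, y):
    """One update of the inverse-Hessian approximation, following Equation (3).

    D_inv : current symmetric approximation D_j^{-1} of the inverse Hessian, shape (n, n)
    s     : step s_j = x_{j+1} - x_j, shape (n,)
    y     : gradient difference y_j = grad f(x_{j+1}) - grad f(x_j), shape (n,)
    """
    sy = s @ y  # curvature s_j^T y_j, positive when the Wolfe conditions hold
    correction = (sy + y @ D_inv @ y) * np.outer(s, s) / sy**2
    # D^{-1} y s^T + s y^T D^{-1}; symmetry of D_inv lets us reuse D_inv @ y
    cross_terms = (np.outer(D_inv @ y, s) + np.outer(s, D_inv @ y)) / sy
    return D_inv + correction - cross_terms
```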
3.1 Standardization
The concept of standardization is relevant when continuous independent variables are
measured on different scales. Standardization is a technique often applied as part of data
preparation for ML. The goal is to bring the values of the numeric columns in the dataset
to a common scale, without distorting differences in the ranges of values. More formally,
suppose that a variable $x$ takes values from the set $S = \{x_1, x_2, \dots, x_n\}$. The
standardization of one value $x_i$ in $S$ is defined as:

$$x_i' = \frac{x_i - \bar{x}}{\sigma} \qquad (4)$$

where $\bar{x}$ is the mean of the values in $S$ and $\sigma$ is the corresponding standard deviation.
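As an illustration, a short NumPy implementation of Equation (4) could look as follows
(the names and sample values are illustrative):

```python
import numpy as np

def standardize(values):
    """Apply Equation (4): center on the mean and divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Travel times in minutes end up on a scale comparable to, e.g., costs in CHF:
print(standardize([30.0, 75.0, 120.0, 45.0]))  # approx. [-1.09, 0.22, 1.53, -0.65]
```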
3.2 Warm Start
A method commonly used in the field of ML is the warm start, which consists in initializing
a set of parameters with non-arbitrary values. In our case, we initialize the parameters of
a model with the parameters estimated by the BFGS algorithm for the previous model.
Formally, we define the set of parameters in model $j$ as $x_j \in \mathbb{R}^{N_j}$, where
$N_j$ corresponds to the number of parameters in model $j$. The set of parameters for the
following model is defined similarly, i.e. $x_{j+1} \in \mathbb{R}^{N_{j+1}}$, where
$N_{j+1}$ corresponds to the number of parameters of this model. To generate the initial
parameters of model $j+1$, i.e. $x_{j+1}^0$, we use the optimized parameters of the
previous model, i.e. $x_j^*$. We thus define the initialization of $x_{j+1}^0$ for each
index $i \in \{1, \dots, N_{j+1}\}$ such that:

$$x_{j+1,i}^0 = \begin{cases} x_{j,i}^* & \text{if } i \in \{1, \dots, N_j\}, \\ 0 & \text{otherwise.} \end{cases}$$
In the case where a variable is transformed using a non-linear function, such as a Box-Cox
transformation, we propose a slightly updated version of the warm start. First, we define a
boolean array $B \in \{0, 1\}^{N_{j+1}}$ such that:

$$B_i = \begin{cases} \text{True} & \text{if } x_{j+1,i} \text{ has been transformed non-linearly,} \\ \text{False} & \text{otherwise,} \end{cases}$$

$$x_{j+1,i}^0 = \begin{cases} x_{j,i}^* & \text{if } i \in \{1, \dots, N_j\} \text{ and } B_i \text{ is False,} \\ 0 & \text{otherwise.} \end{cases}$$

This allows us to reset the value of any parameter associated with a non-linear
transformation to 0 instead of using the previously optimized value.
The same procedure is used for the initialization of the Hessian between model $j$ and the
following model $j+1$. We thus define the initialization of $H_{j+1}^0$ for each
combination of indices $i, k \in \{1, \dots, N_{j+1}\}$ such that:

$$H_{j+1,(i,k)}^0 = \begin{cases} H_{j,(i,k)}^* & \text{if } i, k \in \{1, \dots, N_j\}, \\ 1 & \text{if } i = k, \\ 0 & \text{otherwise.} \end{cases}$$
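The sketch below gathers the three initialization rules above, assuming, as in the
definitions, that the parameters shared with model $j$ occupy the first $N_j$ positions;
all names are illustrative.

```python
import numpy as np

def warm_start(x_opt, H_opt, n_next, boxcox_mask=None):
    """Initialize x^0_{j+1} and H^0_{j+1} from the optimum of model j.

    x_opt       : optimized parameters x*_j of model j, shape (N_j,)
    H_opt       : final Hessian approximation H*_j of model j, shape (N_j, N_j)
    n_next      : number of parameters N_{j+1} of model j+1
    boxcox_mask : optional boolean array B of length N_{j+1}; True entries
                  (non-linearly transformed parameters) are reset to 0
    """
    n_prev = len(x_opt)
    x0 = np.zeros(n_next)
    x0[:n_prev] = x_opt                    # shared parameters keep their optimum
    if boxcox_mask is not None:
        x0[np.asarray(boxcox_mask)] = 0.0  # rule for Box-Cox-transformed parameters

    H0 = np.eye(n_next)                    # new parameters: 1 on the diagonal, 0 elsewhere
    H0[:n_prev, :n_prev] = H_opt           # shared block reuses the previous Hessian
    return x0, H0
```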
3.3 Early Stopping
The early stopping method consists in stopping the estimation process before convergence is
achieved. Because the objective is to select the best model among a sequence of DCMs, the
log likelihood evaluation $f(x_i)$ obtained at an epoch is compared to the lowest log
likelihood $LL_{best}$ of all the previous models. If at a certain epoch $f(x_i)$ is lower
than $LL_{best}$, the optimization is pursued until the end, and the new best log
likelihood becomes the value $LL_{opt}$ estimated by the BFGS algorithm. If at a certain
epoch $f(x_i)$ is higher than $LL_{best}$, the optimization process is stopped based on a
criterion that estimates the relative evolution of the function in order to detect a
plateau, that is, a region where the function is no longer experiencing significant
improvement. Three evaluations of the function are considered in order to be confident in
the convergence of the log likelihood.
Suppose that we have access to the last three evaluations of the log likelihood, $f(x_i)$,
$f(x_{i-1})$ and $f(x_{i-2})$, during the estimation of one model. The stagnation can be
assessed by evaluating the two following ratios and comparing them to a predefined
threshold $\varepsilon$:

$$\frac{f(x_{i-1})}{f(x_i)} < \varepsilon, \qquad \frac{f(x_{i-2})}{f(x_{i-1})} < \varepsilon$$
Even though the goal of early stopping is to reduce the estimation time of DCMs, it is
important to ensure that no substantial difference arises between the solution obtained by
applying early stopping to the BFGS algorithm and the one obtained with the standard BFGS.
For example, Figure 1 shows the value of the log likelihood during the estimation of a
randomly chosen model. As we can see in this example, there is a stagnation in the middle
of the estimation. We do not want to stop early at this moment, since the estimation is far
from finished. We thus have to be careful with the threshold and perform a sensitivity
analysis on this parameter.
Figure 1: Example: difference between a possible stagnation of the log likelihood and the
real convergence
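A minimal sketch of the resulting stopping rule is given below. Since the thresholds tested
in Section 5 are of the order of $10^{-5}$, the two ratios are interpreted here as relative
changes of the log likelihood, consistent with the stated goal of detecting a plateau; all
names are illustrative.

```python
def should_stop_early(ll_history, ll_best, eps=2e-5):
    """Decide whether to interrupt the estimation of the current model.

    ll_history : log-likelihood evaluations f(x_0), ..., f(x_i) so far
    ll_best    : lowest log likelihood LL_best among the previous models
    eps        : plateau threshold (2e-5 is the value retained in Section 5)
    """
    if ll_history[-1] < ll_best:
        return False  # promising model: pursue the optimization until the end
    if len(ll_history) < 3:
        return False  # three evaluations are needed to confirm the stagnation
    f2, f1, f0 = ll_history[-3], ll_history[-2], ll_history[-1]
    # Two consecutive relative-evolution checks detect the plateau.
    return abs(f1 / f0 - 1.0) < eps and abs(f2 / f1 - 1.0) < eps
```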
4 Case Study
4.1 Dataset
The Swissmetro dataset (Bierlaire et al., 2001) corresponds to survey data collected in
Switzerland, between St. Gallen and Geneva, during March 1998. It was used to study the
market penetration of the Swissmetro, a revolutionary mag-lev underground system. Three
alternatives (train, car and Swissmetro) were generated for each of the 1192 respondents. A
sample of 10'728 observations was obtained by generating 9 types of situations. Some of the
pre-selected attributes of the alternatives are categorical (travel card ownership, gender,
type of luggage, etc.) while others are continuous (travel time, cost and headway).
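For illustration, the publicly distributed Swissmetro file is tab-separated and can be
loaded as sketched below; the file path is illustrative and the column names are those of
the public file, while the exact preprocessing used in this project is not detailed here.

```python
import pandas as pd

# Load the public Swissmetro file (tab-separated); the path is illustrative.
df = pd.read_csv("swissmetro.dat", sep="\t")

# Continuous attributes (travel times, costs, headways) live on very different
# scales from the categorical ones, which motivates the ST method of Section 3.1.
continuous = ["TRAIN_TT", "TRAIN_CO", "TRAIN_HE",
              "SM_TT", "SM_CO", "SM_HE", "CAR_TT", "CAR_CO"]
df[continuous] = (df[continuous] - df[continuous].mean()) / df[continuous].std()
```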
4.2 Sequence of Discrete Choice Models
For the purpose of this project, two sequences of one hundred DCMs, respectively denoted by
S1 and S2, are considered. Each sequence starts with a given choice model; a random
perturbation is then applied at each step. These small modifications correspond to the
typical elementary perturbations used to move from one model specification to another. Six
types of modifications are considered (a hypothetical sketch of drawing one at random
follows the list):
• Adding a non-selected variable to the utility of an alternative
• Removing a variable from the utility of an alternative
• Incrementing the Box-Cox parameter of a given variable
• Decrementing the Box-Cox parameter of a given variable
• Interacting a variable with a socioeconomic variable
• Deactivating the interaction of the considered variable with a socioeconomic variable
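As announced above, the sketch below draws one elementary perturbation at random; the
operator names mirror the list, while the model-specification API is purely hypothetical.

```python
import random

# The six elementary perturbations; the specification API below is hypothetical.
PERTURBATIONS = [
    "add_variable",        # add a non-selected variable to a utility
    "remove_variable",     # remove a variable from a utility
    "increment_boxcox",    # increment the Box-Cox parameter of a variable
    "decrement_boxcox",    # decrement the Box-Cox parameter of a variable
    "add_interaction",     # interact a variable with a socioeconomic variable
    "remove_interaction",  # deactivate such an interaction
]

def perturb(specification, rng=random):
    """Return a new specification obtained by one random elementary modification."""
    return specification.apply(rng.choice(PERTURBATIONS))  # hypothetical method
```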
The first sequence, S1, begins with a model containing only alternative-specific constants,
and its complexity increases along the sequence; the second sequence, S2, starts with a
random model, and its complexity remains approximately constant across the hundred models.
The number of parameters for each sequence of DCMs is shown in Figure 2.
Figure 2: Number of parameters for the two sequences S1 and S2.
5 Results
To avoid misunderstandings, abbreviations are given to the different methods. The base
method estimates the parameters without applying any warm start and is denoted by Base.
The warm start of the parameters alone is denoted by WSx; the warm start of the parameters
in which variables transformed non-linearly are reset is denoted by WSbc; the warm start of
the Hessian alone by WSh; and the combination of the warm starts of the Hessian and the
parameters by WS.

A benchmark of ten estimations of the methods Base, WS, WSx, WSh, WSbc and ST is conducted
for the sequences S1 and S2. Tables 1 and 2 present a summary of the statistics for these
methods: the lowest, highest and mean estimation times, as well as the standard deviation
among the ten estimations, are reported. The speedup corresponds to the ratio between the
mean time of the Base method and the mean time of each method.

Among the warm start variants WS, WSx, WSh and WSbc, the WS is the most efficient and
reliable method: it speeds up the estimation by a factor of 3.84 for S1 and 4.5 for S2. The
standard deviation observed for WS is the lowest of all methods, with values of 0.18 and
0.35 for S1 and S2 respectively. The WSh is also efficient, accelerating the estimation by
factors of 2.2 and 2.56 for S1 and S2 respectively. The WSx does not perform as well as WS
and WSh; it reduces the estimation time by 19 % and 15 % for S1 and S2 respectively. The
WSbc does not reduce the estimation time of the Base method. The ST method, in contrast, is
effective: for the first sequence S1, the mean estimation time is reduced from 229.51 s to
201.72 s, and for the sequence S2, a reduction of 11 % is obtained. The standardization of
the variables is not as effective as the WS method, but it still yields interesting results
and should be applied beforehand to every sequence of DCMs whose variables present
differences in the range of values.
Table 1: Summary of statistics for 10 estimations by method for the sequence S1
Statistics Base WS WSx WSh WSbc ST
Minimum 224.90 60.86 187.74 104.99 225.80 198.10
Maximum 231.32 61.45 189.49 106.07 231.85 203.28
Mean 229.51 61.07 188.73 105.44 229.80 201.72
Standard Deviation 1.78 0.18 0.5 0.29 1.65 1.35
Speedup 1.0 3.84 1.21 2.2 0.99 1.15
Table 2: Summary of statistics for 10 estimations by method for the sequence S2
Statistics Base WS WSx WSh WSbc ST
Minimum 534.58 119.18 459.26 210.35 529.25 478.36
Maximum 539.44 120.35 461.36 212.46 538.44 485.93
Mean 536.23 119.74 460.34 211.27 535.84 480.96
Standard Deviation 1.36 0.35 0.72 0.60 2.47 2.16
Speedup 1.0 4.5 1.17 2.56 1.01 1.12
A sensitivity analysis is conducted for the ES method. A sequence of 20 thresholds ranging
from $10^{-7}$ to $5 \times 10^{-4}$ is used to test the performance of the ES method
against the Base method. Figures 3 and 4 present the estimation time of the ES method for
the 20 thresholds, relative to the Base method, for S1 and S2 respectively. The black line
corresponds to the mean time observed for the Base method over 10 estimations, and the grey
lines represent a 95 % confidence interval around it. A box plot with a 95 % confidence
interval is plotted for every threshold. For the sequence S1, a restrictive threshold of
$10^{-7}$ leads to a speedup of 3 %, while for S2 a speedup of approximately 4 % is
observed. It is possible to obtain a better speedup by increasing the value of the
threshold: a reduction of 35 % and 15 % of the optimization time is obtained for S1 and S2
respectively when a less restrictive threshold of 0.0005 is used.
Figure 3: Sensitivity analysis of the threshold parameter for S1
Figure 4: Sensitivity analysis of the threshold parameter for S2
In order to select the best threshold, the improvement in estimation time is not the only
criterion to take into account. Even though the number of models stopped early increases
with the threshold and the total optimization time decreases, the main drawback is that the
method may stop at a plateau that is far away from the real convergence plateau of the log
likelihood. Such models are falsely stopped early and should be distinguished from the
models that have reached the real convergence plateau, as illustrated in Figure 1. Figures
5 and 6 show that, beyond a certain threshold, some models are falsely stopped. Indeed, a
threshold of 0.001 leads to 6 models, among the 76 models stopped early, that do not reach
the real convergence of the log likelihood for the sequence S1. Concerning the sequence S2,
a less restrictive threshold of 0.003 falsely stops 3 models among the 90 models stopped
early. Even though the main objective is to speed up the optimization of a sequence of
models, and higher thresholds lead to lower optimization times, the modeler has to be
careful with models that are falsely stopped early. A threshold of $2 \times 10^{-5}$ is
acceptable in the sense that no model is falsely stopped early for either S1 or S2, while
giving a speedup equivalent to that of less restrictive thresholds.
Figure 5: Number of models falsely stopped earlier for S1
Figure 6: Number of models falsely stopped earlier for S2
A final benchmark combining the methods that speed up the estimation of both S1 and S2 is
conducted. Among the warm start variants, WS is the most efficient; the ST method is also
retained, and the ES method with a threshold of $2 \times 10^{-5}$ has shown an interesting
reduction in time. The results obtained by combining all these methods for both S1 and S2
are compared to the Base method in Tables 3 and 4. The combination of the WS, ES (with a
threshold of $2 \times 10^{-5}$) and ST methods leads to speedup factors of 5.26 and 6.67
compared to the Base method for S1 and S2 respectively.
Table 3: Summary of statistics for 10 estimations for the sequence S1: comparison between
the combined methods and the Base method
Statistics Base Final
Minimum 224.90 44.25
Maximum 231.32 44.83
Mean 229.51 44.45
Standard Deviation 1.78 0.18
Speedup 1.0 5.26
Table 4: Summary of statistics for 10 estimations for the sequence S2: comparison between
the combined methods and the Base method
Statistics Base Final
Minimum 534.58 83.77
Maximum 539.44 84.55
Mean 536.23 84.22
Standard Deviation 1.36 0.23
Speedup 1.0 6.67
6 Conclusion
Enhancing the estimation of a sequence of DCMs is a subject that has not yet been explored.
The objective of this project was to propose different methods to improve the total
estimation time of a sequence of DCMs; the BFGS$^{-1}$ algorithm was used to estimate the
two sequences S1 and S2. The first approach was to implement a method commonly used in ML,
which reuses knowledge acquired on a previous task for a new one. The WS method speeds up
the estimation by a factor of 3.84 and 4.5 compared to the Base method for S1 and S2
respectively. The standardization of the variables accelerates the estimation only
modestly, but should be used at the beginning of every optimization task because of its
simplicity. The last approach is the ES method, which has shown interesting improvements in
estimation time, although the threshold has to be chosen carefully in order not to stop at
a spurious convergence plateau of the log likelihood; a threshold of $2 \times 10^{-5}$ is
retained. The combination of all the methods that speed up the estimation of both S1 and S2
leads to speedup factors of 5.26 and 6.67 respectively compared to the Base method. For the
future, I would like to work on two improvements. The first concerns the ES method, which
tends to stop at convergence plateaus that can be far away from the real convergence of the
log likelihood; a more robust ES method could be implemented by finding an efficient way to
detect these regions. The second possible improvement concerns the warm start: a more
detailed analysis could be done since, even though the total estimation time is reduced by
the WS method, some models to which it is applied have a higher optimization time than when
no warm start is applied.
References
[1] Bierlaire, M. (1998). Discrete Choice Models. In: Labbé, M., Laporte, G., Tanczos, K.,
Toint, P. (eds) Operations Research and Decision Aid Methodologies in Traffic and
Transportation Management. NATO ASI Series (Series F: Computer and Systems Sciences),
vol 166. Springer, Berlin, Heidelberg.

[2] Ben-Akiva, M., Bierlaire, M. (1999). Discrete Choice Methods and their Applications to
Short Term Travel Decisions. In: Hall, R.W. (ed) Handbook of Transportation Science.
International Series in Operations Research & Management Science, vol 23. Springer,
Boston, MA.

[3] Devolder, O., Glineur, F., Nesterov, Y. (2014). First-order methods of smooth convex
optimization with inexact oracle. Mathematical Programming 146, 37–75.
https://doi.org/10.1007/s10107-013-0677-5

[4] Hennig, P., Kiefel, M. (2013). Quasi-Newton methods: A new direction. The Journal of
Machine Learning Research 14(1), 843–865.

[5] Gower, R. M., Hanzely, F., Richtárik, P., Stich, S. (2018). Accelerated Stochastic
Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order
Optimization.

[6] Lederrey, G., Lurkin, V., Hillel, T., Bierlaire, M. (2018). Estimation of Discrete
Choice Models with Hybrid Stochastic Adaptive Batch Size Algorithms.

[7] Ash, J. T., Adams, R. P. (2019). On the Difficulty of Warm-Starting Neural Network
Training.

[8] Prechelt, L. (2012). Early Stopping — But When?

[9] Fletcher, R. (1987). Practical Methods of Optimization (2nd ed.). Wiley-Interscience,
New York, NY, USA.

Mais conteúdo relacionado

Mais procurados

Financial Time Series Analysis Based On Normalized Mutual Information Functions
Financial Time Series Analysis Based On Normalized Mutual Information FunctionsFinancial Time Series Analysis Based On Normalized Mutual Information Functions
Financial Time Series Analysis Based On Normalized Mutual Information FunctionsIJCI JOURNAL
 
A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...ijcsit
 
Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...IJECEIAES
 
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...IJCI JOURNAL
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansEditor IJCATR
 
Consistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding
Consistent Nonparametric Spectrum Estimation Via Cepstrum ThresholdingConsistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding
Consistent Nonparametric Spectrum Estimation Via Cepstrum ThresholdingCSCJournals
 
Computational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringComputational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringSpringer
 
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...CSCJournals
 
Iee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veraciniIee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veracinigrssieee
 
A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...IJECEIAES
 
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDINGDATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDINGijccsa
 
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...TELKOMNIKA JOURNAL
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...André Gonçalves
 
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...CSCJournals
 
BPSO&1-NN algorithm-based variable selection for power system stability ident...
BPSO&1-NN algorithm-based variable selection for power system stability ident...BPSO&1-NN algorithm-based variable selection for power system stability ident...
BPSO&1-NN algorithm-based variable selection for power system stability ident...IJAEMSJORNAL
 

Mais procurados (19)

Financial Time Series Analysis Based On Normalized Mutual Information Functions
Financial Time Series Analysis Based On Normalized Mutual Information FunctionsFinancial Time Series Analysis Based On Normalized Mutual Information Functions
Financial Time Series Analysis Based On Normalized Mutual Information Functions
 
A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...
 
Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...
 
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K Means
 
Feature Selection
Feature Selection Feature Selection
Feature Selection
 
17Vol71No1
17Vol71No117Vol71No1
17Vol71No1
 
Consistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding
Consistent Nonparametric Spectrum Estimation Via Cepstrum ThresholdingConsistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding
Consistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding
 
13Vol70No2
13Vol70No213Vol70No2
13Vol70No2
 
Computational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringComputational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineering
 
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup...
 
Iee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veraciniIee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veracini
 
A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...
 
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDINGDATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
 
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
 
Ica 2013021816274759
Ica 2013021816274759Ica 2013021816274759
Ica 2013021816274759
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...
 
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
 
BPSO&1-NN algorithm-based variable selection for power system stability ident...
BPSO&1-NN algorithm-based variable selection for power system stability ident...BPSO&1-NN algorithm-based variable selection for power system stability ident...
BPSO&1-NN algorithm-based variable selection for power system stability ident...
 

Semelhante a Sequential estimation of_discrete_choice_models__copy_-4

COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...IAEME Publication
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionIJECEIAES
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...o_almasi
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Xin-She Yang
 
Proposing a scheduling algorithm to balance the time and cost using a genetic...
Proposing a scheduling algorithm to balance the time and cost using a genetic...Proposing a scheduling algorithm to balance the time and cost using a genetic...
Proposing a scheduling algorithm to balance the time and cost using a genetic...Editor IJCATR
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classificationcsandit
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...cscpconf
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Xin-She Yang
 
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...ijfls
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...cscpconf
 

Semelhante a Sequential estimation of_discrete_choice_models__copy_-4 (20)

COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
master-thesis
master-thesismaster-thesis
master-thesis
 
C013141723
C013141723C013141723
C013141723
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
Proposing a scheduling algorithm to balance the time and cost using a genetic...
Proposing a scheduling algorithm to balance the time and cost using a genetic...Proposing a scheduling algorithm to balance the time and cost using a genetic...
Proposing a scheduling algorithm to balance the time and cost using a genetic...
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
PREDICTIVE EVALUATION OF THE STOCK PORTFOLIO PERFORMANCE USING FUZZY CMEANS A...
 
Algorithms 14-00122
Algorithms 14-00122Algorithms 14-00122
Algorithms 14-00122
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
 

Último

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxrohankumarsinghrore1
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfrohankumarsinghrore1
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 

Último (20)

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptx
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

Sequential estimation of_discrete_choice_models__copy_-4

  • 1. ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE School Of Architecture, Civil and Environmental Engineering Semester Project in Civil Engineering Enhancing the Serial Estimation of Discrete Choice Models Sequences by Youssef Kitane Under the direction of Prof. Michel Bierlaire and supervision of Nicola Ortelli and Gael Lederrey in the Transport and Mobility Laboratory Lausanne, June 2020 1
  • 2. Contents 1 Introduction 3 2 Literature Review 4 3 Methodology 5 3.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2 Warm Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4 Case Study 8 4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.2 Sequence of Discrete Choice Models . . . . . . . . . . . . . . . . . . . . . . . 8 5 Results 10 6 Conclusion 15 2
  • 3. 1 Introduction Discrete Choice Models (DCMs) have played an essential role in transportation modeling for the last 25 years [1]. Discrete choice modeling is a field designed to capture in detail the underlying behavioral mechanisms at the foundation of the decision-making process that drives consumers [2]. Because they must be behaviorally realistic while properly fitting the data, appropriate utility specifications for discrete choice models are hard to develop. In particular, modelers usually start by including a number of variables that are seen as ”essential” in the specification; these originate from their context knowledge or intuition. Then, small changes are tested sequentially so as to improve the goodness of fit of the model while ensuring its behavioral realism.The result is that many model specifications are usually tested before the modeler is satisfied with the result. It, thus, leads to extensive computational time since each model has to be optimized separately. A faster optimization time would allow researchers to test many more specifications in the same amount of time. In this project, the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is used to estimate the parameters of each DCM. Three techniques are implemented to accelerate the process of estimating a sequence of DCMs: Standardization (ST) of the variables: The goal is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Warm Start(WS): This technique uses the knowledge acquired by the precedent model to initialize the values of the parameters for the estimation of the next model. Early Stopping (ES): This consists in stopping the estimation of a model earlier than expected, based on how promising the improvement in log likelihood looks in the last iterations of the optimization algorithm. The next Section is dedicated to the literature review of the existent methods that speed up an optimization process. Then, in Section 3, the three techniques are presented in detail for a sequence of DCMs. Section 4 presents the data considered in this project, as well as the sequences of models that we use to measure the effectiveness of the three techniques. Section 5 gathers the results obtained by the implemented methods. The last Section resumes the findings of this project and highlights possible improvements and directions of research for the future. 3
  • 4. 2 Literature Review In large-scale convex optimization, first-order methods are methods of choice due to their cheap iteration cost [3]. While second-order methods, such as the Newton method, are making use of the curvature’s information, the cost of computing the Hessian can become a hassle. Thus, quasi-Newton methods are a good compromise between curvature information and low computation time. Indeed, they use an approximation of the Hessian instead of its exact computation. The BFGS algorithm named after its inventors Broyen, Fletcher, Goldfarb and Shannon [4] is one of the most well-known quasi-Newton methods. A new method for solving linear systems is proposed [5]. The algorithm is specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization. The accelerated matrix inversion algorithm was then incorporated into an optimization framework to develop both accelerated stochastic and deterministic BFGS, which to the best of our knowledge, are the first accelerated quasi- Newton updates. Under a careful choice of the parameters of the method, and depending on the problem structure and conditioning, acceleration might result into significant speedups both for the matrix inversion problem and for the stochastic BFGS algorithm. It is con- firmed experimentally that these accelerated methods can lead to speed-ups when compared to the classical BFGS algorithm, but no convergence analysis is yet provided. The increase in the size of choice modeling datasets in recent years has led to a growing interest in research to accelerate the estimation of DCMs. Researchers have used techniques to speed-up the estimation of one DCM based on Machine Learning (ML) techniques [6]. It is achieved by proposing new efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. These algorithms are developed based on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. This paper shows that the use of a second-order method and a small batch size is a good starting point for DCM optimization. It also shows that BFGS is an algorithm that works particularly well when the said starting point has been found. The problem of initializing a parameter in a model is central in ML. One particularly com- mon scenario is where a ML algorithm must be constantly updated with new data. This situation occurs generally in finance, online advertising, recommendation systems, fraud detection, and many other domains where machine learning systems are used for prediction and decision making in the real world [7]. When new data arrive, the model needs to be updated so that it can be as accurate as possible. While the majority of existing methods start the configuration process of an algorithm from scratch by initializing randomly the parameters, it is possible to exploit information previously learned in order to ”warm start” its configuration on new type of configurations. In most common optimization algorithms and more precisely in ML, the modeler decides to stop the optimization procedure before reaching the required tolerance in the solution [8]. 
Stopping an optimization process early is a trick used to control the generalization performance of the model during the training phase and to avoid over-fitting in the test phase. In discrete choice modeling, however, the main objective is not the highest accuracy but parameters that are behaviorally realistic.
3 Methodology

This section briefly introduces the principles underlying the BFGS algorithm used to estimate a sequence of DCMs, before presenting the techniques used to speed up the estimation. As a reminder, the iterates $\{x_j\}$ of a line search optimization method following a descent direction $d_j$ and a step size $\alpha_j$ are defined as follows:

$$x_{j+1} = x_j + \alpha_j d_j \quad (1)$$

where the direction of descent is obtained by preconditioning the gradient and is defined as:

$$d_j = -D_j \nabla f(x_j) \quad (2)$$

assuming that the matrix $D_j$ at the iterate $x_j$ is positive semi-definite. For quasi-Newton methods, $D_j$ is an approximation of the Hessian. A slightly different version of BFGS consists in approximating the inverse of the Hessian. The BFGS$^{-1}$ algorithm uses the following approximation [9]:

$$D_{j+1}^{-1} = D_j^{-1} + \frac{(s_j^T y_j + y_j^T D_j^{-1} y_j)(s_j s_j^T)}{(s_j^T y_j)^2} - \frac{D_j^{-1} y_j s_j^T + s_j y_j^T D_j^{-1}}{s_j^T y_j} \quad (3)$$

where $s_j = x_{j+1} - x_j$ and $y_j = \nabla f(x_{j+1}) - \nabla f(x_j)$.

The step is calculated with an inexact line search method, based on the two Wolfe conditions (Wolfe, 1969, 1971). The first condition, also known as the Armijo rule, guarantees that the step gives a sufficient decrease in the objective function. The second condition, known as the curvature condition, prevents the step length from being too short.

3.1 Standardization

The concept of standardization is relevant when continuous independent variables are measured at different scales. Standardization is a technique often applied as part of data preparation for ML: the goal is to bring the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. More formally, suppose that a variable takes values from the set $S = \{x_1, x_2, \ldots, x_n\}$. The standardized value $\tilde{x}_i$ of one value $x_i$ in $S$ is obtained as follows:

$$\tilde{x}_i = \frac{x_i - \bar{x}}{\sigma} \quad (4)$$

where $\bar{x}$ is the mean of the values in $S$ and $\sigma$ is their standard deviation.
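To make these formulas concrete, the following is a minimal NumPy sketch of equations (3) and (4). The function names are illustrative, and this is a sketch rather than the project's actual implementation.

```python
import numpy as np

def bfgs_inverse_update(D_inv, s, y):
    """One BFGS update of the inverse-Hessian approximation, equation (3).

    D_inv : current inverse-Hessian approximation D_j^{-1}, shape (n, n)
    s     : step s_j = x_{j+1} - x_j, shape (n,)
    y     : gradient difference y_j = grad f(x_{j+1}) - grad f(x_j), shape (n,)
    """
    sy = s @ y  # curvature s_j^T y_j, kept positive by the Wolfe conditions
    term1 = (sy + y @ D_inv @ y) * np.outer(s, s) / sy**2
    term2 = (D_inv @ np.outer(y, s) + np.outer(s, y) @ D_inv) / sy
    return D_inv + term1 - term2

def standardize(column):
    """Standardize one continuous variable to zero mean and unit variance, equation (4)."""
    column = np.asarray(column, dtype=float)
    return (column - column.mean()) / column.std()
```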
3.2 Warm Start

A method commonly used in the field of ML is the warm start, which consists in initializing a set of parameters with non-arbitrary values. In our case, we initialize the parameters of a model with the values estimated by the BFGS algorithm for the previous model. Formally, we define the set of parameters of model $j$ as $x_j \in \mathbb{R}^{N_j}$, where $N_j$ is the number of parameters of model $j$. The set of parameters of the following model is defined similarly, i.e. $x_{j+1} \in \mathbb{R}^{N_{j+1}}$, where $N_{j+1}$ is the number of parameters of this model. To generate the initial parameters of model $j+1$, i.e. $x^0_{j+1}$, we use the optimized parameters of the previous model, i.e. $x^*_j$. We thus define the initialization of $x^0_{j+1}$ for each index $i \in \{1, \ldots, N_{j+1}\}$ such that:

$$x^0_{j+1,i} = \begin{cases} x^*_{j,i} & \text{if } i \in \{1, \ldots, N_j\}, \\ 0 & \text{otherwise.} \end{cases}$$

In the case where a variable is transformed using a non-linear function, such as a Box-Cox transformation, we propose a slightly updated version of the warm start. First, we define a boolean array $B \in \{0,1\}^{N_{j+1}}$ such that:

$$B_i = \begin{cases} \text{True} & \text{if } x_{j+1,i} \text{ has been transformed non-linearly}, \\ \text{False} & \text{otherwise,} \end{cases}$$

and initialize the parameters as:

$$x^0_{j+1,i} = \begin{cases} x^*_{j,i} & \text{if } i \in \{1, \ldots, N_j\} \text{ and } B_i \text{ is False}, \\ 0 & \text{otherwise.} \end{cases}$$

This allows us to reset the value of a parameter associated with a non-linear transformation to 0 instead of using the previously optimized value. The same procedure is used to initialize the Hessian between model $j$ and the following model $j+1$. We thus define the initialization of $H^0_{j+1}$ for each combination of indices $i, k \in \{1, \ldots, N_{j+1}\}$ such that:

$$H^0_{j+1,(i,k)} = \begin{cases} H^*_{j,(i,k)} & \text{if } i, k \in \{1, \ldots, N_j\}, \\ 1 & \text{if } i = k, \\ 0 & \text{otherwise.} \end{cases}$$
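The following is a minimal sketch of this initialization, under the assumption that the first $N_j$ parameters of model $j+1$ coincide with those of model $j$; the function name and signature are illustrative.

```python
import numpy as np

def warm_start(x_prev_opt, H_prev_opt, n_new, boxcox_mask=None):
    """Initialize the parameters and Hessian of model j+1 from model j's optimum.

    x_prev_opt  : optimized parameters x*_j, shape (n_prev,)
    H_prev_opt  : final Hessian approximation H*_j, shape (n_prev, n_prev)
    n_new       : number of parameters N_{j+1} of the next model
    boxcox_mask : optional boolean array B of length n_new, True where the
                  corresponding variable was transformed non-linearly
    """
    n_common = min(x_prev_opt.shape[0], n_new)

    # Parameters: reuse the previous optimum; new parameters start at 0.
    x0 = np.zeros(n_new)
    x0[:n_common] = x_prev_opt[:n_common]
    if boxcox_mask is not None:
        x0[boxcox_mask] = 0.0  # reset non-linearly transformed variables

    # Hessian: reuse the previous block, identity for the new entries.
    H0 = np.eye(n_new)
    H0[:n_common, :n_common] = H_prev_opt[:n_common, :n_common]
    return x0, H0
```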
3.3 Early Stopping

The early stopping method consists in stopping the estimation process before convergence is achieved. Because the objective is to select the best model among a sequence of DCMs, the log likelihood evaluation $f(x_i)$ obtained at an epoch is compared to the lowest log likelihood $LL_{best}$ of all the previous models. For a model $i$, if at a certain epoch $f(x_i)$ is lower than $LL_{best}$, the optimization is pursued until the end, and the new best log likelihood becomes the value $LL_{opt}$ estimated by the BFGS algorithm. If at a certain epoch $f(x_i)$ is higher than $LL_{best}$, the optimization process is stopped based on a criterion that estimates the relative evolution of the function in order to detect a plateau, i.e. a point where the function is no longer improving significantly. Three evaluations of the function are considered in order to be confident in the convergence of the log likelihood. Suppose that we have access to the last three evaluations of the log likelihood, $f(x_i)$, $f(x_{i-1})$ and $f(x_{i-2})$, during the estimation of one model. It is then possible to assess the stagnation by evaluating the two following ratios and comparing them to a predefined threshold $\varepsilon$:

$$\frac{f(x_{i-1})}{f(x_i)} - 1 < \varepsilon \qquad \text{and} \qquad \frac{f(x_{i-2})}{f(x_{i-1})} - 1 < \varepsilon$$

Even though the goal of early stopping is to reduce the estimation time of DCMs, it is important to ensure that no large difference arises between the solution obtained by applying early stopping to the BFGS algorithm and the one obtained by the standard BFGS. For example, Figure 1 shows the value of the log likelihood during the estimation of a random model. As can be seen in this example, there is a stagnation in the middle of the estimation; we do not want to stop early at this moment, since the estimation is far from finished. We thus have to be careful with the threshold and conduct a sensitivity analysis on this parameter.

Figure 1: Example of the difference between a possible stagnation of the log likelihood and the real convergence
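A minimal sketch of the early-stopping check described above, assuming $f$ is the negative log likelihood being minimized (positive and decreasing across epochs); the function name and signature are illustrative.

```python
def should_stop_early(f_hist, ll_best, eps=2e-5):
    """Early-stopping check after each epoch of one model's estimation.

    f_hist  : log-likelihood evaluations of the current model, most recent last
    ll_best : lowest log likelihood among all previously estimated models
    eps     : plateau-detection threshold
    """
    if len(f_hist) < 3:
        return False  # need three evaluations to assess stagnation
    f2, f1, f0 = f_hist[-3], f_hist[-2], f_hist[-1]
    if f0 < ll_best:
        return False  # the model beats the best so far: estimate to the end
    # Stop only if the last three evaluations indicate a plateau.
    return (f1 / f0 - 1 < eps) and (f2 / f1 - 1 < eps)
```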
4 Case Study

4.1 Dataset

The Swissmetro dataset (Bierlaire et al., 2001) corresponds to survey data collected on trains between St. Gallen and Geneva, Switzerland, during March 1998. It was used to study the market penetration of the Swissmetro, a revolutionary mag-lev underground system. Three alternatives - train, car and Swissmetro - were generated for each of the 1192 respondents, and a sample of 10'728 observations was obtained by generating 9 types of situations. Some of the pre-selected attributes of the alternatives are categorical (travel card ownership, gender, type of luggage, etc.) and others are continuous (travel time, cost and headway).

4.2 Sequence of Discrete Choice Models

For the purpose of this project, two sequences of one hundred DCMs, denoted by S1 and S2, are considered. Each sequence starts with a given choice model; then, a random perturbation is applied at each step. These small modifications correspond to the typical elementary perturbations used to move from one model to another. Six types of modifications are considered (a sketch of these operators is given after Figure 2):

• Add a non-selected variable to the utility of an alternative
• Remove a variable from the utility of an alternative
• Increment the Box-Cox parameter of a given variable
• Decrement the Box-Cox parameter of a given variable
• Interact a variable with a socioeconomic variable
• Deactivate the interaction of a variable with a socioeconomic variable

The first sequence S1 begins with an alternative-specific-constants model and its complexity increases along the sequence, while S2 starts with a random model and its complexity remains approximately constant across the hundred models. The number of parameters for each sequence of DCMs is shown in Figure 2.
Figure 2: Number of parameters for the two sequences S1 and S2.
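As referenced above, here is a toy sketch of the six elementary perturbation operators, using a plain dictionary as a stand-in for a real utility specification. The data structure and all names are illustrative, not the project's actual implementation.

```python
import random

def perturb(spec, pool, rng):
    """Return a copy of `spec` with one random elementary modification applied.

    spec : {"vars": set of active variables,
            "boxcox": dict variable -> Box-Cox parameter level,
            "inter": set of (variable, socioeconomic variable) pairs}
    pool : {"vars": all candidate variables, "socio": socioeconomic variables}
    rng  : a random.Random instance, for reproducibility
    """
    new = {"vars": set(spec["vars"]),
           "boxcox": dict(spec["boxcox"]),
           "inter": set(spec["inter"])}
    op = rng.randrange(6)
    if op == 0 and pool["vars"] - new["vars"]:   # add a non-selected variable
        new["vars"].add(rng.choice(sorted(pool["vars"] - new["vars"])))
    elif op == 1 and new["vars"]:                # remove a variable
        new["vars"].discard(rng.choice(sorted(new["vars"])))
    elif op == 2 and new["vars"]:                # increment a Box-Cox parameter
        v = rng.choice(sorted(new["vars"]))
        new["boxcox"][v] = new["boxcox"].get(v, 0) + 1
    elif op == 3 and new["boxcox"]:              # decrement a Box-Cox parameter
        v = rng.choice(sorted(new["boxcox"]))
        new["boxcox"][v] -= 1
    elif op == 4 and new["vars"]:                # add an interaction
        new["inter"].add((rng.choice(sorted(new["vars"])),
                          rng.choice(sorted(pool["socio"]))))
    elif op == 5 and new["inter"]:               # deactivate an interaction
        new["inter"].discard(rng.choice(sorted(new["inter"])))
    return new

# Example: generate a sequence of one hundred specifications.
rng = random.Random(0)
pool = {"vars": {"travel_time", "cost", "headway"}, "socio": {"age", "income"}}
sequence = [{"vars": {"cost"}, "boxcox": {}, "inter": set()}]
for _ in range(99):
    sequence.append(perturb(sequence[-1], pool, rng))
```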
5 Results

To avoid misunderstandings, abbreviations are given to the different methods. The base method, which estimates the parameters without applying any warm start, is denoted by Base. The warm start of the parameters is denoted by WSx, the warm start that resets the variables transformed non-linearly by WSbc, the warm start of the Hessian by WSh, and the combination of the warm starts of the Hessian and of the parameters by WS.

A benchmark of ten estimations of the methods Base, WS, WSx, WSh, WSbc and ST is conducted for the sequences S1 and S2. Tables 1 and 2 present a summary of the statistics for these methods: the lowest, highest and mean times, as well as the standard deviation, among the ten estimations are reported. The speedup corresponds to the ratio between the mean time of the Base method and the mean time of each method.

Among the warm-start variants WS, WSx, WSh and WSbc, WS is the most efficient and reliable method: it speeds up the estimation by a factor of 3.84 for S1 and 4.5 for S2. The observed standard deviation for WS is also the lowest of all methods, with values of 0.18 s and 0.35 s for S1 and S2, respectively. WSh is also efficient, accelerating the estimation of S1 and S2 by factors of 2.2 and 2.56, respectively. WSx does not perform as well as WS and WSh: it reduces the estimation time by 19% and 15% for S1 and S2, respectively. WSbc does not reduce the estimation time compared to the Base method.

The ST method also appears efficient. For the first sequence S1, the mean estimation time is reduced from 229.51 s to 201.72 s; for the sequence S2, a reduction of 11% is obtained. The standardization of the variables is not as effective as the WS method, but it yields interesting results and should be applied beforehand to every sequence of DCMs whose variables differ widely in their ranges of values.

Table 1: Summary of statistics (times in seconds) for 10 estimations by method for the sequence S1

Statistics            Base     WS       WSx      WSh      WSbc     ST
Minimum               224.90   60.86    187.74   104.99   225.80   198.10
Maximum               231.32   61.45    189.49   106.07   231.85   203.28
Mean                  229.51   61.07    188.73   105.44   229.80   201.72
Standard Deviation    1.78     0.18     0.5      0.29     1.65     1.35
Speedup               1.0      3.84     1.21     2.2      0.99     1.15
Table 2: Summary of statistics (times in seconds) for 10 estimations by method for the sequence S2

Statistics            Base     WS       WSx      WSh      WSbc     ST
Minimum               534.58   119.18   459.26   210.35   529.25   478.36
Maximum               539.44   120.35   461.36   212.46   538.44   485.93
Mean                  536.23   119.74   460.34   211.27   535.84   480.96
Standard Deviation    1.36     0.35     0.72     0.60     2.47     2.16
Speedup               1.0      4.5      1.17     2.56     1.01     1.12

A sensitivity analysis is conducted for the ES method. A sequence of 20 thresholds ranging from $10^{-7}$ to $5 \cdot 10^{-4}$ is used to test the performance of the ES method against the Base method. Figures 3 and 4 present the estimation time of the ES method for the 20 thresholds relative to the Base method, for S1 and S2 respectively. The black line corresponds to the mean time observed for the Base method over 10 estimations, and the grey lines represent a 95% confidence interval around that mean; a box plot with a 95% confidence interval is drawn for every threshold. For the sequence S1, a restrictive threshold of $10^{-7}$ leads to a speed-up of 3%, while for S2 a speed-up of approximately 4% is observed. It is possible to obtain a better speed-up by increasing the value of the threshold: reductions of 35% and 15% of the optimization time are obtained for S1 and S2, respectively, when a less restrictive threshold of 0.0005 is used.

Figure 3: Sensitivity analysis of the threshold parameter for S1
Figure 4: Sensitivity analysis of the threshold parameter for S2

In order to select the best threshold, the improvement in estimation time is not the only criterion to take into account. Even though the number of models stopped early increases with the threshold and the total optimization time decreases, the main drawback is that the method may stop at a plateau that is far away from the real convergence plateau of the log likelihood. Such models are falsely stopped early and should be distinguished from the models that have reached the real convergence plateau, as illustrated in Figure 1. Figures 5 and 6 show that, beyond a certain threshold, some models are falsely stopped. Indeed, for the sequence S1, a threshold of 0.001 leads to 6 models, among the 76 stopped early, that do not reach the real convergence of the log likelihood. Concerning the sequence S2, a less restrictive threshold of 0.003 falsely stops 3 of the 90 models stopped early. Even though the main objective is to speed up the optimization of a sequence of models and higher thresholds lead to lower optimization times, the modeler has to be careful with models that are falsely stopped early. A threshold of $2 \cdot 10^{-5}$ is acceptable in the sense that no model is falsely stopped early for either S1 or S2, while giving a speed-up equivalent to that of less restrictive thresholds.
Figure 5: Number of models falsely stopped earlier for S1

Figure 6: Number of models falsely stopped earlier for S2
A final benchmark combining the methods that speed up the estimation time of both S1 and S2 is conducted. Among the warm starts, WS is the most efficient; the ST method must also be taken into account; and the ES method with a threshold of $2 \cdot 10^{-5}$ has shown an interesting reduction of time. The results obtained by combining all these methods for both S1 and S2 are compared to the Base method in Tables 3 and 4. The combination of WS, ES with a threshold of $2 \cdot 10^{-5}$, and ST leads to an improvement by a factor of 5.26 for S1 and 6.67 for S2 compared to the Base method.

Table 3: Summary of statistics (times in seconds) for 10 estimations for the sequence S1: comparison between the combined methods and the Base method

Statistics            Base     Final
Minimum               224.90   44.25
Maximum               231.32   44.83
Mean                  229.51   44.45
Standard Deviation    1.78     0.18
Speedup               1.0      5.26

Table 4: Summary of statistics (times in seconds) for 10 estimations for the sequence S2: comparison between the combined methods and the Base method

Statistics            Base     Final
Minimum               534.58   83.77
Maximum               539.44   84.55
Mean                  536.23   84.22
Standard Deviation    1.36     0.23
Speedup               1.0      6.67
6 Conclusion

Enhancing the estimation of a sequence of DCMs is a subject that has not yet been explored. The objective of this project was to propose different methods to improve the total estimation time of a sequence of DCMs; the BFGS$^{-1}$ algorithm is used to estimate the two sequences of DCMs, S1 and S2. The first approach was to implement a method commonly used in ML, which reuses the knowledge acquired on a previous task for a new one. The WS method speeds up the estimation by factors of 3.84 and 4.5 compared to the Base method for S1 and S2, respectively. The standardization of the variables accelerates the estimation only modestly, but should be used at the beginning of every optimization task because of its simplicity. The last approach is the ES method, which has shown an interesting improvement in estimation time, although the threshold has to be chosen carefully in order not to stop at a spurious convergence plateau of the log likelihood; a threshold of $2 \cdot 10^{-5}$ is chosen. The combination of all the methods that speed up the estimation of both S1 and S2 leads to improvements by factors of 5.26 and 6.67, respectively, compared to the Base method.

For the future, I would like to work on two improvements. The first one concerns the ES method, which tends to stop at convergence plateaus that can be far away from the real convergence of the log likelihood; a more robust ES method could be implemented by finding an efficient way to detect these regions. The second possible improvement concerns the warm start: a more detailed analysis could be conducted, since even though the total estimation time is reduced by the WS method, some models to which it is applied have a higher optimization time than when no warm start is applied.
References

[1] Bierlaire, M. (1998). Discrete Choice Models. In: Labbé, M., Laporte, G., Tanczos, K., Toint, P. (eds) Operations Research and Decision Aid Methodologies in Traffic and Transportation Management. NATO ASI Series (Series F: Computer and Systems Sciences), vol 166. Springer, Berlin, Heidelberg.

[2] Ben-Akiva, M. and Bierlaire, M. (1999). Discrete Choice Methods and their Applications to Short Term Travel Decisions. In: Hall, R.W. (ed) Handbook of Transportation Science. International Series in Operations Research & Management Science, vol 23. Springer, Boston, MA.

[3] Devolder, O., Glineur, F. and Nesterov, Y. (2014). First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146, 37-75. https://doi.org/10.1007/s10107-013-0677-5

[4] Hennig, P. and Kiefel, M. (2013). Quasi-Newton methods: A new direction. The Journal of Machine Learning Research, 14(1):843-865.

[5] Gower, R. M., Hanzely, F., Richtárik, P. and Stich, S. (2018). Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization.

[6] Lederrey, G., Lurkin, V., Hillel, T. and Bierlaire, M. (2018). Estimation of Discrete Choice Models with Hybrid Stochastic Adaptive Batch Size Algorithms.

[7] Ash, J. T. and Adams, R. P. (2019). On the Difficulty of Warm-Starting Neural Network Training.

[8] Prechelt, L. (2012). Early Stopping — But When?

[9] Fletcher, R. (1987). Practical Methods of Optimization (2nd ed.). Wiley-Interscience, New York, NY, USA.