PEMF2_SDM_2012_Ali

Quantifying Regional Error in Surrogates by Modeling its
Relationship with Sample Density
Ali Mehmani, Souma Chowdhury, Jie Zhang, Weiyang Tong,
and Achille Messac
Syracuse University, Department of Mechanical and Aerospace Engineering
54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and
Materials Conference
April 8-11, 2013, Boston, Massachusetts

Surrogate model
• Surrogate models are commonly used to provide a tractable and inexpensive approximation of actual system behavior in many routine engineering analysis and design activities.

Broad Research Question
[Figure: structural blade design (ANSYS, Inc.), from an expensive high-fidelity model to an inexpensive surrogate model. Lower fidelity: how low?]

Broad Research Question
How can the level of surrogate accuracy be quantified? Such a measure supports:
• further improvement of the surrogate,
• domain exploration,
• assessing the reliability of the optimal design,
• quantifying the uncertainty associated with the surrogate,
• construction of a weighted surrogate model, and
• …

Research Objective
• Develop a reliable method to quantify surrogate error.
• This method should have the following characteristics:
- model independence,
- no need for additional system evaluations,
- local and global error measurement, and
- quantification of the error of the actual surrogate.

Regional Error Estimation of Surrogate (REES)

Presentation Outline
• Review of surrogate model error measurement methods
• Relation of surrogate accuracy to sample density
• Regional Error Estimation of Surrogate
• Numerical examples: benchmark problems and an engineering design problem


Surrogate Model Error Measurement Methods
• Error quantification methods can be classified:
- based on their computational expense, into methods that require additional data and methods that use only existing data;
- based on the region of interest, into
  - global error measures (e.g., split sample, cross-validation, Akaike's information criterion, and bootstrapping), and
  - local or point-wise error measures (e.g., the mean squared error of Kriging and the linear reference model (LRM)).

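• Error metrics (y_i and ŷ_i: actual and predicted values at the i-th test point; m: number of test points):
- The mean squared error (MSE), or root mean square error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$
- The maximum absolute error (MAE): $\mathrm{MAE} = \max_{i}\left|y_i - \hat{y}_i\right|$
- The relative absolute error (RAE): $\mathrm{RAE}_i = \left|\frac{y_i - \hat{y}_i}{y_i}\right|$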
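These metrics can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names and made-up values, not code from the study. It assumes the standard definitions and nonzero actual values so that the relative error is defined.

```python
import math
from statistics import fmean


def rmse(actual, predicted):
    """Root mean square error over the test points."""
    return math.sqrt(fmean((y - yh) ** 2 for y, yh in zip(actual, predicted)))


def max_abs_error(actual, predicted):
    """Maximum absolute error (MAE in the slides) over the test points."""
    return max(abs(y - yh) for y, yh in zip(actual, predicted))


def rae(actual, predicted):
    """Point-wise relative absolute error |(y_i - yhat_i) / y_i|; assumes y_i != 0."""
    return [abs((y - yh) / y) for y, yh in zip(actual, predicted)]


# Hypothetical actual and surrogate-predicted values at four test points.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [2.5, 3.0, 5.0, 11.0]
```

With these values, `rmse` gives 0.75, `max_abs_error` gives 1.0, and `rae` gives one relative error per test point.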

• Cross-validation error metrics:
- The prediction sum of squares (PRESS), based on the leave-one-out cross-validation error
- The root mean square of PRESS (PRESS_RMS), based on k-fold cross-validation
- The relative absolute error of cross-validation (RAE_CV), based on the leave-one-out approach
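The sketch below is a minimal version of the leave-one-out quantities (PRESS and the point-wise RAE of cross-validation). As an illustrative assumption, a hypothetical least-squares line (`fit_line`) stands in for the surrogate. Any function that takes training data and returns a predictor could be passed as `fit`.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Hypothetical stand-in surrogate: ordinary least-squares line y = a + b*x."""
    xm, ym = fmean(xs), fmean(ys)
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    a = ym - b * xm
    return lambda x: a + b * x


def loo_residuals(xs, ys, fit=fit_line):
    """Leave-one-out residuals y_i - f_{-i}(x_i), where f_{-i} is trained without point i."""
    residuals = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - model(xs[i]))
    return residuals


def press(xs, ys, fit=fit_line):
    """Prediction sum of squares: sum of the squared leave-one-out residuals."""
    return sum(r ** 2 for r in loo_residuals(xs, ys, fit))


def rae_cv(xs, ys, fit=fit_line):
    """Point-wise relative absolute errors of the leave-one-out predictions (y_i != 0)."""
    return [abs(r / y) for r, y in zip(loo_residuals(xs, ys, fit), ys)]


# The leave-one-out residuals are -2, 1, and -2, so PRESS is 9.0.
example_press = press([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
```

Passing the surrogate builder as a parameter keeps the cross-validation loop independent of the surrogate type, which mirrors the model-independence requirement stated earlier.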


Methodology: Concept
Model accuracy ∝ Available resources
In general, this concept applies to different methodologies:
- Surrogate modeling,
- Finite Element Analysis, and
- ...

• Finite Element Analysis (numerical methods)
The finer the mesh, the more precise the computed stresses, owing to the larger number of elements.
[Figure: coarse mesh (4 solid brick elements), medium mesh, and fine mesh.]
Example: estimating the total shear force and flexural moment at vertical sections using Finite Element Analysis.

• Surrogate (mathematical model)
Surrogate accuracy generally improves as the number of training points increases.
[Figure: surrogates constructed with 3, 7, and 9 training points.]
The location of additional points has a strong impact on surrogate accuracy; this impact is highly problem- and model-dependent.


Methodology: REES
• The REES method formulates the variation of error as a function of the number of training points, using intermediate surrogates.
• This formulation is used to predict the level of error in the final surrogate.

Methodology: REES
Step 1: Generation of sample data
The entire set of sample points is represented by X.
Step 2: Identification of sample points inside/outside the region of interest
X = X_in ∪ X_out
X_in: inside-region data set
X_out: outside-region data set
[Figure: user-defined region of interest separating {X_in} from {X_out}.]
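Step 2 can be sketched as a simple partition of the samples. The axis-aligned box used as the region of interest and the helper name `split_by_region` are assumptions made for illustration. The method itself only requires a user-defined region.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_by_region(
    X: Sequence[Point], lower: Point, upper: Point
) -> Tuple[List[Point], List[Point]]:
    """Split samples into X_in (inside the box [lower, upper]) and X_out (the rest).

    The box-shaped region of interest is an assumption for illustration only.
    """
    X_in: List[Point] = []
    X_out: List[Point] = []
    for x in X:
        inside = all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))
        (X_in if inside else X_out).append(x)
    return X_in, X_out


# Hypothetical 2-D samples and region of interest [0.3, 0.7] x [0.3, 0.7].
X = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.1), (0.4, 0.6)]
X_in, X_out = split_by_region(X, lower=(0.3, 0.3), upper=(0.7, 0.7))
```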

Methodology: REES
Step 3: Estimation of the variation of the error with sample density
[Figure: first, second, and third iterations and the final surrogate, showing inside-region points, outside-region points, training points, and test points relative to the user-defined region of interest.]

Methodology: REES
• The positions of the sample points selected as training points at each iteration are critical to surrogate accuracy.
• The proposed error measure should therefore be minimally sensitive to the locations of the test points at each iteration.
• At each iteration, intermediate surrogates are constructed over a sample set comprising all samples outside the region of interest and heuristic subsets of the samples inside the region of interest.

Methodology: REES
• The intermediate subset for each combination at a specific iteration is defined by
{β_k} ⊂ X_in, #{β_k} = n_t, n_{t−1} < n_t, k = 1, 2, …, K_t
• The number of iterations (N_it) is defined based on:
- the dimension of the problem,
- the number of inside-region sample points, and
- the preference of the user.
• The number of sample combinations (K_t) is defined for each iteration.
• The intermediate training points and test points for each combination at each iteration are defined by
X_TR = X_out ∪ β_k
X_TE = X − X_TR
• The intermediate surrogates f_k, k = 1, 2, …, K_t, are constructed for all combinations using the intermediate training points (X_TR) and are tested over the intermediate test points (X_TE).
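The sketch below shows how the intermediate sets for one iteration might be constructed. Drawing each β_k by uniform random sampling is an assumption that stands in for the heuristic subset selection. `intermediate_sets` is a hypothetical name. Points are stored as tuples so that set membership can be tested.

```python
import random


def intermediate_sets(X, X_in, X_out, n_t, K_t, seed=0):
    """Yield (X_TR, X_TE) for each of the K_t combinations at one iteration.

    beta_k: n_t points drawn at random from X_in (assumed selection rule);
    X_TR = X_out + beta_k (union), and X_TE = X - X_TR (set difference).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(K_t):
        beta_k = rng.sample(list(X_in), n_t)
        X_TR = list(X_out) + beta_k
        in_training = set(X_TR)
        X_TE = [x for x in X if x not in in_training]
        yield X_TR, X_TE


# Hypothetical 1-D example: 6 inside-region and 4 outside-region samples.
X_in = [(0.40,), (0.45,), (0.50,), (0.55,), (0.60,), (0.65,)]
X_out = [(0.05,), (0.15,), (0.85,), (0.95,)]
X = X_in + X_out
combos = list(intermediate_sets(X, X_in, X_out, n_t=3, K_t=5))
```

Because every outside-region sample is always in X_TR, the intermediate test points X_TE always lie inside the region of interest.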

Methodology: REES
• The median and the maximum errors are estimated for each combination as the median and the maximum of e over the m_t intermediate test points.
m_t: the number of test points in the t-th iteration
e: the RAE value estimated on the intermediate test points
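For each combination, the two statistics reduce to a median and a maximum over the point-wise RAE values on the intermediate test points. The following is a minimal sketch with a hypothetical helper name. It assumes nonzero actual values.

```python
from statistics import median


def combination_errors(actual, predicted):
    """Median and maximum of the point-wise RAE values e_1, ..., e_{m_t} on X_TE.

    actual/predicted: true and surrogate values at the m_t intermediate test points
    (actual values assumed nonzero).
    """
    e = [abs((y - yh) / y) for y, yh in zip(actual, predicted)]
    return median(e), max(e)


# Hypothetical values at m_t = 4 test points of one combination.
med_k, max_k = combination_errors([2.0, 4.0, 5.0, 10.0], [2.5, 3.0, 5.0, 11.0])
```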

Methodology: REES
• The median is a useful measure of central tendency that is less vulnerable to outliers than the mean.
- Median error: overall fidelity information
- Maximum error: minimum fidelity information

Methodology: REES
• Probabilistic models are developed using a lognormal distribution to represent the median and maximum errors estimated over all K_t combinations at each iteration.
• The mode of each distribution is selected to represent the errors at that iteration:
- Mode of the median error distribution (Mo_med)
- Mode of the maximum error distribution (Mo_max)
• These values are used to relate the variation of the surrogate error to the number of training points (sample density).
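The fitting procedure for the lognormal is not specified here. One way to obtain the mode assumes a maximum-likelihood fit. With μ and σ² taken as the mean and population variance of ln e over the K_t combinations, the lognormal mode is exp(μ − σ²). The function name is hypothetical.

```python
import math
from statistics import fmean, pvariance


def lognormal_mode(errors):
    """Mode of a lognormal distribution fitted by maximum likelihood to positive errors.

    With mu = mean(ln e) and sigma^2 = population variance of ln e (the MLE),
    the mode of the fitted lognormal is exp(mu - sigma^2).
    """
    logs = [math.log(e) for e in errors]
    mu = fmean(logs)
    sigma2 = pvariance(logs, mu)
    return math.exp(mu - sigma2)


# Hypothetical per-combination median errors at one iteration (K_t = 5 here).
median_errors = [0.08, 0.10, 0.12, 0.09, 0.15]
mo_med = lognormal_mode(median_errors)
```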

The relation of the error with sample density
• 12-D Test Problem (Dixon & Price, n = 12)
Number of sample points, #X = 550; number of inside sample points, #X_in = #X
Number of training points at each iteration, n_t = 5t + 50, t = 1, 2, …, 70
Number of sample combinations, K_t = 500
[Figure: estimated mode of median errors (Mo_med) and estimated mode of maximum errors (Mo_max) versus the number of training points. First iteration: #X_TR = 55, #X_TE = 445; last iteration: #X_TR = 400, #X_TE = 100.]

The relation of the error with sample density
• 12-D Test Problem (Dixon & Price, n = 12)
[Figure: REES method, estimated mode of median errors (Mo_med), compared with normalized k-fold CV, estimated mean of mean errors.]

Methodology: REES
Step 4: Prediction of regional error in the final surrogate
• The final surrogate model is constructed using the full set of training data.
• Regression models are applied to relate
- the statistical mode of the median error distribution (Mo_med),
- the statistical mode of the maximum error distribution (Mo_max), and
- the absolute maximum error (ABS_max)
at each iteration to the number of inside-region training points (n_t).
• These regression models are called the variation of error with sample density (VESD) models.
• The VESD models are used to predict the level of error in the final surrogate within the region of interest.

Methodology: REES
Modeling the Variation of Regional Error with Training Point Density
• In this study, three types of regression functions are used to represent the variation of regional error with the number of inside-region training points:
- Exponential regression model
- Multiplicative regression model
- Linear regression model
• The choice of these functions assumes a smooth, monotonic decrease of the regional error with increasing training point density within that region.
• The root mean squared error (RMSE) metric is used to select the best-fit regression model.
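The sketch below fits the three candidate regression forms and selects the best by RMSE. The functional forms E = a·exp(b·n), E = a·n^b, and E = a + b·n are assumptions made for illustration. So is fitting the first two by least squares on logarithms. The exact forms and fitting procedure used in the study may differ. All names are hypothetical.

```python
import math
from statistics import fmean


def _ols(xs, ys):
    """Least-squares intercept a and slope b for ys ~ a + b * xs."""
    xm, ym = fmean(xs), fmean(ys)
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    return ym - b * xm, b


def fit_vesd(n, err):
    """Fit exponential, multiplicative, and linear candidates; return the best by RMSE.

    n: numbers of inside-region training points; err: positive error statistic
    (e.g., Mo_med) at each iteration. Returns (name, predictor, RMSE on fitted points).
    """
    log_err = [math.log(e) for e in err]

    a_e, b_e = _ols(n, log_err)  # ln E = ln a + b * n

    def exponential(x):
        return math.exp(a_e) * math.exp(b_e * x)

    a_m, b_m = _ols([math.log(x) for x in n], log_err)  # ln E = ln a + b * ln n

    def multiplicative(x):
        return math.exp(a_m) * x ** b_m

    a_l, b_l = _ols(n, err)  # E = a + b * n

    def linear(x):
        return a_l + b_l * x

    def rmse(f):
        return math.sqrt(fmean((f(x) - e) ** 2 for x, e in zip(n, err)))

    name, model = min(
        [("exponential", exponential), ("multiplicative", multiplicative), ("linear", linear)],
        key=lambda pair: rmse(pair[1]),
    )
    return name, model, rmse(model)


# Usage with synthetic data that follow a power law exactly.
n_t = [10, 20, 40, 80]
mo_med = [2.0 * x ** -0.5 for x in n_t]
name, vesd, fit_rmse = fit_vesd(n_t, mo_med)
predicted = vesd(160)  # extrapolate to a hypothetical final #X_in = 160
```

The selected predictor would then be evaluated at the full number of inside-region training points, #X_in, to predict the error level in the final surrogate.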


Numerical Examples
• The effectiveness of the REES method is explored for applications with
- Kriging,
- Radial Basis Functions (RBF),
- Extended Radial Basis Functions (E-RBF), and
- Quadratic Response Surface (QRS).
• To evaluate the practical and numerical efficiency of the REES method, three benchmark problems and an engineering design problem are tested.
• The error estimated using REES and the relative absolute error given by leave-one-out cross-validation (RAE_cv) are compared with the actual error, evaluated as the relative absolute error on additional test points (RAE_actual).

Numerical Examples
Results and Discussion: Median of RAEs
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error (VESD_med).
[Figure: distribution of median errors, mode of the median error distribution, and predicted mode of median error in the final surrogate (VESD_med) versus the number of inside-region training points.]

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error.
[Table: type and coefficients of VESD_med for the Kriging, RBF, E-RBF, and QRS surrogates.]

Numerical Examples
Results and Discussion: Maximum of RAEs
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the maximum error (VESD_max) and the absolute maximum error (VESD_ABS).
[Figure: distribution of maximum errors, mode of the maximum error distribution, absolute maximum error, and the predicted mode of maximum error and predicted absolute maximum error in the final surrogate versus the number of inside-region training points.]

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the maximum error (VESD_max) and the absolute maximum error (VESD_ABS).
[Table: type and coefficients of VESD_max and VESD_ABS for the Kriging, RBF, E-RBF, and QRS surrogates.]

Numerical Examples
Wind Farm Power Generation
Surrogates are developed using Kriging, RBF, E-RBF, and QRS to
represent the power generation of an array-like wind farm.

Numerical Examples
VESD regression models for the different surrogates in the wind farm power generation problem
[Figure: errors at iterations 1 to 4 and the predicted error in the final surrogate.]

Numerical Examples
[Figure: predicted mode of median errors (REES), median of relative absolute errors of cross-validation, and median of RAEs evaluated on test points.]
The closer the value is to one, the better the corresponding error measure.

Concluding Remarks
• We developed a new method to quantify surrogate error based on the hypothesis that:
“The accuracy of the approximation model is related to the amount of available resources.”
• This relationship can be reliably quantified when the error measure is less sensitive to sample locations or the type of application; this is not possible using any existing method.
• The REES method addresses this issue.
• Preliminary results on the benchmark and wind farm power generation problems indicate that, in the majority of cases, the REES method is more accurate than the other error measures.

Future Work
• Further improvement of the method
• Implementation of the proposed error measure in surrogate development

Acknowledgement
• I would like to acknowledge my research adviser Prof. Achille Messac, and my co-adviser Prof. Souma Chowdhury, for their immense help and support in this research.
• Support from the NSF Awards is also acknowledged.

Thank you
Questions and Comments

Numerical Examples
Results and Discussion: Median of RAEs
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error, compared with k-fold CV.
[Figure: distribution of median errors, mode of the median error distribution, and predicted mode of median error in the final surrogate (VESD_med), with the k-fold CV mean of mean errors.]
