A quantitative evaluation methodology for disparity maps includes the selection of an error measure. Among existing measures, the percentage of bad matched pixels (BMP) is commonly used. However, it requires an error threshold, so a score of zero bad matched pixels does not necessarily imply that a disparity map is free of errors. Moreover, we have not found publications in which different error measures are applied within the evaluation process. In this paper, error measures are characterised in order to provide a basis for selecting a measure during the evaluation process. Based on this characterisation, we analyse how the choice of error measure affects the results of evaluating disparity maps. The evaluation results show a lack of consistency among the results obtained with different error measures, which affects how the accuracy of stereo correspondence algorithms is interpreted.
On the Impact of the Error Measure Selection in Evaluating Disparity Maps
1. On the Impact of the Error Measure Selection in Evaluating Disparity Maps
Ivan Cabezas, Victor Padilla, Maria Trujillo and Margaret Florian
ivan.cabezas@correounivalle.edu.co
June 27th 2012
World Automation Congress, ISIAC, Puerto Vallarta - Mexico
2. Multimedia and Vision Laboratory
MMV is a research group of the Universidad del Valle in Cali, Colombia
[Figure: 3D World, Camera System (Optics Problem), 2D Images, Inverse Problem. Group members: Ivan, Victor, Maria, Margaret]
Multimedia and Vision Laboratory Research: http://mmv-lab.univalle.edu.co
On the Impact of the Error Measure Selection in Evaluating Disparity maps, WAC – ISIAC, 2012, Puerto Vallarta, Mexico Slide 2
3. Content
Stereo Vision
Application Domains
The Impact of Inaccurate Disparity Estimation
Quantitative Evaluation
Commonly Used Evaluation Measures
Error Measure Function
Error Measures Purpose and Meaning
Research Problem
Comparative Performance Scenario
Middlebury's Evaluation Model
A* Evaluation Model
Research Questions
Algorithm to Measure the Consistency
Consistency According to Evaluation Models
Conclusions
4. Stereo Vision
The stereo vision problem is to recover the 3D structure of a scene
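[Figure: stereo vision pipeline. Stereo images (left, right) → correspondence algorithm → disparity map → reconstruction algorithm → 3D model. Geometry: a scene point P at depth Z projects to pl and pr on image planes πl and πr, with focal length f, optical centres Cl and Cr, and baseline B. Disparity values d range from 0 to dmax over points p1 … pn.]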
Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D., and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003
5. Application Domains
3D recovery has multiple application domains
Whitehorn M., Vincent T., Debrunner C. and Steele J., Stereo Vision in LHD Automation, IEEE Trans. on Industry Applications, 2003
Van der Mark W., and Gavrila D., Real-Time Dense Stereo for Intelligent Vehicles, IEEE Trans. On Intelligent Transportation Systems, 2006
Point Grey Research Inc., www.ptgrey.com
6. The Impact of Inaccurate Disparity Estimation
Disparity is the distance between corresponding points
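[Figure: accurate vs. inaccurate disparity estimation. With an accurate match (pl, pr), the point P is reconstructed at depth Z; with an inaccurate match (pl, pr’), a different point P’ is reconstructed at depth Z’. Labels: image planes πl and πr, focal length f, optical centres Cl and Cr, baseline B.]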
Trucco, E. and Verri A., Introductory Techniques for 3D Computer Vision, Prentice Hall 1998
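The effect of a disparity error on the recovered depth can be illustrated with the standard relation for a rectified stereo pair, Z = f·B / d. The sketch below is not taken from the slides: the function name, the focal length, and the baseline are hypothetical values chosen only for illustration.

```python
def depth_from_disparity(d, f, B):
    """Depth Z = f * B / d for a rectified stereo pair (requires d > 0)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * B / d


# Hypothetical camera: focal length in pixels, baseline in metres.
f, B = 500.0, 0.1

z_true = depth_from_disparity(10.0, f, B)  # depth for the correct disparity
z_off = depth_from_disparity(9.0, f, B)    # depth when the disparity is off by one pixel
depth_error = z_off - z_true               # resulting error in depth, in metres
```

Because depth is inversely proportional to disparity, the same one-pixel error produces a larger depth error for distant points (small d) than for near ones.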
7. Quantitative Evaluation
Using an evaluation methodology makes it possible to:
Assess specific components and procedures
Tune algorithm's parameters
Measure the progress in the field
Szeliski, R., Prediction Error as a Quality Metric for Motion and Stereo, ICCV 2000
Kostliva, J., Cech, J., and Sara, R., Feasibility Boundary in Dense and Semi-Dense Stereo Matching, CVPR 2007
Tombari, F., Mattoccia, S., and Di Stefano, L., Stereo for robots: Quantitative Evaluation of Efficient and Low-memory Dense Stereo Algorithms, ICARCV 2010
Cabezas, I. and Trujillo M., A Non-Linear Quantitative Evaluation Approach for Disparity Estimation, VISAPP 2011
Cabezas, I. Trujillo M., and Florian M., An Evaluation Methodology for Stereo Correspondence Algorithms, VISAPP 2012
8. Commonly Used Evaluation Measures
There are different evaluation measures
Sigma Z Error, SZE
Cabezas, I., Padilla, V., and Trujillo M., A Measure for Accuracy Disparity Maps Evaluation, CIARP 2011
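The formulas on this slide did not survive extraction. As a rough sketch, the measures named later in the deck (MAE, MSE, MRE, BMP, SZE) can be written as follows. The function name, the threshold `delta`, and the parameters `f`, `B` and `mu` are assumptions for illustration. MAE, MSE, MRE and BMP use their usual textbook forms; SZE is written here as the summed absolute difference between depths, which is an assumption about the definition in the cited CIARP 2011 paper.

```python
def error_measures(est, gt, delta=1.0, f=1.0, B=1.0, mu=1.0):
    """Compare estimated and ground-truth disparities given as flat lists.

    Assumes every ground-truth disparity is positive (needed by MRE).
    delta is the BMP threshold; f, B, mu are hypothetical SZE parameters.
    """
    n = len(gt)
    abs_err = [abs(e - g) for e, g in zip(est, gt)]
    return {
        "MAE": sum(abs_err) / n,                             # mean absolute error
        "MSE": sum(a * a for a in abs_err) / n,              # mean squared error
        "MRE": sum(a / g for a, g in zip(abs_err, gt)) / n,  # mean relative error
        "BMP": 100.0 * sum(a > delta for a in abs_err) / n,  # % of bad matched pixels
        "SZE": sum(abs(f * B / (g + mu) - f * B / (e + mu))  # summed depth differences
                   for e, g in zip(est, gt)),
    }


scores = error_measures([10.0, 12.0, 20.0, 8.0], [10.0, 10.0, 20.5, 8.0])

# Every error below the threshold: BMP reports 0 although the map is not error-free.
small = error_measures([10.5, 10.5], [10.0, 10.0])
```

In the second call, `small["BMP"]` is 0.0 while `small["MAE"]` is 0.5, which is the threshold effect mentioned in the abstract.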
9. Error Measure Function
[Figure: test bed with estimated and ground-truth disparity maps, and the error criteria masks: nonocc, all, disc]
Evaluation Measures
Measure   nonocc    all       disc
MAE       0.41      1.48      0.70
MSE       1.48      33.97     4.25
MRE       0.01      0.03      0.02
BMP       2.90      8.78      7.79
SZE       71.39     341.55    37.86
Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D., and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003
10. Error Measures Purpose and Meaning
In practice, different error measures are used for the same purpose: to find a
distance between estimated and ground-truth disparity data
They have different meanings, as well as different properties
11. Research Problem
The use of different error measures may produce contradictory error scores
[Figure: results of the ADCensus and RDP algorithms on the Teddy and Cones images]
Scharstein, D. and Szeliski, R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
12. Comparative Performance Scenario
Four stereo image pairs: Tsukuba, Venus, Teddy, Cones
Three error criteria: nonocc, all, disc
112 Stereo Correspondence Algorithms
Two evaluation models: Middlebury and A*
k: a threshold for determining the top-performing
algorithms in Middlebury's evaluation model
13. Middlebury’s Methodology Evaluation Model
… Compute Error Measures → Apply Evaluation Model
Error scores, with the per-criterion rank in parentheses
Algorithm        nonocc      all         disc
ObjectStereo     2.20 (1)    6.99 (2)    6.36 (1)
GC+SegmBorder    4.99 (5)    5.78 (1)    8.66 (5)
PUTv3            2.40 (2)    9.11 (5)    6.56 (2)
PatchMatch       2.47 (3)    7.80 (3)    7.11 (3)
ImproveSubPix    2.96 (4)    8.22 (4)    8.55 (4)
Middlebury’s Evaluation Model
Algorithm        Average Rank    Final Rank
ObjectStereo     1.33            1
PatchMatch       3.00            2
PUTv3            3.33            3
GC+SegmBorder    3.66            4
ImproveSubPix    4.00            5
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
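The ranking step can be sketched as follows, using the five error scores shown on this slide. This is a minimal illustration rather than the Middlebury implementation: the function name is hypothetical, and ties within a criterion are assumed not to occur (they do not in this data).

```python
# Error scores (nonocc, all, disc) copied from the slide.
scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}


def average_ranks(scores):
    """Rank algorithms per criterion (lower error is better), then average the ranks."""
    names = list(scores)
    n_criteria = len(next(iter(scores.values())))
    rank_sum = {name: 0 for name in names}
    for c in range(n_criteria):
        ordered = sorted(names, key=lambda name: scores[name][c])
        for rank, name in enumerate(ordered, start=1):
            rank_sum[name] += rank
    return {name: rank_sum[name] / n_criteria for name in names}


avg = average_ranks(scores)
final_order = sorted(avg, key=avg.get)  # best average rank first
```

For the rank columns shown, this computation gives an average rank of 3.00 for PUTv3 (ranks 2, 5, 2), tied with PatchMatch, whereas the slide lists 3.33.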
14. A* Evaluation Model
The A* evaluation model performs a partitioning of the stereo algorithms under
evaluation, based on the Pareto Dominance relation
… Compute Error Measures → Apply Evaluation Model (A* Evaluation Model)
Algorithm        nonocc    all     disc    Set
ObjectStereo     2.20      6.99    6.36    A*
GC+SegmBorder    4.99      5.78    8.66    A*
PUTv3            2.40      9.11    6.56    A’
PatchMatch       2.47      7.80    7.11    A’
ImproveSubPix    2.96      8.22    8.55    A’
A* = {ObjectStereo, GC+SegmBorder}
A’ = {PatchMatch, PUTv3, ImproveSubPix}
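The partition shown on this slide can be reproduced with a short sketch of Pareto dominance over the error vectors, where lower scores are better. The function names are hypothetical, and the code illustrates only the dominance relation, not the authors' full A* methodology.

```python
# Error scores (nonocc, all, disc) copied from the slide.
scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}


def dominates(a, b):
    """True if a is no worse than b on every criterion and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_partition(scores):
    """Split algorithms into non-dominated (A*) and dominated (A') sets."""
    a_star, a_prime = [], []
    for name, vec in scores.items():
        dominated = any(dominates(other, vec)
                        for other_name, other in scores.items()
                        if other_name != name)
        (a_prime if dominated else a_star).append(name)
    return a_star, a_prime


a_star, a_prime = pareto_partition(scores)
# a_star  -> ['ObjectStereo', 'GC+SegmBorder']
# a_prime -> ['PUTv3', 'PatchMatch', 'ImproveSubPix']
```

GC+SegmBorder stays in A* because it has the lowest error on the all criterion, so no other algorithm is at least as good on every criterion; the three remaining algorithms are each dominated by ObjectStereo.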
15. Research Questions
What is the impact of using one error measure instead of another?
Different evaluation results are obtained using different evaluation measures
[Figure: evaluation results under different error measures, for Middlebury's model and for the A* model]
Scharstein, D. and Szeliski, R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
16. Research Questions (ii)
How should an error measure be chosen?
A characterisation of error measures may serve as selection criteria
An error measure should be:
AUTOMATIC: it is computed without human intervention
RELIABLE: it operates deterministically, without being influenced by
external factors
MEANINGFUL: it is intended for a particular purpose, has a
concise interpretation and does not lead to
ambiguous results
UNBIASED: it is capable of accomplishing the measurements for
which it was conceived, and its use allows
impartial comparisons
CONSISTENT: the scores it produces should be compatible with
the scores produced by another error measure
with the same purpose
17. Algorithm to Measure the Consistency
Consistency is measured as the percentage of agreement between the
results obtained when the error measure is varied
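The algorithm shown on this slide did not survive extraction, so the following is only one plausible reading of "percentage of agreement", not the authors' procedure. It assumes that each error measure yields a set of selected algorithms (for example the top-k under Middlebury's model, or A* under the A* model) and counts how many algorithms receive the same outcome under two measures. All names and data below are hypothetical.

```python
def agreement_percentage(selected_a, selected_b, all_algorithms):
    """Percentage of algorithms with the same outcome (selected or not) under two measures."""
    same = sum((alg in selected_a) == (alg in selected_b)
               for alg in all_algorithms)
    return 100.0 * same / len(all_algorithms)


# Hypothetical algorithms and hypothetical selections under two error measures.
algorithms = ["alg1", "alg2", "alg3", "alg4", "alg5"]
selected_under_bmp = {"alg1", "alg2"}
selected_under_mre = {"alg1", "alg3"}

pct = agreement_percentage(selected_under_bmp, selected_under_mre, algorithms)
# pct -> 60.0 (alg1, alg4 and alg5 receive the same outcome under both measures)
```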
18. Consistency According to Evaluation Models
The MRE, followed by the MSE, showed the highest
percentages of consistency using Middlebury's model
The SZE, followed by the MRE, showed the highest
percentages of consistency using the A* model
[Figure: consistency percentages per error measure, for Middlebury's model and for the A* model]
19. Conclusions
Using Middlebury’s evaluation model, the MRE and the MSE showed
high consistency
Using the A* evaluation model, the SZE and the MRE showed high
consistency
The BMP showed low consistency under both evaluation models
A characterisation of error measures was presented in order to support the
selection of an error measure
It includes the following attributes: automatic, reliable, meaningful,
unbiased, and consistent
The experimental evaluation focused on measuring consistency
The selection of an error measure is not a trivial issue, since it affects the
results obtained during a disparity map evaluation process