Talk overview: CNNs for depth estimation based on the human visual system, and CNNs inspired by conventional methods
Case1: Cross-channel stereo matching
Case2: Depth from light field
Case3: Multiview stereo
Conclusion
1. Depth estimation: Do we need to throw old things away?
Hae-Gon Jeon (전해곤)
1
Assistant Professor
2. My Research Timeline
2011: MS course · 2013~2018: Ph.D. course · 2018~Present: Post-doc
- Coded exposure imaging: motion deblurring [ICCV'13, ICCV'15, IJCV'17, TIP'17]
- Light-field imaging (2015~Present) [ECCV'14, CVPR'15, ICCVW'15, PAMI'17, SPL'17, TPAMI'19, CVPR'18]
- Depth + denoising in low-light (2016~Present) [CVPR'16, CVPR'18, TIP'19]
- Depth from small motion [ICIP'15, ICCV'15, CVPR'16, ECCV'16, SPL'17, CVPR'17, TPAMI'19]
- Visual and AI system for rescue robotics (2018~Present): highly accurate 3D maps, optimized path generation from real map information
[ICRA'19, Submitted to IROS'19 (1/2), IROS'19 (1/2); IEEE TPAMI (Major Revision); IEEE TIP (Major Revision)]
2
9. 9
Human Visual System and DispNet
N. Mayer et al., A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, CVPR 2016
10. 10
Human Visual System
The retina is a multilayered membrane that contains millions of light-sensitive cells that detect the image and translate it into a series of electrical signals.
The optic nerves from both eyes join at the optic chiasma, where information from the two retinas is correlated.
Humans constantly scan objects in their field of view, usually resulting in a perceived image that is uniformly sharp.
11. 11
Upconvolutional layers
• high-level information passed from
coarser feature maps
• fine local information provided in
lower layer feature maps
Correlation layer
• Multiplicative patch
comparisons between two
feature maps
• No trainable weights
Convolution layer
• identical processing streams for the two images
• With this architecture the network is constrained to
first produce meaningful representations of the two
images separately
DispNet: an end-to-end disparity estimation network (no separate optimization needed)
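The correlation layer described above can be sketched in a few lines. This is a minimal NumPy illustration of the idea (multiplicative patch comparison between two feature maps over a range of horizontal displacements, with no trainable weights), not DispNet's actual implementation; the toy feature maps are hypothetical.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    # Multiplicative patch comparison between two (C, H, W) feature maps
    # over horizontal displacements -max_disp..max_disp; no trainable weights.
    C, H, W = f1.shape
    out = np.zeros((2 * max_disp + 1, H, W), dtype=f1.dtype)
    for i, d in enumerate(range(-max_disp, max_disp + 1)):
        shifted = np.roll(f2, -d, axis=2)        # displace f2 by d pixels
        out[i] = (f1 * shifted).sum(axis=0) / C  # per-location dot product
    return out

# A single active column correlates only at zero displacement.
f = np.zeros((4, 3, 6))
f[:, :, 2] = 1.0
corr = correlation_layer(f, f)  # shape (5, 3, 6); peak at displacement index 2
```

In the real network the correlation volume is concatenated with features and fed to further convolutions; here it only shows that the layer is a fixed (parameter-free) similarity computation.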
14. 18
Asymmetric stereo Light-field camera Monocular camera
[CVPR’16, Silver Prize of Samsung Humantech
Paper Award, Submitted to IEEE TIP]
[ECCV’14, CVPR’15, ICCVW’15, CVPR’18, IEEE
TPAMI’17, IEEE SPL’17, IEEE TPAMI’19,
Robustness champion of CVPR’17 workshop]
[ICCV’15, CVPR’16, ECCV’16, CVPR’18,
ICLR’19, IEEE TPAMI’17, IEEE SPL’17, IEEE
TPAMI’19, IEEE TPAMI under minor revision]
Today’s Talk
15. Stereo Matching with Color and Monochrome cameras
Publications
• Stereo Matching with Color and Monochrome Cameras in Low-light Conditions
Hae-Gon Jeon, Joon-Young Lee, Sunghoon Im, Hyowon Ha and In So Kweon
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016
• HumanTech Paper Award 2016, Silver Prize
• CMSNet: Deep Color and Monochrome Stereo
Hae-Gon Jeon, Sunghoon Im, Joon-Young Lee and Martial Hebert
Submitted to IEEE Transactions on Image Processing
19
16. Low-light imaging 1: Burst photography
[Ziwei Liu et al., SIGGRAPH Asia 14,
Sam Hasinoff et al., SIGGRAPH Asia 16]
Courtesy of S. Im, H.-G. Jeon and I.S. Kweon,
[Submitted to CVPR 18]
Results from Google Camera HDR+
Sam Hasinoff et al., SIGGRAPH Asia 16
Burst shots with a short exposure suffer from under-exposure
20
17. Noisy visible image · IR image · Fused output
Multi-spectral Video Fusion [IEEE TIP 07]
- Twin cameras: IR/Visible
- Temporal smoothing
- Cross-bilateral filter
Low-light imaging2: Multi-spectral fusion
21
18. Imaging performance
[Quantum efficiency (%) vs. wavelength (300–1000 nm): the color camera has separate blue/green/red curves, while the monochrome camera adds a gray curve with higher overall sensitivity.]
Color camera (RGB): color information; reduced sharpness; vulnerable to noise
Monochrome camera (W): no color information; sharp image; robust to noise
Issues in depth from an RGB-W image pair: 1) different spectral sensitivities, 2) severe noise
RGB2Gray · Monochrome · The proposed RGB-W stereo
22
20. Gain map Monochrome by
gain adjustment
Decolorized image
Disparity map
Monochrome
Overview of the proposed method
(1) Input
pair
(2) Gain map
(3) Decolorization
(4) Disparity map by
iterative gain
adjustment
(5) Refined map by
a tree-based filtering
(6) High-quality
color image
Solution
24
21. Decolorization and gain compensation
Gain compensation: a linear, global gain compensation is impossible due to the different spectral sensitivities, so the tractable solution is decolorization followed by gain compensation.
Decolorization cost:
I^γ = ω_r I_r + ω_g I_g + ω_b I_b,  ω_r + ω_g + ω_b = 1,  ω_r, ω_g, ω_b ≥ 0,  ω_{r,g,b} ∈ {0.1, 0.2, ⋯, 1.0}
1) Contrast preservation: E_c(γ) = ‖G(I, I) − G(I, Ī^γ)‖₁
2) Noise suppression: E_n(γ) = (‖∇_u I^γ‖₁ + ‖∇_v I^γ‖₁) / (‖∇_u I^γ‖₂ + ‖∇_v I^γ‖₂)
I^γ: the decolorized image; ω_{r,g,b}: weighting parameters of each color channel; I_{r,g,b}: the three color channels; G: the guided output image; ∇_{u,v}: image gradients in the horizontal u and vertical v directions
RGB2Gray · Only contrast · Proposed (cost: high → low)
25
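The decolorization weight search on this slide can be sketched as follows. This toy version scores candidate weights only with the noise-suppression ratio E_n (the full method also evaluates contrast preservation E_c with a guided filter), and the image and function names are illustrative.

```python
import itertools
import numpy as np

def decolorize(img, w):
    # img: (H, W, 3) float RGB image; w: (w_r, w_g, w_b) channel weights.
    return img @ np.asarray(w)

def noise_energy(gray):
    # Noise-suppression score: ratio of L1 to L2 gradient norms.  Sparse,
    # strong gradients (edges) give a small ratio; dense weak gradients
    # (noise) give a large one.
    gu = np.diff(gray, axis=1).ravel()
    gv = np.diff(gray, axis=0).ravel()
    l1 = np.abs(gu).sum() + np.abs(gv).sum()
    l2 = np.linalg.norm(gu) + np.linalg.norm(gv)
    return l1 / (l2 + 1e-12)

def best_weights(img):
    # Exhaustive search over the quantized simplex w_r + w_g + w_b = 1.
    steps = [i / 10 for i in range(11)]
    cands = [(r, g, round(1.0 - r - g, 1)) for r, g in
             itertools.product(steps, steps) if 1.0 - r - g > -1e-9]
    return min(cands, key=lambda w: noise_energy(decolorize(img, w)))

rng = np.random.default_rng(0)
img = np.zeros((20, 20, 3))
img[:, 10:, 1] = 1.0                      # clean step edge in the green channel
img[..., 0] = rng.uniform(size=(20, 20))  # noisy red channel
img[..., 2] = rng.uniform(size=(20, 20))  # noisy blue channel
w = best_weights(img)                     # search favors the clean channel
```

On this synthetic input the search concentrates all weight on the clean green channel, which is exactly the behavior the cost is designed to encourage.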
22. RGB-W stereo matching
V(x, l) = α V_SAD(x, l) + (1 − α) V_SIE(x, l)
Brightness consistency — sum of absolute differences (SAD), robust to image noise:
V_SAD(x, l) = Σ_{x∈Ω_x} min(|I_L(x) − I_R^γ(x + d)|, τ₁)
Edge similarity — sum of informative edges (SIE), robust to non-linear intensity variation:
V_SIE(x, l) = Σ_{x∈Ω_x} min(|J(I_L)(x) − J(I_R^γ)(x + d)|, τ₂)
s.t. J(I) = |Σ_{x∈Ω_x} ∇I(x)| / (Σ_{x∈Ω_x} |∇I(x)| + 0.5) + 0.5
The sum of signed gradients cancels out image noise, while the sum of absolute gradients measures how strong the edges are.
Ω_x: supporting window centered at pixel x; d: disparity; τ_{1,2}: truncation values
Color image · Informative edge map J(I) · Conventional gradient map ∇I (sum of intensity; intensity ×3)
26
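The combined SAD + SIE cost can be sketched for a single window pair. This is a simplified, window-level version (the informative-edge measure J is computed once per patch here, whereas the slide applies it per pixel over the supporting window); α and the truncation values τ are placeholders.

```python
import numpy as np

def informative_edge(patch):
    # J(I): |sum of signed gradients| / (sum of absolute gradients + 0.5) + 0.5.
    # Signed gradients cancel for noise; absolute gradients measure edge strength.
    g = np.diff(patch, axis=1)
    return np.abs(g.sum()) / (np.abs(g).sum() + 0.5) + 0.5

def matching_cost(pl, pr, alpha=0.5, tau1=0.5, tau2=0.5):
    # V = alpha * truncated SAD (brightness consistency)
    #   + (1 - alpha) * truncated SIE (edge similarity).
    sad = np.minimum(np.abs(pl - pr), tau1).sum()
    sie = min(abs(informative_edge(pl) - informative_edge(pr)), tau2)
    return alpha * sad + (1 - alpha) * sie
```

Identical patches get zero cost; patches that disagree in brightness or edge structure are penalized, with each term truncated so outliers cannot dominate.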
24. Quantitative evaluation
ANCC: Heo et al., Robust stereo matching using adaptive normalized cross-correlation, IEEE PAMI 2011
DASC: Kim et al., DASC: Dense adaptive self-correlation descriptor for multimodal and multi-spectral correspondence, CVPR 2015
JDMCC: Heo et al., Joint depth map and color consistency estimation for stereo images with different illuminations and cameras, IEEE PAMI 2013
CCNG: Holloway et al., Generalized assorted camera arrays: Robust cross-channel registration and applications, IEEE TIP 2015
Structured light · Ground truth
Bright illumination: 36.64% · 34.29% · 11.55% · 21.45% · 22.44%
Dark illumination: ANCC 37.89% · JDMCC 36.24% · Proposed 19.24% · CCNG 32.56% · DASC 41.71%
28
26. Evaluations
(·) is the bad-pixel rate.
Dataset: Reindeer — Monochrome · Color · ANCC (40.10%) · DASC (39.85%) · JDMCC (32.86%) · CCNG (31.80%) · Proposed (8.89%) · Ground truth
Dataset: Moebius — Monochrome · Color · ANCC (26.58%) · DASC (26.91%) · JDMCC (18.54%) · CCNG (18.44%) · Proposed (15.14%) · Ground truth
30
27. Colorization and enhancement — high-quality color image recovery
Colorization method: color image → Y and V channels of the color image, U & V channel mapping, SLIC super-pixels → colorization result
31
30. Problem
(1) Input pair → (2) Gain map → (3) Decolorization → (4) Disparity map by iterative gain adjustment → (5) Refined map by a tree-based filtering → (6) High-quality color image
User parameters: 1. gain threshold; 2. matching window size; 3. balance value; 4. # of iterations; 5. smoothness parameter; 6. # of super-pixels; 7. color similarity
34
32. 36
CNN version of RGB-W Stereo
Image recovery
Depth estimation
Encoder
Consistency
33. 37
W C
Denoising
Left - Mono
Right - Color
Disparity
Denoised - Chrominance
Denoised - Mono
Initial colorization
Final color image
-
Occlusion
Occlusion
Disparity Colorization
CNN version of RGB-W Stereo
42. 46
Asymmetric stereo Light-field camera Monocular camera
[CVPR’16, Silver Prize of Samsung Humantech
Paper Award, Submitted to IEEE TIP]
[ECCV’14, CVPR’15, ICCVW’15, CVPR’18, IEEE
TPAMI’17, IEEE SPL’17, IEEE TPAMI’19,
Robustness champion of CVPR’17 workshop]
[ICCV’15, CVPR’16, ECCV’16, CVPR’18,
ICLR’19, IEEE TPAMI’17, IEEE SPL’17, IEEE
TPAMI’19, IEEE TPAMI under minor revision]
Today’s Talk
43. Depth from Single Light Field Images
Publications
• Accurate Depth Map Estimation from a Lenslet Light Field Camera
Hae-Gon Jeon, Jaesik Park, Gyeongmin Choe, Jinsun Park, Yunsu Bok, Yu-Wing Tai and In So Kweon
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2015
• Depth from a Light Field Image with Learning- based Matching Costs
Hae-Gon Jeon, Jaesik Park, Gyeongmin Choe, Jinsun Park, Yunsu Bok, Yu-Wing Tai and In So Kweon
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Feb 2019
• Depth Estimation Challenge: Robustness Champion, CVPR workshop on Light Field for Computer Vision
• EPINET: A Fully-Convolutional Neural Network using Epipolar Geometry for Depth from Light Field Images
Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon and Seon Joo Kim
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2018
47
44. Epipolar plane image (EPI)
Synthetic EPI
Estimate slopes of lines
[Wanner and Goldluecke PAMI 13, Tao et al. ICCV 13, Tao et al. CVPR 15,
Wang et al. CVPR 16, Williem et al. CVPR 16, Heber et al. CVPR 17]
Light-field image
48
45. Commercial light-field camera
Object · main lens · micro-lens array · sensor
Blocking penetration of light; capturing the angular information of rays on one sensor.
Sensor size 3280 × 3280 → sub-aperture image 328 × 328
Problem 1: reduced spatial resolution. Problem 2: increased photon noise.
49
46. Real-world EPI
Epipolar plane image
EPIs from a plenoptic camera are corrupted by noise and aliasing, and show vertical luminance changes due to the circular micro-lenses.
50
47. Multiview stereo-based approach [CVPR'15]
Sub-aperture images have a very narrow baseline (physically 0.45 mm; disparities within 1 px) — flipping adjacent views.
Sub-pixel shifts computed in the Fourier domain, accurate to 1/100 pixel:
ℱ{I(x + Δx)} = ℱ{I(x)} · e^{2πiΔx·ω}
I(x + Δx) = ℱ⁻¹{ℱ{I(x)} · e^{2πiΔx·ω}}
ℱ: Fourier transform over frequencies ω; Δx: sub-pixel displacement of x
[Averbuch and Keller, "A unified approach to FFT-based image registration", IEEE TIP 2003]
Bilinear · Bicubic · Phase · Original
51
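The Fourier phase-shift idea above can be sketched directly with NumPy's FFT; this is a minimal illustration of the shift theorem, not the paper's pipeline, and it assumes periodic boundaries.

```python
import numpy as np

def phase_shift(img, dx, dy):
    # Sub-pixel shift via the Fourier shift theorem:
    # I(x + dx) = F^-1{ F{I(x)} * exp(2*pi*i*(u*dx + v*dy)) }.
    H, W = img.shape
    u = np.fft.fftfreq(W)   # horizontal frequencies (cycles/sample)
    v = np.fft.fftfreq(H)   # vertical frequencies
    ramp = np.exp(2j * np.pi * (u[None, :] * dx + v[:, None] * dy))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))

img = np.zeros((8, 8))
img[4, 4] = 1.0
shifted = phase_shift(img, 1.0, 0.0)  # integer shift reproduces np.roll
```

For integer displacements this reproduces a circular shift exactly; for fractional displacements (e.g. `dx=0.45`) it interpolates in the Fourier domain, which is why narrow-baseline light-field disparities can be handled with sub-pixel precision.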
48. Cost volume
A matching cost f(·,·) compares the reference view with each target sub-aperture view and is accumulated into a cost volume over depth labels:
- Sum of absolute differences (SAD)
- Sum of gradient differences (GRAD)
52
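The cost-volume construction can be sketched on toy data; this simplified version uses integer horizontal shifts in place of the phase-shift warping, and the weighting between the SAD and GRAD terms is an assumed parameter.

```python
import numpy as np

def cost_volume(ref, tgt, num_labels=8, w_grad=0.5):
    # Per-pixel SAD + GRAD matching cost for each (integer) disparity label,
    # shifting the target sub-aperture image horizontally.
    gx_ref = np.gradient(ref, axis=1)
    vol = np.zeros((num_labels,) + ref.shape)
    for d in range(num_labels):
        shifted = np.roll(tgt, d, axis=1)
        vol[d] = ((1 - w_grad) * np.abs(ref - shifted)
                  + w_grad * np.abs(gx_ref - np.gradient(shifted, axis=1)))
    return vol

rng = np.random.default_rng(1)
ref = rng.random((6, 16))
tgt = np.roll(ref, -3, axis=1)                    # true disparity = 3 px
depth = np.argmin(cost_volume(ref, tgt), axis=0)  # winner-take-all labels
```

The winner-take-all `argmin` recovers the synthetic disparity everywhere; the real method instead aggregates the volume and refines it before choosing labels.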
53. Center View 3D mesh Actual scale Measured distance in 3D
Lytro Illum
Our Simple Lens Light Field Camera Dataset
Our simple-lens camera without distortion correction · with distortion correction
Qualitative Evaluation
57
54. There are still problems
Problem 1: severe vignetting. Problem 2: severe noise.
1. Hard to find accurate correspondences under radiometric distortion and severe noise ⇒ use various hand-crafted matching costs
2. Which matching cost is correct? ⇒ predict the correct matching cost using two random forests
3. Does it work well on real-world light-field images? ⇒ generate a realistic dataset based on the imaging pipeline of the Lytro camera
58
55. Overview of the Proposed Method [TPAMI'19]
1. Realistic light field image generation: emulating the imaging pipeline of the Lytro camera
2. Making cost volumes using phase shift: overcoming the inherent degradation of light-field images caused by the microlens array
3. Random forest 1 (classification): selecting dominant matching costs
4. Random forest 2 (regression): predicting a disparity value with sub-pixel precision
Matching-cost vector: q = [SAD, GRAD, Census, ZNCC]
59
56. Light-field camera geometric calibration
Raw image → sub-aperture images.
Indirect line fitting with a template over (θ, t): a line x·sinθ + y·cosθ + t = 0 is fitted by finding the best match.
Projection model: a 3D point (X, Y, Z) and the micro-lens center (X_c, Y_c, Z_c) project through the main lens (focal length F) to the image point (x, y) and the projected micro-lens center (x_c, y_c).
Line feature: a·(u − u_c) + b·(v − v_c) + c = 0, with micro-lens center (u_c, v_c).
Closest point to the micro-lens center from the projection of adjacent corners (u, v) and (u′, v′):
[û, v̂]ᵀ = [u′, v′]ᵀ + k([u, v]ᵀ − [u′, v′]ᵀ)
[Y. Bok, H.-G. Jeon, and I. S. Kweon, Geometric Calibration of Micro-Lens-Based Light-Field Cameras using Line Features, ECCV 2014, IEEE TPAMI 2017]
Dansereau et al., ICCV13 · Proposed
60
57. Data Generation: Vignetting Map
Noise-free multi-view images Vignetting map from averaged
white plane images
Sub-aperture image with
vignetting map
61
58. Data Generation: Lenslet Image Generation
Sub-aperture image with
vignetting map
Extract a pixel from each sub-aperture image
Aggregate these pixels into a lenslet
62
59. Data Generation: Add Noise
Noise level estimation of each color channel: [standard deviation (0–0.025) vs. intensity (0.2–0.6) plots for the Green 1, Green 2, Blue, and Red channels]
Convert color image to raw image
Y. Schechner et al., "Multiplexing for optimal lighting", IEEE TPAMI 2007
63
60. Data Generation: Realistic Sub-aperture Image Generation
Noisy raw image → demosaicing → rearrange pixels at each lenslet into each sub-aperture image
64
61. Effectiveness of the augmented training dataset
Depth profile without, and with, Gaussian-noise augmentation
65
63. Cost Volumes: Matching Costs
- Sum of absolute differences (SAD): robust to image noise; acts as an averaging filter
- Zero-mean normalized cross-correlation (ZNCC): compensates for differences in both gain and offset
- Census transform (Census): tolerates radiometric distortions
- Sum of gradient differences (GRAD): imposes higher weights at edge boundaries; synergy with the other matching costs
H. Hirschmuller and D. Scharstein, "Evaluation of stereo matching costs on images with radiometric differences," IEEE TPAMI 2009.
67
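Two of these costs are easy to sketch concretely. This minimal NumPy version shows why ZNCC cancels gain/offset changes and why the census transform depends only on local intensity ordering; patch sizes and the epsilon are illustrative.

```python
import numpy as np

def zncc(p, q):
    # Zero-mean normalized cross-correlation: invariant to gain and offset.
    p = p - p.mean()
    q = q - q.mean()
    return float((p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))

def census(patch):
    # Census transform: bit string of "pixel > center" comparisons --
    # depends only on the local intensity ordering (radiometric robustness).
    c = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    return (patch.ravel() > c).astype(np.uint8)

def census_cost(p, q):
    # Hamming distance between census bit strings.
    return int(np.count_nonzero(census(p) != census(q)))
```

A patch transformed by `2*p + 5` (gain and offset) still has ZNCC ≈ 1 and census cost 0, while inverting the intensities flips every census bit.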
71. Random Forest 1 — Classification
The matching-cost vector q = [q̂₁, …, q̂₁₁] (matching groups 1–4) is analyzed for importance: a set of important matching costs is retrieved using the permutation importance measure [L. Breiman, "Random forests," Machine Learning].
+ Removes unnecessary matching costs
+ Allows designing a better prediction model
75
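The permutation-importance measure itself can be sketched independently of the forest. This toy version uses a fixed linear predictor as a stand-in for the trained random forest (all data and names are synthetic); the principle is the same: shuffle one feature column and see how much the error grows.

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    # Increase in MSE when each feature column is shuffled -- features
    # whose permutation hurts the most matter the most.
    base = np.mean((predict(X) - y) ** 2)
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imps.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(imps)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # 4 candidate matching costs
y = 3.0 * X[:, 0] + 0.1 * X[:, 3]    # disparity depends mostly on cost 0
predict = lambda Z: 3.0 * Z[:, 0] + 0.1 * Z[:, 3]  # stand-in for the forest
imp = permutation_importance(predict, X, y, rng)
keep = np.argsort(imp)[::-1]         # rank matching costs by importance
```

Costs the predictor ignores get exactly zero importance, so they can be dropped before training the regression forest.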
72. Random Forest 2 — Regression
The selected cost vector q̂ = [q̂₁, …, q̂₁₁] is the input of a random forest for regression, which estimates a disparity value with sub-pixel precision.
vs. SAD+GRAD [H.-G. Jeon et al., IEEE CVPR 2015] with weighted median filter [Z. Ma et al., IEEE ICCV 2013]
76
73. Real-world examples – Lytro Illum
Wanner and Goldluecke,
IEEE TPAMI 14
Yu et al,
ICCV 13
Ours,
CVPR 15
Williem et al,
CVPR 16
Wang et al,
IEEE TPAMI 16
Tao et al,
IEEE TPAMI 17 Proposed
Wanner and Goldluecke,
IEEE TPAMI 14
Yu et al,
ICCV 13
Williem et al,
CVPR 16
Wang et al,
IEEE TPAMI 16
Tao et al,
IEEE TPAMI 17 Proposed
Ours,
CVPR 15
77
75. Benchmark Bad pixel ratio (>0.07px) & Mean square error
Bad pixel ratio Mean square error
(2017.05.23)
Robustness Champion!!
79
76. Qualitative evaluation on different input setups
Kim et al., "Scene Reconstruction from High Spatio-Angular Resolution Light Fields", SIGGRAPH 2013
DSLR camera mounted on
motorized linear stage
Center view Kim et al. (# of input: 51) Proposed (# of input: 9)
80
77. Qualitative evaluation on different input setups
Samsung Galaxy Note 8
Input images SGM Proposed
H. Hirschmuller. Stereo processing by
semiglobal matching and mutual
information, IEEE PAMI 2008
81
86. Center view · (b) · (c) · (d) · Ours (e) · (f) · (g)  (two example scenes)
B: Globally consistent depth labeling of 4D light fields. S. Wanner and B. Goldluecke
C: Accurate depth map estimation from a lenslet light field camera. H.-G. Jeon et al.
D: Robust light field depth estimation for noisy scene with occlusion. W. Williem et al.
E: Occlusion-aware depth estimation using light-field cameras. T.-C. Wang et al.
F: Shape estimation from shading, defocus, and correspondence using light-field angular coherence. Tao et al.
G: Line assisted light field triangulation and stereo matching. Z. Yu et al.
90
88. 92
Asymmetric stereo Light-field camera Monocular camera
[CVPR’16, Silver Prize of Samsung Humantech
Paper Award, Submitted to IEEE TIP]
[ECCV’14, CVPR’15, ICCVW’15, CVPR’18, IEEE
TPAMI’17, IEEE SPL’17, IEEE TPAMI’19,
Robustness champion of CVPR’17 workshop]
[ICCV’15, CVPR’16, ECCV’16, CVPR’18,
ICLR’19, IEEE TPAMI’17, IEEE SPL’17, IEEE
TPAMI’19, IEEE TPAMI under minor revision]
Today’s Talk
89. Depth from Small Motion Video Clip
Publications (Co-author papers)
• High Quality Structure from Small Motion for Rolling Shutter Cameras
Sunghoon Im, Hyowon Ha, Gyeongmin Choe, Hae-Gon Jeon, Kyungdon Joo and In So Kweon
IEEE International Conference on Computer Vision (ICCV), Dec 2015
• High-quality Depth from Uncalibrated Small Motion Clip [Oral presentation]
Hyowon Ha, Sunghoon Im, Jaesik Park, Hae-Gon Jeon and In So Kweon
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , Jun 2016
• All-around Depth from Small Motion with A Spherical Panoramic Camera
Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe and In So Kweon
European Conference on Computer Vision (ECCV), Oct 2016
• Robust Depth Estimation from Auto Bracketed Images
Sunghoon Im, Hae-Gon Jeon and In So Kweon
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2018
• Accurate 3D Reconstruction from Small Motion Clip for Rolling Shutter Cameras
Sunghoon Im, Hyowon Ha, Gyeongmin Choe, Hae-Gon Jeon, Kyungdon Joo and In So Kweon
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Apr 2019
• DPSNet: End-to-end Deep Plane Sweep Stereo
Sunghoon Im, Hae-Gon Jeon, Steve Lin and In So Kweon
International Conference on Learning Representations (ICLR), May 2019
93
94. Sparse 3D reconstruction [ICCV'15]
Bundle adjustment:
C(P, X) = Σ_{i=1}^{N_I} Σ_{j=1}^{N_J} ‖x_ij − φ(K P_ij X_j)‖²
x = [u, v, 1]ᵀ: 2D image coordinate
X = [X, Y, Z, 1]ᵀ: world coordinate
φ([X, Y, Z]ᵀ) = [X/Z, Y/Z, 1]ᵀ
N_I: # of images, N_J: # of features
K: intrinsic matrix, P_ij: extrinsic matrix
K = [[f_x, α, c_x], [0, f_y, c_y], [0, 0, 1]]
[Jacobian structure: 2 × (# of re-projection points) rows by (# of refined values) columns, shown with and without rolling shutter for 8 features and 6 images.]
Small-angle approximation of the rotation matrix:
P_ij = [R(r_ij) | t_ij], where R(r_ij) ≈ [[1, −r_ij^z, r_ij^y], [r_ij^z, 1, −r_ij^x], [−r_ij^y, r_ij^x, 1]]
Rotation and translation components: r_ij = r_i + w(r_{i+1} − r_i), t_ij = t_i + w(t_{i+1} − t_i)
98
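The small-angle rotation and per-scanline pose interpolation on this slide can be sketched as follows; this is a minimal illustration with a hypothetical intrinsic matrix, not the actual bundle-adjustment code.

```python
import numpy as np

def small_angle_R(r):
    # First-order rotation for small motion: R(r) ~ I + [r]_x.
    rx, ry, rz = r
    return np.array([[1.0, -rz,  ry],
                     [ rz, 1.0, -rx],
                     [-ry,  rx, 1.0]])

def rs_pose(r_i, r_i1, t_i, t_i1, w):
    # Rolling-shutter pose: interpolate rotation/translation along the
    # scanline fraction w in [0, 1]:  r_ij = r_i + w (r_{i+1} - r_i).
    return small_angle_R(r_i + w * (r_i1 - r_i)), t_i + w * (t_i1 - t_i)

def project(K, R, t, X):
    # phi(K (R X + t)): perspective projection to pixel coordinates.
    x = K @ (R @ X + t)
    return x[:2] / x[2]

K = np.array([[500.0, 0.0, 320.0],        # example intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = rs_pose(np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3), 0.5)
uv = project(K, R, t, np.array([0.0, 0.0, 2.0]))  # point on the optical axis
```

The reprojection residual minimized in the bundle adjustment is then simply the difference between an observed feature location and `project(K, R, t, X)`.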
96. Input Image Only color smoothness
• Results of dense 3D reconstruction
Input Image Only color
smoothness
Dense 3D Reconstruction
100
97. Dense 3D Reconstruction
Geometry guidance term
Key idea: neighboring pixels with similar color should have similar normals, and normal vectors guide the 3D positions of neighboring pixels, i.e. n_p · (D_q X̂_q − D_p X̂_p) should vanish:
E_g(D) = Σ_p Σ_{q∈W_p} w^g_{pq} ( D_p − (n_p · X̂_q / n_p · X̂_p) D_q )²
w^g_{pq} = (1/N_g) exp( −‖n_p − n_q‖ / γ_g )
Energy function: E(D) = E_d(D) + λ_c E_c(D) + λ_g E_g(D)
X̂_p = [x, y, 1]ᵀ: normalized image coordinate
D_p: depth value at p
X_p = D_p X̂_p: 3D coordinate
n_p: normal vector
W_p: 8-neighbors
N_g, γ_g: constants
Input image · sparse 3D points · normals of 3D points · normal map
101
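The geometry-guidance idea can be sketched numerically. This simplified version penalizes the coplanarity residual n_p · (D_q X̂_q − D_p X̂_p) for right-hand neighbors only and omits the similarity weights (the slide's version sums over the 8-neighborhood with weights); the test scene is synthetic.

```python
import numpy as np

def geometry_energy(D, X_hat, n):
    # Penalize n_p . (D_q X_hat_q - D_p X_hat_p): neighboring pixels should
    # lie on the local plane implied by the normal at p.
    # D: (H, W) depth, X_hat: (H, W, 3) normalized coords, n: (H, W, 3) normals.
    P = D[..., None] * X_hat  # back-projected 3D point per pixel
    resid = np.einsum('ijk,ijk->ij', n[:, :-1], P[:, 1:] - P[:, :-1])
    return float((resid ** 2).sum())

H, W = 4, 5
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
X_hat = np.dstack([xs, ys, np.ones((H, W))])  # normalized image coordinates
n = np.zeros((H, W, 3))
n[..., 2] = 1.0                               # fronto-parallel normals
flat = geometry_energy(np.full((H, W), 2.0), X_hat, n)  # constant-depth plane
```

A fronto-parallel plane with constant depth has zero energy, while any depth variation against those normals is penalized; this is the sense in which normals "guide" the 3D positions of neighbors.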
98. Input Image
Sparse 3D points Proposed method
Conventional method
• Results of dense 3D reconstruction
Dense 3D Reconstruction
102
102. There are still problems
1. Exact definition of small motion 2. Blurry depth
106
103. Small Motion Issue [CVPR'16]
Reference image · depth difference · ground truth · our depth map (b = 1.5)
Camera motion : closest object distance = 1 : 100
107
104. Solution to Blurry Depth: Plane Sweeping
Sweep depth planes P1, P2, P3 from near to far. Given the intrinsic and extrinsic camera parameters, the reference view and the other views are warped onto each plane: at the correct depth plane the mean image and intensity profile are sharp, while at wrong planes they are flat.
108
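The sharp-vs-flat consensus behind plane sweeping can be sketched on a toy scene. This version assumes a fronto-parallel scene with pure horizontal camera translation, so warping to a depth plane reduces to a horizontal shift; real plane sweeping uses per-plane homographies.

```python
import numpy as np

def plane_sweep_variance(ref, others, shifts_per_depth):
    # For each candidate depth plane, warp (here: shift) the other views
    # onto the reference and measure disagreement across the view stack.
    # The correct depth gives a consistent stack (low variance); wrong
    # depths give a flat, inconsistent average (high variance).
    costs = []
    for shifts in shifts_per_depth:  # one shift per non-reference view
        stack = [ref] + [np.roll(im, s, axis=1) for im, s in zip(others, shifts)]
        costs.append(np.var(np.stack(stack), axis=0).mean())
    return np.array(costs)

rng = np.random.default_rng(2)
ref = rng.random((8, 16))
others = [np.roll(ref, -2, axis=1), np.roll(ref, -4, axis=1)]  # baselines 1x, 2x
costs = plane_sweep_variance(ref, others, [[1, 2], [2, 4], [3, 6]])
```

The middle candidate (shifts proportional to the true disparity) has zero variance, so `argmin(costs)` selects the correct plane.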
105. Depth Quantization Error [TPAMI'19]
Caused by the quantized depth range in plane sweeping
Depth range
109
106. Adaptive Matching Window [TPAMI'19]
Adaptive depth range: use the blurry initial depth to estimate a per-pixel min–max depth range.
One parameter controls the confidence weight in [0, 1]; another controls the steepness of the exponential function.
Confidence map · initial depth
110
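The per-pixel depth-range idea can be sketched as follows; this is a minimal illustration (local min/max of the blurry initial depth over a square window), with the window radius and margin as assumed parameters.

```python
import numpy as np

def per_pixel_depth_range(blurry_depth, radius=2, margin=0.0):
    # Bound the plane sweep per pixel using the blurry initial depth:
    # min/max of the (2r+1)^2 neighborhood, optionally padded by a margin,
    # instead of one global [d_min, d_max] for the whole image.
    H, W = blurry_depth.shape
    pad = np.pad(blurry_depth, radius, mode='edge')
    lo = np.full((H, W), np.inf)
    hi = np.full((H, W), -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            win = pad[dy:dy + H, dx:dx + W]
            lo = np.minimum(lo, win)
            hi = np.maximum(hi, win)
    return lo - margin, hi + margin

D = np.arange(16.0).reshape(4, 4)          # toy blurry initial depth
lo, hi = per_pixel_depth_range(D, radius=1)
```

Sweeping only within `[lo, hi]` at each pixel concentrates the same number of depth planes into a much narrower interval, which is what reduces the quantization error.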
110. Common Pipeline of Traditional Approaches
Procedure of stereo matching: image → cost computation → cost volume → cost aggregation (input, guide, output) → graph cuts → iterative refinement
Matching costs:
- Parametric: AD, SAD, BT, mean filter, Laplacian of Gaussian, bilateral filtering, ZSAD, NCC, ZNCC
- Nonparametric: rank filter, soft rank filter, census filter, ordinal
- Mutual information: hierarchical MI
Energy: E = E_data + E_smooth + E_consistency + E_regularization
Sub-pixel refinement around the winning label l_r:
d* = l_r − (C(l⁺) − C(l⁻)) / (2(C(l⁺) + C(l⁻) − 2C(l_r)))
Accurate Depth Map Estimation from a Lenslet Light Field Camera, Hae-Gon Jeon et al., IEEE CVPR, Jun 2015
Stereo Matching with Color and Monochrome Cameras in Low-light Conditions, Hae-Gon Jeon et al., IEEE CVPR, Jun 2016
→ Fully end-to-end process
114
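The standard sub-pixel refinement step in this pipeline, a parabola fit through the cost curve around the winning label, is a one-liner:

```python
import numpy as np

def subpixel_disparity(C, l):
    # Parabola fit around the winning label l of the 1D cost curve C:
    # d* = l - (C[l+1] - C[l-1]) / (2 * (C[l+1] + C[l-1] - 2*C[l])).
    num = C[l + 1] - C[l - 1]
    den = 2.0 * (C[l + 1] + C[l - 1] - 2.0 * C[l])
    return l - num / den if den != 0 else float(l)

# Costs sampled from an exact parabola with minimum at d = 2.3.
C = (np.arange(6.0) - 2.3) ** 2
d_star = subpixel_disparity(C, int(np.argmin(C)))
```

Because the fit is exact for quadratic cost curves, the refined disparity lands on the true continuous minimum even though the labels are integers.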
111. Design a network inspired by the traditional plane sweeping algorithm
Overview of DPSNet [ICLR'19]:
- Input image (4W × 4H × 3) and i-th pair image (i ∈ {1, …, N}) → feature extraction (2D CNN, 1/4 downsampled, W × H × CH)
- Warping through the l-th plane (sweep), l = 1, …, L; feature concatenation for volume generation (W × H × 2CH × L; no learnable parameters)
- Cost volume generation (3D CNN) → i-th and (i+1)-th cost volumes; averaging the costs over i (W × H × L; no learnable parameters)
- Cost aggregation + upsampling (2D CNN, with the reference image) → depth regression (softmax; 4W × 4H × 1; no learnable parameters)
115
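The parameter-free depth-regression stage at the end of the pipeline (softmax over the label axis followed by an expectation) can be sketched in NumPy; this stands in for the network's differentiable regression layer, with toy costs and depth labels.

```python
import numpy as np

def soft_argmax_depth(cost_volume, depth_values):
    # Differentiable depth regression: softmax over negated costs along the
    # label axis, then the expected depth (sub-label precision, no argmax).
    s = -cost_volume
    s = s - s.max(axis=0, keepdims=True)  # numerical stability
    p = np.exp(s)
    p = p / p.sum(axis=0, keepdims=True)  # per-pixel label probabilities
    return np.tensordot(depth_values, p, axes=(0, 0))

C = np.full((4, 2, 3), 10.0)
C[1] = 0.0                                # label 1 is clearly the cheapest
depths = np.array([1.0, 2.0, 3.0, 4.0])
est = soft_argmax_depth(C, depths)        # (2, 3) map of expected depths
```

Because the output is an expectation rather than a hard argmax, gradients flow through the whole cost volume, which is what lets such a network train end to end.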
112. Training & Test Process
Testing: run the same process using the reference and the i-th image (i = 1, …, N); iteratively add all N cost volumes, then average them.
[The pipeline diagram repeats the DPSNet overview from the previous slide.]
116
113. Deep Cost Aggregation
Filter each cost volume slice using reference-image features, inspired by traditional cost volume filtering [Rhemann et al., CVPR 2011]. Shared weights are used for all layers.
Cost volume slice + reference image feature → context network (2D convolution) → initial + residual → aggregated volume
117
114. Ablation Study: Cost Aggregation
Reference · GT depth · estimated depth
Slices of the volume along a label (far/close) and along the green row in the reference image (x: column, y: cost layer), before and after aggregation
118
115. Ablation Study: Cost Aggregation — Confidence Measures
Depth map evaluation: lower is better / higher is better
- Winner margin (WM): difference between the maximum and the second maximum response
- Curvature (CUR): difference near the maximum response
119
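Both confidence measures can be sketched on a response volume; this is a minimal illustration operating on a probability-like volume where higher values mean better matches (the axis conventions and toy data are assumptions).

```python
import numpy as np

def winner_margin(P):
    # WM: best minus second-best response along the label axis (axis 0);
    # a larger margin means a more confident match.
    s = np.sort(P, axis=0)
    return s[-1] - s[-2]

def curvature(P):
    # CUR: 2*P[l] - P[l-1] - P[l+1] around the winning label l;
    # a sharp peak (large curvature) indicates a confident match.
    l = np.argmax(P, axis=0)
    lm = np.clip(l - 1, 0, P.shape[0] - 1)[None]
    lp = np.clip(l + 1, 0, P.shape[0] - 1)[None]
    take = lambda idx: np.take_along_axis(P, idx, axis=0)[0]
    return 2 * take(l[None]) - take(lm) - take(lp)

P = np.zeros((5, 2, 2))
P[2, 0, 0] = 1.0   # sharp, confident peak at one pixel
P[:, 0, 1] = 0.2   # flat, ambiguous response at another
P[:, 1, :] = 0.2
```

A sharp peak yields a large margin and curvature, while a flat response yields zero for both, which is why these maps can be thresholded into a confidence mask.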
116. ⇐ Error metrics
w.r.t the number of images
⇐ Depth map result
w.r.t the number of images
Reference GT depth 2-view 3-view 4-view
Ablation Study: Number of Input Images
120
119. Summary
Step 1: Solution
- Propose new ideas: iterative decolorization, phase shift, rolling-shutter bundle adjustment → heavy computational burden
- Cascade optimization: cost aggregation, graph cuts, weighted median filtering, tree-based filtering → needs careful tuning of user parameters
Step 2: Maturity
- Handle remaining issues: photometric distortion → random forest prediction; depth quantization error → adaptive matching window
- → Still suffering from computational issues
Step 3: Breakthrough
- Design of CNNs via Re-Search: CMSNet (a fraction of iterative stereo matching), EPINet (merging traditional approaches), DPSNet (inspired by the traditional plane sweeping algorithm)
- → No user parameters in the test phase; fast depth prediction; accurate results; large number of training parameters; new applications
123
Personal website https://sites.google.com/site/hgjeoncv/home