A talk from the Develop Track at AWE USA 2018, the World's #1 XR Conference & Expo, Santa Clara, California, May 30 to June 1, 2018.
Chris Varekamp (Philips Group Innovation, Research): Depth estimation, Processing & Rendering for Dynamic 6DoF VR
In this talk I will discuss how a real-time depth-based processing chain can be built using our experience in stereo-to-depth conversion for autostereoscopic displays.
http://AugmentedWorldExpo.com
3. Example results of real-time processing
• Multi-view on 4K auto-stereo display
• Rendering multiple views + weaving from left image/depth
• Depth estimation from Left -> Right
• Depth estimation from Right -> Left
Footage: copyright 2006, Blender Foundation / Netherlands Media Art Institute / www.elephantsdream.org [https://orange.blender.org/blog/creative-commons-license-2/]
4. Real-time stereo to multi-view conversion
Pipeline: left eye video + right eye video → depth estimation → dense depth video → view synthesis → multi-view video.
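For intuition, a minimal sketch of the view-synthesis step under simplifying assumptions (rectified views, pixels shifted horizontally by a scaled disparity); function and parameter names are illustrative, not the actual implementation:

```cpp
#include <vector>
#include <cstdint>

// Synthesize one intermediate view from the left image and its dense
// disparity map by shifting pixels horizontally. 'alpha' selects the
// virtual camera position: 0 = left view, 1 = right view.
std::vector<uint8_t> synthesizeView(const std::vector<uint8_t>& left,
                                    const std::vector<float>& disparity,
                                    int width, int height, float alpha) {
    std::vector<uint8_t> view(left.size(), 0);  // 0 marks holes (disocclusions)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Shift each left-view pixel by a fraction of its disparity.
            int xNew = x - static_cast<int>(alpha * disparity[y * width + x] + 0.5f);
            if (xNew >= 0 && xNew < width)
                view[y * width + xNew] = left[y * width + x];
        }
    }
    return view;
}
```

A real renderer additionally resolves occlusion order, fills holes, and weaves the synthesized views into the auto-stereo display format.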
Implemented in real-time:
• Multi-core on the Amazon cloud (C++): 30 FPS @ 16 cores (2x depth estimation)
• FPGA/IC
• Algorithms largely suitable for implementation on GPU
• With > 10 user licenses worldwide
5. Real-time depth estimation
Processing chain (input: left/right stereo; output: depth):
recursive-search block matching → error classification → confidence/colour-adaptive filtering, then the same steps repeated with pixel-based matching → error classification → confidence/colour-adaptive filtering → depth coding.
FPGA implementation (joint work with Dimenco):
• Altera Arria V device
• DVI and HDMI input
• 4x DDR3 memory
• LVDS output (current 3D display); SDI output (for AR/VR)
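For intuition, a toy software sketch of the recursive-search matching stage, in the spirit of [1] on the next slide: each block evaluates only a handful of candidate disparities taken from neighbours plus a random update, which is what keeps the complexity low. Block size and candidate set here are illustrative assumptions:

```cpp
#include <vector>
#include <cstdlib>
#include <limits>
#include <cstdint>

// Sum of absolute differences between an 8x8 block in the left image and a
// horizontally shifted block in the right image (assumes 8x8-aligned sizes).
static int blockSAD(const std::vector<uint8_t>& L, const std::vector<uint8_t>& R,
                    int width, int bx, int by, int d) {
    int sad = 0;
    for (int y = by; y < by + 8; ++y)
        for (int x = bx; x < bx + 8; ++x) {
            int xr = x - d;
            if (xr < 0) xr = 0;
            if (xr >= width) xr = width - 1;
            sad += std::abs(int(L[y * width + x]) - int(R[y * width + xr]));
        }
    return sad;
}

// One scan of recursive-search block matching: each block tests only a few
// candidates (spatial neighbours, previous value, random update) instead of
// performing a full disparity search.
void recursiveSearchScan(const std::vector<uint8_t>& L, const std::vector<uint8_t>& R,
                         int width, int height, std::vector<int>& disp /* per block */) {
    int bw = width / 8, bh = height / 8;
    for (int by = 0; by < bh; ++by)
        for (int bx = 0; bx < bw; ++bx) {
            int prev = disp[by * bw + bx];
            int cand[4] = {
                bx > 0 ? disp[by * bw + (bx - 1)] : prev,  // left neighbour
                by > 0 ? disp[(by - 1) * bw + bx] : prev,  // upper neighbour
                prev,                                       // previous estimate
                prev + (std::rand() % 3 - 1) };             // random update
            int best = prev, bestSAD = std::numeric_limits<int>::max();
            for (int c : cand) {
                int sad = blockSAD(L, R, width, bx * 8, by * 8, c);
                if (sad < bestSAD) { bestSAD = sad; best = c; }
            }
            disp[by * bw + bx] = best;
        }
}
```

Error classification and the adaptive filtering (next slide) then repair the mismatches this cheap search leaves behind.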
6. Ingredients for real-time and high quality
Disparity detection and correction
• Low-complexity disparity estimation using recursive-search block matching [1]
• Disparity error detection and correction via supervised learning [2]
• Both steps repeated pixel-wise
Confidence and colour-adaptive filtering (see the sketch after the references)
• Efficient filtering [3]
• Confident pixels should not be filtered
• Low-confidence pixels are filtered using colour similarity
[1] G. de Haan, P.W.A.C. Biezen, H. Huijgen, O.A. Ojo. True-motion estimation with 3-D recursive search block matching. IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993.
[2] C. Varekamp, K. Hinnen, W. Simons. Detection and correction of disparity estimation errors via supervised learning. International Conference on 3D Imaging, 3-5 Dec. 2013.
[3] L. Vosters, C. Varekamp, G. de Haan. Overview of efficient high-quality state-of-the-art depth enhancement methods by thorough design space exploration. Journal of Real-Time Image Processing, pp. 1–21, 2015.
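A minimal sketch of the confidence/colour-adaptive filtering idea: confident depth pixels pass through unchanged, while low-confidence pixels take a colour-similarity-weighted average of confident neighbours. The grayscale guidance image, window radius, and Gaussian weight are illustrative assumptions, not the specific method of [3]:

```cpp
#include <vector>
#include <cmath>
#include <cstdint>

// Colour-adaptive filtering of a disparity map: pixels flagged as confident
// are kept; low-confidence pixels are replaced by an average over confident
// neighbours, weighted by colour similarity to the centre pixel.
void colourAdaptiveFilter(const std::vector<uint8_t>& gray,  // guidance image
                          std::vector<float>& disp,
                          const std::vector<bool>& confident,
                          int width, int height, int radius = 5,
                          float sigmaColour = 10.0f) {
    std::vector<float> out = disp;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int idx = y * width + x;
            if (confident[idx]) continue;           // confident: no filtering
            float sumW = 0.0f, sumD = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    int nIdx = ny * width + nx;
                    if (!confident[nIdx]) continue; // borrow only from confident pixels
                    float dc = float(gray[idx]) - float(gray[nIdx]);
                    float w = std::exp(-(dc * dc) / (2 * sigmaColour * sigmaColour));
                    sumW += w;
                    sumD += w * disp[nIdx];
                }
            if (sumW > 0) out[idx] = sumD / sumW;
        }
    disp = out;
}
```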
7. Approaches to 6DoF
Approaches compared:
• Accurate 3D geometry: reflectance/scattering properties, light sources; requires multi-modal sensing (vision, laser), object recognition and post-production.
• Dense camera/lens array (light field).
• Multiple views with depth: camera spacing ~6 cm for indoor; avoid costly post-production; reduced hardware complexity.
8. Multi-camera design rules
[Figure: camera pair with baseline $B$ and focal length $f$ imaging a scene at nearest depth $z_{\text{near}}$; sensor shown behind the lens.]
$$\text{Disparity} = \frac{f B}{z_{\text{near}}} \ [\text{pixel}]$$
where $B$ = baseline [m], $f$ = focal length [pixel], $z$ = depth [m]. Example: $f = 1000$ pixel (for a 2K sensor, HFOV ≈ 90°).
Holds for regular lenses (perspective projection). For fisheye lenses the relation is different but the principle remains the same.
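Plugging the slide's numbers into the formula: with $f = 1000$ pixel and $B = 0.06$ m (the indoor spacing from the previous slide), an assumed nearest scene depth of $z_{\text{near}} = 1$ m gives a maximum disparity of 1000 × 0.06 / 1 = 60 pixels. A tiny helper to sanity-check camera spacings:

```cpp
#include <cstdio>

// Maximum disparity in pixels for a camera pair (perspective projection):
// disparity = f * B / z_near.
double maxDisparity(double focalPx, double baselineM, double zNearM) {
    return focalPx * baselineM / zNearM;
}

int main() {
    // f = 1000 pixel (2K sensor, HFOV ~ 90 deg), B = 6 cm, z_near = 1 m (assumed).
    std::printf("max disparity = %.1f pixel\n", maxDisparity(1000.0, 0.06, 1.0));
    // Prints: max disparity = 60.0 pixel
    return 0;
}
```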
9. 6DoF processing flow
Pipeline: N cameras → multi-view registration → disparity estimation for camera pairs → multi-view disparity refinement → compositing → image + depth compression → image + depth decompression → view synthesis → left/right stereo.
Capture/processing side: software/hardware, real-time/offline. Real-time client: OpenVR/SteamVR, GTX 1000-series graphics cards.
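As a reading aid, the flow can be sketched as a staged pipeline. All type and function names below are hypothetical placeholders, not from the talk:

```cpp
#include <vector>

// Placeholder data types; real ones hold pixels, disparities, poses, packets.
struct Image {}; struct Depth {}; struct Pose {}; struct Bitstream {};

// Server side (software/hardware, real-time or offline).
std::vector<Pose> registerViews(const std::vector<Image>& v) { return std::vector<Pose>(v.size()); }
std::vector<Depth> estimatePairwiseDisparity(const std::vector<Image>& v) { return std::vector<Depth>(v.size()); }
void refineMultiViewDisparity(std::vector<Depth>&, const std::vector<Pose>&) {}
Bitstream compressImagePlusDepth(const std::vector<Image>&, const std::vector<Depth>&) { return {}; }

// Client side (OpenVR/SteamVR app): decompress, then synthesize both eyes.
void decodeAndSynthesize(const Bitstream&, Image& leftEye, Image& rightEye) {}

int main() {
    std::vector<Image> cams(6);                     // N cameras
    auto poses  = registerViews(cams);              // multi-view registration
    auto depths = estimatePairwiseDisparity(cams);  // per camera pair
    refineMultiViewDisparity(depths, poses);        // cross-view consistency
    Bitstream bs = compressImagePlusDepth(cams, depths);
    Image leftEye, rightEye;
    decodeAndSynthesize(bs, leftEye, rightEye);     // view synthesis at client
    return 0;
}
```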
10. Camera calibration/Multi-view registration
Offline
• Intrinsic parameters (focal, principal point, distortion) from a known pattern (see the calibration sketch below)
• Extrinsic parameters (rotation/translation) from a known pattern
– Not robust to handling of the camera setup
– Some frequently used algorithms cannot deal with more than two cameras
Partially online
• Intrinsic parameters offline
• Extrinsic parameters online, using images and estimated depth
– Multi-view registration method
– More practical and robust for rig handling
– More relevant for larger (possibly outdoor) setups
– We implemented two versions: (a) feature-based; (b) image-based on GPU
[Figure: original fisheye image vs. rectified image.]
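A minimal sketch of the offline intrinsic step referenced above, assuming OpenCV's standard chessboard workflow; the talk does not name a library, and the pattern and square sizes are illustrative:

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Estimate intrinsics (focal, principal point, distortion) from several
// images of a known chessboard pattern: here a 9x6 inner-corner board with
// 25 mm squares (assumed values).
bool calibrateIntrinsics(const std::vector<cv::Mat>& boardImages,
                         cv::Mat& cameraMatrix, cv::Mat& distCoeffs) {
    const cv::Size pattern(9, 6);
    const float square = 0.025f;  // 25 mm

    // One set of 3D corner positions on the (planar) board, reused per view.
    std::vector<cv::Point3f> board;
    for (int y = 0; y < pattern.height; ++y)
        for (int x = 0; x < pattern.width; ++x)
            board.emplace_back(x * square, y * square, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    for (const cv::Mat& img : boardImages) {
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, pattern, corners)) {
            imagePoints.push_back(corners);
            objectPoints.push_back(board);
        }
    }
    if (imagePoints.size() < 3) return false;  // need several good views

    std::vector<cv::Mat> rvecs, tvecs;  // per-view extrinsics (by-product)
    cv::calibrateCamera(objectPoints, imagePoints, boardImages[0].size(),
                        cameraMatrix, distCoeffs, rvecs, tvecs);
    return true;
}
```

For fisheye lenses, OpenCV's cv::fisheye module provides the analogous calibration; the partially online extrinsic refinement from the slide is a separate multi-view registration step.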
12. Formats, packing, coding, compression
Projection formats: equirectangular projection, perspective projection, fish-eye, cube map, etc.
Encoded depth value $D$: for equirectangular projection $D \propto r_{\min}/r$ (inverse radial distance, normalized to the nearest distance $r_{\min}$); for perspective projection an affine mapping $D_{\text{enc}} = a D + b$, where $D$ is the disparity between the stereo pair.
Standard video codecs can be used (e.g. HEVC).
Optional packing of image and depth.
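A sketch of how these two encodings could be implemented when quantizing depth for a standard codec; the bit depth and scaling constants are assumptions:

```cpp
#include <cstdint>
#include <algorithm>

// Equirectangular convention: encoded value proportional to r_min / r,
// so the nearest allowed distance r_min maps to the maximum code value.
uint16_t encodeEquirect(float r, float rMin, int bits = 10) {
    float maxCode = float((1 << bits) - 1);
    float code = maxCode * rMin / r;               // D ~ r_min / r
    return uint16_t(std::clamp(code, 0.0f, maxCode));
}

// Perspective convention: affine mapping of disparity, D_enc = a*D + b,
// with a and b chosen so the expected disparity range fills the code range.
uint16_t encodePerspective(float disparity, float a, float b, int bits = 10) {
    float maxCode = float((1 << bits) - 1);
    return uint16_t(std::clamp(a * disparity + b, 0.0f, maxCode));
}
```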
13. Playback via depth to mesh at client-side
Mesh options: fixed mesh topology vs. depth-map-adaptive mesh topology; each vertex carries $(u_i, v_i, D_i)$.
Vertex transform: clip position $= P\,V\,M\,Q\,(u,\ v,\ D(u,v),\ 1)^{\top}$, where $M$, $V$, $P$ are the model, view and projection matrices and $Q$ is the re-projection matrix.
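A plain-C++ sketch of this vertex transform; the talk uses standard OpenGL, and here a small matrix helper just makes the $PVMQ$ order explicit (matrix contents are placeholders):

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

// y = M * x (column-vector convention, as in OpenGL).
Vec4 mul(const Mat4& M, const Vec4& x) {
    Vec4 y{0, 0, 0, 0};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            y[r] += M[r][c] * x[c];
    return y;
}

// Clip-space position of a mesh vertex: clip = P * V * M * Q * (u, v, D, 1)^T.
// Q lifts (u, v, encoded depth, 1) to a homogeneous 3D point; P, V, M are the
// usual projection, view and model matrices.
Vec4 vertexClipPosition(const Mat4& P, const Mat4& V, const Mat4& M,
                        const Mat4& Q, float u, float v, float D) {
    Vec4 p{u, v, D, 1.0f};
    return mul(P, mul(V, mul(M, mul(Q, p))));
}
```

In practice this product is folded into one matrix per frame and evaluated in the vertex shader.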
14. Example real-time configuration
Synchronized capture chain (per stereo camera pair): cameras → USB3 → mini PC → HDMI (1920x2160) → FPGA → SDI 4:2:2 (3840x2160) → render PC with SDI capture card → VR headset.
• Mini PC: image capture, lens un-distortion, stereo rectification; output: left + right (top-bottom).
• FPGA: depth estimation; output: L, R and depth maps D_L, D_R.
The diagram repeats the USB3/mini PC/FPGA chain for further camera pairs (elements numbered 1-6) feeding the same render PC.
16. Dynamic 6DoF: stereo with depth
[Figure: HTC Vive headset with position tracking; fish-eye anchor views L, R with depth maps D_L, D_R in the transmission format; the left eye moves within a sweet spot with motion freedom; a position tracker is used for the static scene part.]
Compared with stereo:
• More natural experience, allowing small head motions
• Depth packed with the image or carried in separate streams (e.g. HEVC)
• Efficient rendering by combining two meshes and blending the results for both eyes (sketched below)
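One plausible way to implement the two-mesh blend, assuming each anchor mesh has been rendered for one eye into a colour buffer with a validity/alpha mask; the buffer layout and the distance-based weight are my assumptions, not the talk's implementation:

```cpp
#include <vector>
#include <cstdint>
#include <cstddef>

// Blend the renderings of the two anchor meshes (left/right camera) for one
// eye. The weight favours the anchor closest to the current eye position;
// pixels missing in one rendering (alpha 0) fall back to the other.
void blendAnchors(const std::vector<uint8_t>& fromL, const std::vector<uint8_t>& alphaL,
                  const std::vector<uint8_t>& fromR, const std::vector<uint8_t>& alphaR,
                  float eyeX, float camLX, float camRX,
                  std::vector<uint8_t>& out) {
    // Normalized position of the eye between the two anchors (0 = left anchor).
    float t = (eyeX - camLX) / (camRX - camLX);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    for (std::size_t i = 0; i < out.size(); ++i) {
        float wL = (1.0f - t) * (alphaL[i] / 255.0f);  // weight for left anchor
        float wR = t * (alphaR[i] / 255.0f);           // weight for right anchor
        float sum = wL + wR;
        out[i] = sum > 0 ? uint8_t((wL * fromL[i] + wR * fromR[i]) / sum) : 0;
    }
}
```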
19. Dynamic 6DoF: linear array
[Figure: HTC Vive headset with position tracking; a linear array of anchor views; the left eye moves within a sweet spot.]
Compared with stereo:
• Larger motion freedom
• For different applications: more cameras, different configurations
• Scalable approach: decode only the video streams in the vicinity of the eye locations (sketched below)
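A sketch of the stream-selection policy behind that scalability point, assuming uniform camera spacing and a decode-the-two-nearest-anchors rule (both are my assumptions, not stated in the talk):

```cpp
#include <utility>

// For a linear camera array with uniform spacing (numCams >= 2), return the
// indices of the two anchor views bracketing the eye's x position; only
// these streams need to be decoded for rendering this eye.
std::pair<int, int> anchorsToDecode(float eyeX, float firstCamX,
                                    float spacing, int numCams) {
    float t = (eyeX - firstCamX) / spacing;  // eye position in camera units
    int left = static_cast<int>(t);
    if (left < 0) left = 0;
    if (left > numCams - 2) left = numCams - 2;
    return {left, left + 1};                 // decode these two streams
}
```

Calling this per eye gives at most four active streams for a headset, independent of the total array size.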
21. Conclusions/future work
• Demonstrated use of our real-time depth estimation algorithms for 6DoF VR
• Depth can play a role in most components of a full system, including playback
• The depth-based approach has the potential to achieve high quality at low latency
• A live streaming demo is possible (work in progress)
Contact: chris.varekamp@philips.com
Speaker notes:
• Explain that our 6DoF approach essentially means taking many photos/videos and selecting/interpolating between these using estimated depth.
• We developed depth technology for auto-stereoscopic (3D) displays.
• The effect of disparity estimation errors will influence quality more at the larger baselines.
• Potential problem regions are occlusion regions and reflections.
• For static scene parts we can composite based on image and depth from separate camera views; the result is a composite image in equirectangular format.
• Symbol D denotes encoded disparity; for perspective projection, D is the disparity between the stereo pair.
• Use standard OpenGL to convert from (u, v, D(u,v)) to normalized device coordinates using the re-projection matrix Q and the standard OpenGL model-view-projection matrices.
• Conversion of the depth format to an internal mesh representation happens at the client side: regular mesh or adaptive mesh (initial testing).
• We are currently building this setup to demonstrate LIVE streaming.
• Refer to the tracker as a method for creating a mix of static and dynamic scene parts.