DynamicFusion:	Reconstruction	and	Tracking	
of	Non-rigid	Scenes	in	Real-Time
Richard	A.	Newcombe,	Dieter	Fox,	Steven	M.	Seitz
CVPR2015,	Best	Paper	Award
Paper introduction: Ken Sakurada (Tokyo Institute of Technology), June 23, 2015
1
Presenter
Ken Sakurada ( http://www.vision.is.tohoku.ac.jp/us/member/sakurada )
• Postdoctoral researcher, Tokyo Institute of Technology (since April 2015)
• Ph.D. from the Okatani Lab, Tohoku University (March 2015)
• Twitter ID: @sakuDken
• Research topics
– Spatio-temporal modeling of cities using vehicle-mounted cameras
– SfM, MVS, CNN …
• Main publications
– CVPR (Poster) 1 paper
– BMVC (Poster) 1 paper
– ACCV (Oral) 1 paper
• "Best Application Paper Honorable Mention Award"
– IROS …
If you notice anything about the content, I would appreciate hearing from you.
Email: sakurada@ok.ctrl.titech.ac.jp
sakurada@vision.is.tohoku.ac.jp
2
Authors of DynamicFusion
University of Washington
• Richard Newcombe (PhD)
– Author of KinectFusion and DTAM
• Dieter Fox (Professor)
– Author of "Probabilistic Robotics" (the standard reference on SLAM)
• Steven Seitz (Professor)
– A leading figure in SfM and related areas
3
DynamicFusion
A dense SLAM system
• Fuses depth images to reconstruct dynamic scenes in 3D in real time
– Extends KinectFusion to dynamic scenes
Video: https://www.youtube.com/watch?v=i1eZekcc_lM
4
(a) Initial Frame at t = 0s (b) Raw (noisy) depth maps for frames at t = 1s, 10s, 15s, 20s (c) Node Distance (d) Canonical Model (e) Canonical model warped into its live frame (f) Model Normals
Figure 2: DynamicFusion takes an online stream of noisy depth maps (a,b) and outputs a real-time dense reconstruction of the moving scene (d,e). To achieve this, we estimate a volumetric warp (motion) field that transforms the canonical model space into the live frame, enabling the scene motion to be undone, and all depth maps to be densely fused into a single rigid TSDF reconstruction (d,f). Simultaneously, the structure of the warp field is constructed as a set of sparse 6D transformation nodes that are smoothly interpolated through a k-nearest node average in the canonical frame (c). The resulting per-frame warp field estimate enables the progressively denoised and completed scene geometry to be transformed into the live frame in real-time (e).
KinectFusion
A dense SLAM system for static scenes
• Builds a dense surface model from multiple depth images
• Estimates the camera pose by registering the latest depth image
against the accumulated model
"KinectFusion: Real-Time Dense Surface Mapping and Tracking" (ISMAR 2011)
Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison,
Pushmeet Kohli, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon
Figure 1: Example output from our system, generated in real-time with a handheld Kinect depth camera and no other sensing infrastructure.
Normal maps (colour) and Phong-shaded renderings (greyscale) from our dense reconstruction system are shown. On the left for comparison
is an example of the live, incomplete, and noisy data from the Kinect sensor (used as input to our system).
5
Video:		https://www.youtube.com/watch?v=quGhaggn3cQ
KinectFusion: Processing pipeline
6
KinectFusion: Camera motion
• The camera pose is represented as a 6-DoF rigid-body transform
– Maps the local depth map into the global surface frame
7
KinectFusion: Converting a depth map to vertices and normals
8
• Apply a bilateral filter to the depth map (noise reduction)
• Vertices (3D points):
v_k(u) = D_k(u) K^{-1} [u, v, 1]^T
D_k(u): depth at pixel u = (u, v)^T
K: camera intrinsic matrix
• Transform the 3D points from camera to world coordinates:
v_w = T_w v_k
• Unit normal vectors
– Estimated from the 3D points of neighbouring pixels:
N(x, y) = (a × b) / ||a × b||
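The two formulas above can be sketched in NumPy; this is an illustrative sketch with my own function names, not the paper's GPU implementation:

```python
import numpy as np

def backproject(depth, K):
    """Vertex map: v_k(u) = D_k(u) * K^{-1} [u, v, 1]^T for every pixel."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1)   # (h, w, 3) homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                       # K^{-1} u for each pixel
    return depth[..., None] * rays                        # (h, w, 3) camera-space points

def normal_map(verts):
    """Unit normals N = (a x b) / ||a x b|| from neighbouring vertices."""
    a = verts[:, 1:, :] - verts[:, :-1, :]                # finite difference along u
    b = verts[1:, :, :] - verts[:-1, :, :]                # finite difference along v
    n = np.cross(a[:-1], b[:, :-1])
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
```

For a fronto-parallel plane of constant depth, every estimated normal comes out as (0, 0, 1), pointing along the optical axis.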
KinectFusion: Processing pipeline
9
Joint estimation of camera motion and surface
Pipeline (from depth to a dense oriented point cloud):
Raw Depth → Depth Map Conversion → Model-Frame Camera Track (ICP)
→ Volumetric Integration (TSDF Fusion) → Model Rendering (TSDF Raycast)
→ Predicted Vertex and Normal Maps, matched against the measured vertex
and normal maps in the tracker; outputs: 6DoF pose and ICP outliers
3D reconstruction is possible given the camera motion
10
Given the camera motion...
11
Given the camera motion...
12
Given the camera motion...
13
...the measurements can be fused (surface generation)...
14
...and, given the 3D shape...
15
...a new surface can be registered against it...
16
...minimizing the surface measurement error...
17
...yields the camera pose, so the depth maps can be fused...
18
KinectFusion: Surface representation
19
KinectFusion: Surface representation
Problems
• Depth maps have large measurement errors
• Depth maps have holes (missing measurements)
Solution
• Use an implicit surface representation
20 Reference: http://slideplayer.com/slide/3892185/#
KinectFusion: Surface representation
Truncated Signed Distance Function (TSDF)
21 Reference: http://slideplayer.com/slide/3892185/#
Voxel grid
KinectFusion: Surface representation
22 Reference: http://slideplayer.com/slide/3892185/#
KinectFusion: Surface representation
23
[vote value] = [pixel depth] − [sensor-to-voxel distance]
Reference: http://slideplayer.com/slide/3892185/#
KinectFusion: Surface representation
24 Reference: http://slideplayer.com/slide/3892185/#
[vote value] = [pixel depth] − [sensor-to-voxel distance]
KinectFusion: Surface representation
25 Reference: http://slideplayer.com/slide/3892185/#
[vote value] = [pixel depth] − [sensor-to-voxel distance]
KinectFusion: Surface representation
26 Reference: http://slideplayer.com/slide/3892185/#
[vote value] = [pixel depth] − [sensor-to-voxel distance]
KinectFusion: Surface representation
Updating the TSDF {F_k, W_k} (F_k: vote value, W_k: weight)
• W_{R_k}(p) = 1 already gives good results
27
(Excerpt from the KinectFusion paper:) Storing a weight W_k(p) with each value allows an important aspect of the global minimum of the convex L2 de-noising metric to be exploited for real-time fusion; the solution can be obtained incrementally as more data terms are added using a simple weighted running average [7], defined point-wise for {p | F_{R_k}(p) ≠ null}:

F_k(p) = ( W_{k−1}(p) F_{k−1}(p) + W_{R_k}(p) F_{R_k}(p) ) / ( W_{k−1}(p) + W_{R_k}(p) )   (11)
W_k(p) = W_{k−1}(p) + W_{R_k}(p)   (12)

No update on the global TSDF is performed for values resulting from unmeasurable regions specified in Equation 9. While W_k(p) provides weighting of the TSDF proportional to the uncertainty of the surface measurement, we have also found that in practice simply letting W_{R_k}(p) = 1, resulting in a simple average, provides good results. Moreover, by truncating the updated weight over some value W_η,

W_k(p) ← min( W_{k−1}(p) + W_{R_k}(p), W_η )   (13)

a moving-average surface reconstruction can be obtained, enabling reconstruction in scenes with dynamic object motion.
(Figure labels: "up to now" / "new observation")
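Equations (11)-(13) amount to a per-voxel weighted running average with a capped weight; a minimal NumPy sketch (function and parameter names are my own):

```python
import numpy as np

def fuse_tsdf(F_prev, W_prev, F_new, W_new=1.0, W_max=100.0):
    """Weighted running average of TSDF values, Eqs. (11)-(13):
    F_k = (W_{k-1} F_{k-1} + W_R F_R) / (W_{k-1} + W_R),
    W_k = min(W_{k-1} + W_R, W_eta).  W_new = 1 gives the simple average
    the authors report works well in practice."""
    F = (W_prev * F_prev + W_new * F_new) / (W_prev + W_new)
    W = np.minimum(W_prev + W_new, W_max)
    return F, W
```

Capping the weight at W_max turns the estimate into a moving average, which is what lets KinectFusion cope with some dynamic object motion.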
KinectFusion: Sensor pose estimation
Minimize the distance between points and planes (the point-plane energy)
28
(Excerpt:) Utilising the surface prediction, the global point-plane energy, under the L2 norm, for the desired camera pose estimate T_{g,k} is:

E(T_{g,k}) = Σ_{u ∈ U, Ω_k(u) ≠ null} ( ( T_{g,k} V̇_k(u) − V̂^g_{k−1}(û) )^T N̂^g_{k−1}(û) )²   (16)

where each global-frame surface prediction is obtained using the previous fixed pose estimate T_{g,k−1}. The projective data association algorithm produces the set of vertex correspondences {V_k(u), V̂_{k−1}(û) | Ω(u) ≠ null} by computing the perspectively projected point û = π(K T̃_{k−1,k} V̇_k(u)), using an estimate of the frame-to-frame transform T̃^z_{k−1,k} = T^{−1}_{g,k−1} T̃^z_{g,k}, and testing the predicted and measured vertex and normal for compatibility. A threshold on the distance of vertices and the difference in normal values suffices to reject grossly incorrect correspondences, also illustrated in Figure 7.
(Excerpt from Low 2004:) When the point-to-plane error metric is used, the object of minimization is the sum of the squared distances between each source point and the tangent plane at its corresponding destination point (see Figure 1). More specifically, if s_i = (s_ix, s_iy, s_iz, 1)^T is a source point, d_i = (d_ix, d_iy, d_iz, 1)^T is the corresponding destination point, and n_i = (n_ix, n_iy, n_iz, 0)^T is the unit normal vector at d_i, then the goal of each ICP iteration is to find M_opt such that

M_opt = argmin_M Σ_i ( (M s_i − d_i) · n_i )²   (1)

where M and M_opt are 4×4 3D rigid-body transformation matrices.
Figure 1: Point-to-plane error between two surfaces.
Figure Reference: Low, Kok-Lim. "Linear least-squares optimization for point-to-plane ICP surface registration." Chapel Hill, University of North Carolina (2004).
Error between two surfaces
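A standard way to minimize the point-to-plane objective is to linearize the rotation with a small-angle approximation, which turns Eq. (1) into a 6×6 linear least-squares problem in (ω, t). The following is an illustrative sketch of that step (my own function name, not the paper's GPU implementation):

```python
import numpy as np

def point_to_plane_step(src, dst, nrm):
    """One linearized point-to-plane ICP step.

    With R ~ I + [w]_x, the residual (R s + t - d).n becomes
    w.(s x n) + t.n - (d - s).n, a linear system A x = b in x = (w, t)."""
    A = np.hstack([np.cross(src, nrm), nrm])        # (N, 6) rows [s x n, n]
    b = np.einsum('ij,ij->i', dst - src, nrm)       # residuals (d - s) . n
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    w, t = x[:3], x[3:]
    R = np.eye(3) + np.array([[0, -w[2], w[1]],     # first-order rotation update
                              [w[2], 0, -w[0]],
                              [-w[1], w[0], 0]])
    return R, t
```

In a full ICP loop this step is iterated, re-associating correspondences and re-orthonormalizing R each time.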
KinectFusion: Sensor pose estimation
29
Minimize the point-to-plane distance
KinectFusion: Sensor pose estimation
30
Minimize the point-to-plane distance
KinectFusion: Sensor pose estimation
31
Minimize the point-to-plane distance
KinectFusion: Sensor pose estimation
Point-plane correspondence and pose optimization
• Use pyramids of vertex and normal maps
– Build vertex and normal maps at L = 3 resolutions:
V^{l ∈ {1…L}}, N^{l ∈ {1…L}}
• Coarse-to-fine
32 Figure Reference: http://razorvision.tumblr.com/post/15039827747/how-kinect-and-kinect-fusion-kinfu-work
KinectFusion: Experimental results
33
Frame-to-frame tracking
• Accumulated error causes drift
34
Frame-to-model tracking
• Scan matching against the global model
• Drift-free
• More accurate than frame-to-frame tracking
KinectFusion: Experimental results
KinectFusion: Experimental results
Voxel resolution vs. processing time
From top to bottom:
• Depth map fusion
• Raycasting for surface generation
• Camera pose optimization using the pyramid maps
• Correspondence association between the pyramid scales
• Depth map preprocessing
35
Figure 12: A reconstruction result using 1/64 the memory (64³ voxels) of the previous figures, and using only every 6th sensor frame, demonstrating graceful degradation with drastic reductions in memory and processing resources.
(Chart: Time (ms) vs. voxel resolution, 64³, 128³, 192³, 256³, 320³, 384³, 448³, 512³)
KinectFusion: Limitations
• Drift over long trajectories
– Explicit loop closing is needed
• Sufficient geometric constraints are required
– e.g., a single plane constrains only 3 degrees of freedom
• Extension to large-scale scenes
– Uniform voxels require enormous memory and computation
– Since most regions are empty, use an octree-based SDF
36
From KinectFusion to DynamicFusion
KinectFusion's assumption
• The observed scene is mostly static
DynamicFusion
• Extends KinectFusion to dynamic, non-rigid scenes while
remaining real-time
37
(Slide shows the first pages of the two papers.)
KinectFusion: Real-Time Dense Surface Mapping and Tracking, abstract excerpt:
"We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. …"
DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time
Richard A. Newcombe
newcombe@cs.washington.edu
Dieter Fox
fox@cs.washington.edu
University of Washington, Seattle
Steven M. Seitz
seitz@cs.washington.edu
Figure 1: Real-time reconstructions of a moving scene with DynamicFusion; both the person and the camera are moving. The initially
noisy and incomplete model is progressively denoised and completed over time (left to right).
DynamicFusion: Overview
• Input: a stream of noisy depth images
• Output: dense 3D shape of a dynamic scene, in real time
38
DynamicFusion: Overview
• The warp field is represented as a weighted average of the
warps (6 DoF) of sparse nodes
• Each voxel of the canonical space is warped into the live frame
39
(Excerpt:) For each canonical point v_c ∈ S, T_lc = W(v_c) transforms that point from canonical space into the live, non-rigidly deformed frame of reference.
Interpolation from sparse nodes
Estimate the warp field W_t
DynamicFusion: Overview
The TSDF is fused in the live frame's space
• Rays of the live frame become curved in the canonical space
40
Non-rigid scene deformation / Introducing an occlusion
(a) Live frame t = 0 (b) Live frame t = 1 (c) Canonical ↦ Live (d) Live frame t = 0 (e) Live frame t = 1 (f) Canonical ↦ Live
Figure 3: An illustration of how each point in the canonical frame maps, through a correct warp field, onto a ray in the live camera frame when observing a deforming scene. In (a) the first view of a dynamic scene is observed. In the corresponding canonical frame, the warp is initialized to the identity transform and the three rays shown in the live frame also map as straight lines in the canonical frame. As the scene deforms in the live frame (b), the warp function transforms each point from the canonical into the corresponding live frame location, causing the corresponding rays to bend (c). Note that this warp can be achieved with two 6D deformation nodes (shown as circles), where the left node applies a clockwise twist. In (d) we show a new scene that includes a cube that is about to occlude the bar.
DynamicFusion: Geometric representation
Truncated Signed Distance Function (TSDF)
With 256³ voxels:
• KinectFusion
– Estimates only the camera pose (6 parameters)
• DynamicFusion
– Estimates 6 × 256³ parameters per frame
– About 10 million times as many as KinectFusion
Estimating the warp field for every voxel is intractable
41
DynamicFusion: Dense non-rigid warp field
Warp function
Base: transformations of sparse nodes
• n transformation nodes N^t_warp = {dg_v, dg_w, dg_se3}_t
– dg_v^i ∈ ℝ³: position in the canonical space
– dg_se3^i = T_ic: (coordinate) transformation parameters of node i
– dg_w^i: radial basis weight (influence radius of the node)
• Influence of node i at a point:
w_i(x_c) = exp( −||dg_v^i − x_c||² / (2 (dg_w^i)²) )
S: canonical space; x_c ∈ S: center of each voxel 42
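The node influence w_i(x_c) is a plain Gaussian radial basis function; a minimal sketch (function name is my own):

```python
import numpy as np

def node_weights(x_c, node_pos, node_radius):
    """Radial-basis influence of each warp node on a canonical point:
    w_i(x_c) = exp(-||dg_v^i - x_c||^2 / (2 (dg_w^i)^2))."""
    d2 = np.sum((node_pos - x_c) ** 2, axis=1)   # squared distance to each node
    return np.exp(-d2 / (2.0 * node_radius ** 2))
```

A node located exactly at the query point has weight 1; at a distance equal to its radius dg_w^i the weight has fallen to exp(−1/2).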
DynamicFusion: Dense non-rigid warp field
Warp function
Interpolation: dense per-voxel warp
W(x_c) ≡ SE3( DQB(x_c) )
DQB(x_c) = ( Σ_{k ∈ N(x_c)} w_k(x_c) q̂_kc ) / || Σ_{k ∈ N(x_c)} w_k(x_c) q̂_kc ||
Unit dual quaternions: q̂_kc ∈ ℝ⁸
SE3(·): converts a dual quaternion into a rigid transformation matrix
in 3D Euclidean space
43
Background: Dual quaternions
Quaternions (William Rowan Hamilton, 1843)
• Rotation only
q = cos(θ/2) + (u_x i + u_y j + u_z k) sin(θ/2)
i² = j² = k² = ijk = −1
Unit vector: u = u_x i + u_y j + u_z k
• Rotating a 3D point p = p_x i + p_y j + p_z k:
p' = q p q^{−1}
44
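The rotation p' = q p q^{−1} can be checked numerically with the Hamilton product; a minimal sketch (my own helper names):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotate(p, axis, theta):
    """Rotate point p about a unit axis by theta via p' = q p q^{-1}."""
    q = np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * np.asarray(axis)])
    q_inv = q * np.array([1, -1, -1, -1])       # inverse of a unit quaternion = conjugate
    return quat_mul(quat_mul(q, np.concatenate([[0.0], p])), q_inv)[1:]
```

Rotating (1, 0, 0) about the z axis by 90° yields (0, 1, 0), as expected.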
Background: Dual quaternions
Dual quaternions (William Kingdon Clifford, 1873)
• Rotation + translation
q̇ = q_r + ε q_d,   ε² = 0
p' = q̇ p q̇^{−1}
Rotation:
q_r = r_w + r_x i + r_y j + r_z k = cos(θ/2) + sin(θ/2) · u
Translation (t_x, t_y, t_z):
q_d = 0 + (t_x/2) i + (t_y/2) j + (t_z/2) k
45
Background: Dual Quaternion Blending (DQB)
Linear Blend Skinning (LBS)
p_i' = Σ_{j=1}^{n} w_ij T_j p_i
• Volume-loss problem
– Caused by scale change: the blended matrix is no longer rigid
46
Reference: Kavan, Ladislav, et al. "Skinning with dual quaternions." Proceedings of the 2007 symposium on Interactive 3D graphics and games. ACM, 2007.
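The volume-loss problem can be demonstrated in a few lines: averaging two opposite rotation matrices produces a degenerate, non-rigid matrix that collapses the point onto the axis (the classic "candy-wrapper" artifact). A sketch, with `rot_z` as a helper of my own:

```python
import numpy as np

def lbs(p, weights, transforms):
    """Linear blend skinning: p' = sum_j w_j * T_j p, with 4x4 matrices T_j."""
    ph = np.append(p, 1.0)                               # homogeneous point
    blended = sum(w * T for w, T in zip(weights, transforms))
    return (blended @ ph)[:3]

def rot_z(theta):
    """4x4 rigid transform: rotation by theta about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

# Blending two opposite 90-degree twists with equal weights: the averaged
# upper-left 2x2 block is zero, so the point collapses to the axis.
collapsed = lbs(np.array([1.0, 0.0, 0.0]), [0.5, 0.5],
                [rot_z(np.pi / 2), rot_z(-np.pi / 2)])
```

The blended matrix has scale 0 in the xy plane, which is exactly the scale-change failure mode the slide refers to.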
Background: Dual Quaternion Blending (DQB)
Dual Quaternion Skinning (DQS)
q̇ = ( Σ_{i=1}^{n} w_i q̇_i ) / || Σ_{i=1}^{n} w_i q̇_i ||
47
Reference: Kavan, Ladislav, et al. "Skinning with dual quaternions." Proceedings of the 2007 symposium on Interactive 3D graphics and games. ACM, 2007.
LBS DQS
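DQS/DQB reduces to normalizing the weighted sum of unit dual quaternions; a minimal sketch using an 8-number storage (4 for the real part, 4 for the dual part; function name is my own):

```python
import numpy as np

def dqb(weights, dual_quats):
    """Dual quaternion blending: normalize the weighted sum of unit dual
    quaternions, each stored as 8 numbers (real part, dual part)."""
    blend = np.einsum('i,ij->j', np.asarray(weights, dtype=float),
                      np.asarray(dual_quats, dtype=float))
    return blend / np.linalg.norm(blend[:4])   # divide by norm of the real (rotation) part
```

Blending two pure translations (real part = identity, dual part = t/2) with equal weights yields the average translation, and unlike LBS the result is always a valid rigid transform.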
Warp field
W_t(x_c) = T_lw SE3( DQB(x_c) )
DynamicFusion: Dense non-rigid warp field
48
Rigid transform (camera motion)
Per-voxel (per-node) warp field
DynamicFusion: Dense non-rigid surface fusion
Sampled TSDF
V(x) ↦ [v(x) ∈ ℝ, w(x) ∈ ℝ]
v(x): weighted average of all projective TSDF values
w(x): sum of the weights associated with x
49
DynamicFusion: Dense non-rigid surface fusion
After acquiring the live depth image D_t:
• Transform each voxel center of the canonical space into the
live frame's coordinate system
• Projective Signed Distance Function (psdf):
psdf(x_c) = [ K^{−1} D_t(u_c) [u_c^T, 1]^T ]_z − [x_t]_z
where [x_t^T, 1]^T = W_t(x_c) [x_c^T, 1]^T and u_c = π(K x_t)
(first term: measured depth point; second term: warped voxel center)
50
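The psdf step, warp the voxel center, project it, then compare depths along the ray, can be sketched as follows (an illustrative sketch; `warp_fn` stands in for W_t and the names are my own):

```python
import numpy as np

def psdf(x_canonical, warp_fn, depth, K):
    """Projective signed distance for one canonical voxel centre:
    warp x_c into the live frame, project to pixel u_c = pi(K x_t),
    and subtract the voxel's z from the measured depth point's z."""
    x_t = warp_fn(x_canonical)                       # W_t(x_c) applied to the voxel centre
    u = K @ x_t
    u = (u[:2] / u[2]).astype(int)                   # nearest-pixel projection
    ray = np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])
    measured = depth[u[1], u[0]] * ray               # back-projected depth measurement
    return measured[2] - x_t[2]                      # positive in front of the surface
```

With an identity warp and a flat depth map at 2 m, a voxel at z = 1.5 m gets psdf = 0.5, i.e. it lies in free space in front of the observed surface.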
DynamicFusion: Dense non-rigid surface fusion
TSDF update:
V(x)_t = [v'(x), w'(x)]^T   if psdf(dc(x)) > −τ
V(x)_{t−1}                  otherwise
dc(·): (voxel ID) → (coordinates of the voxel center in the TSDF volume)
τ > 0: truncation distance threshold
v'(x) = ( v(x)_{t−1} w(x)_{t−1} + min(ρ, τ) w(x) ) / ( w(x)_{t−1} + w(x) )
ρ = psdf(dc(x))
w'(x) = min( w(x)_{t−1} + w(x), w_max )
w(x) ∝ (1/k) Σ_{i ∈ N(x_c)} ||dg_v^i − x_c||₂
51
DynamicFusion: Estimating the warp field W_t
Energy function
E(W_t, V, D_t, E) = Data(W_t, V, D_t) + λ Reg(W_t, E)
52
Data: dense ICP cost from the model to the live frame
Reg: penalty on a non-smooth motion field
V: current 3D shape
D_t: live depth map
E: set of edges
DynamicFusion: Estimating the warp field W_t
Current surface model
• Extracted from the TSDF V with the marching cubes algorithm
– The surface is generated from the zero-level isosurface
• Stored as a polygon mesh in the canonical space
– Vertex-normal pairs: V̂_c ≡ {V_c, N_c}
53
DynamicFusion: Estimating the warp field W_t
Data term
Data(W, V, D_t) ≡ Σ_{u ∈ Ω} ψ_data( n̂_u^T ( v̂_u − vl_u ) )
ψ_data: robust Tukey penalty function
54
v̂_u, n̂_u: predicted vertex and normal (transformed from the canonical
space into the live frame)
vl_u: measured point (depth)
The error is measured along the predicted vertex's normal in the live frame
(Excerpt on robust estimation:) This gives the "solution" as a simple least-squares problem:

â = ( Σ_i w_i x_i x_i^T )^{−1} Σ_i w_i y_i x_i   (8)

Note that this solution depends on the w_i values, which in turn depend on â. The idea is to alternate between calculating â and recalculating w_i = w((y_i − â^T x_i)/σ_i). For the Cauchy ρ function the associated weight function is

w_C(u) = u / (1 + (u/c)²)   (9)

and, for the Beaton-Tukey ρ function,

w_T(u) = (1 − (u/a)²)²  if |u| ≤ a,  0 if |u| > a   (10)

(Plots of the two weight functions over u ∈ [−6, 6] are shown.)
Example of the Beaton-Tukey ρ function
DynamicFusion: Estimating the warp field W_t
Regularization term
• Constrains the motion of regions not observed in the live frame
Reg(W, E) ≡ Σ_{i=0}^{n} Σ_{j ∈ E(i)} α_ij ψ_reg( T_ic dg_v^j − T_jc dg_v^j )
ψ_reg: Huber penalty
α_ij = max( dg_w^i, dg_w^j )
55
Constrains each edge between nodes i and j to stay as rigid as possible
Huber loss function:
L_δ(a) = (1/2) a²           if |a| ≤ δ
δ( |a| − (1/2) δ )          otherwise
(Plot: Huber loss vs. squared error loss)
Example of the Huber loss function
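The Tukey weight and Huber penalty used as ψ_data and ψ_reg follow directly from the formulas above; a sketch (the default cutoff constants here are illustrative, not the paper's values):

```python
import numpy as np

def tukey_weight(u, a=4.6851):
    """Beaton-Tukey weight: w_T(u) = (1 - (u/a)^2)^2 for |u| <= a, else 0.
    Large residuals get zero weight, i.e. are rejected outright."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= a, (1 - (u / a) ** 2) ** 2, 0.0)

def huber_loss(a, delta=1.0):
    """Huber penalty: quadratic near zero, linear in the tails, so large
    residuals are penalized but never dominate the objective."""
    a = np.asarray(a, dtype=float)
    return np.where(np.abs(a) <= delta,
                    0.5 * a ** 2,
                    delta * (np.abs(a) - 0.5 * delta))
```

The hard cutoff of the Tukey weight suits the data term (outlier depth measurements are discarded), while the Huber penalty suits the regularizer (discontinuous motion is discouraged but not forbidden).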
DynamicFusion: Estimating the warp field W_t
56
Minimize the energy function E
• Gauss-Newton method
Approximate Hessian: J^T J = J_d^T J_d + λ J_r^T J_r
– Block arrow-head matrix
– Computed efficiently with a block Cholesky decomposition
Arrow-head matrix
DynamicFusion: Inserting new nodes
• Unsupported surface vertices:
min_{k ∈ N(x_c)} ( ||dg_v^k − v_c|| / dg_w^k ) ≥ 1
• New nodes dg_v^* ∈ dg̃_v
– Initialize each new node's transformation parameters
from the surrounding nodes using DQB:
dg_se3^* ← W_t( dg_v^* )
• Update:
N^t_warp = N^{t−1}_warp ∪ { dg̃_v, dg̃_se3, dg̃_w }
57
Surface vertices
Node positions and influence radii
DynamicFusion: Experimental results
58
Canonical Model for "drinking from a cup"
(a) Canonical model warped into the live frame for "drinking from a cup"
Canonical Model for "Crossing fingers"
DynamicFusion: Experimental results
59
Canonical Model for "drinking from a cup"
(a) Canonical model warped into the live frame for "drinking from a cup"
Canonical Model for "Crossing fingers"
(b) Canonical model warped into the live frame for "crossing fingers"
Figure 5: Real-time non-rigid reconstructions for two deforming scenes. Upper rows of (a) and (b) show the canonical models as they evolve over time, lower rows show the corresponding warped geometries tracking the scene. In (a) complete models of the arm and the cup are obtained. Note the system's ability to deal with large motion and add surfaces not visible in the initial scene, such as the bottom of the cup and the back side of the arm. In (b) we show full body motions including clasping of the hands where we note that the model stays consistent throughout the interaction.
DynamicFusion: Limitations
• Weak against abrupt changes in graph structure
– Prone to failure on sudden transitions from a topologically
closed state to an open one
– Connected nodes cannot separate abruptly
• Failure modes common to real-time incremental tracking
– Loop-closure failures
– Too-large motion between frames
60
Summary
• Proposed a dense, real-time 3D reconstruction method
for dynamic scenes
– No static-scene assumption
• Generalized 3D TSDF fusion to non-rigid scenes
• Estimates a dense 3D (6-DoF) warp field in real time
61
END
62
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real-­‐Time"