On National Teacher Day, meet the 2024-25 Kenan Fellows
Fcv poster bao
1. Semantic Structure From Motion
Sid Yingze Bao and Silvio Savarese
VISION LAB Electrical and Computer Engineering, University of Michigan at Ann Arbor
Source code and data: http://www.eecs.umich.edu/vision/projects/ssfm/index.html
Introduction Model Overview Results
Goal: Assumption: Car Person
{O,Q,C}=arg max P(q,u,o|C,O,Q) Comparison Baselines
Office
Estimate 3D location and pose of objects, 3D location Given camera hypothesis, objects
=arg max P(q,u|C,Q)P(o|C,O) - Camera Pose Est.: Bundler [1]
of points, and camera parameters from 2 or more images. and points are independent
- Object Detection: LSVM [2]
Point Likelihood P(q,u|C,Q)
1. Car Dataset [3] (available online)
- Images and Dense Lidar Points
- ~500 testing images in 10 scenarios
Motivation: Object Likelihood P(o|C,O) Cam. T. est. error v.s. baseline 3D object localization
0.3
- Most 3D reconstruction methods do 40
e (degree)
T 0.25
Recall
- Estimate 3D object likelihood by 2D Bundler 0.2 4 Cam
not povide semantic information. SSFM 0.15 3 Cam
projection appearance: 20
0.1 2 Cam
- Most recognition methds do not Conventional 0.05
1 Cam
False Positive per Image
0
provide geometry and camera pose. Structure From Motion 1 2 3 4
0
0 2 4 6 8 10
- We propose to solve these two problem jointly. 2D object detection
Advantages:
- Improve camera pose estimation, compared to feature-point-based SFM.
- Improve object detections given multiple images, compared to independently
detecting objects from each single images.
- Establish object correspondences across views.
SSFM Problem Formulation 2. Kinect Office Dataset (available online) Cam. T. est. error v.s. baseline
20 e (degree)
3D object localization
Q
- Images and calibrated Kinect 3D range data T Recall
2 Cam
1 Cam
Bundler
Measurements - Mouse, Monitor, and Keyboard 10
- q: point features (e.g. DOG+SIFT) - 500 images in 10 scenarios 0
SSFM False Positive per Image
O 0.4 0.6 0.8 1
- u: point matches (e.g. threshold test)
- o: 2D objects (e.g. [2])
o
Joint Likelihood Maximization
Main challenge: image 1 object det. image 2 object det.
Model Parameters (unknowns) High dimensionality of unknowns =>
q
azimuth=20
- C: camera (K is known)
zenith =10
C
o q Sample P(q,u,o|C,O,Q) with MCMC azimuth=90
zenith =5
- Q: 3D points (locations) azimuth=-20
zenith =9
azimuth=70
Parameter Initialization
zenith =10
- O: 3D objects (locations, poses, categories) 3. Person Dataset
C - Use object detection scale and pose to one of camera - A pair of stereo cameras
Intuition: initialize cameras relative poses pose initializations
- 400 image pairs in 10 scenarios
In addition to point features, measurements of objects across views provide add- - Theorem: camera parameters can be
itional geometrical constraints that allow to relate cameras and scene parameters. estimated given:
i) 3 objects with scale; ii) 2 objects with
pose; iii) 1 object with scale and pose.
Reference Monte Carlo Markov Chain
[1] N. Snavely, S. M. Seitz, and R. S. Szeliski. Modeling the world from internet photo collections. IJCV. 2008. - Sampling starts from different initializations
[2] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained
- Proposal distribution P(q,u,o|C,O,Q) Ackowledgement
part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence of Pattern Analysis, 2009. one of camera poses and objects initializations
[3] Gaurav Pandey, James McBride,,and Ryan Eustice,.Ford campus vision and lidar data set. International Journal - Combine all samples to identify the maximum We acknowledge the support of NSF CAREER #1054127 and the Gigascale Systems Research Center.
of Robotics Research. 2011 We thank Mohit Bagra for collecting the Kinect dataset and Min Sun for helpful feedback.