51. References
[Lowe1999] Lowe, D. G. (1999). Object recognition from local
scale-invariant features. In IEEE International Conference on Computer
Vision (Vol. 2, pp. 1150–1157).
[Csurka2004] Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray,
C. (2004). Visual categorization with bags of keypoints. In Workshop
on Statistical Learning in Computer Vision, ECCV (Vol. 1, p. 22).
[Lazebnik2006] Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for recognizing natural
scene categories. In IEEE Conference on Computer Vision and Pattern
Recognition.
[Perronnin2007] Perronnin, F., & Dance, C. (2007). Fisher kernels on
visual vocabularies for image categorization. In IEEE Conference on
Computer Vision and Pattern Recognition.
[Jegou2010] Jegou, H., Douze, M., Schmid, C., & Perez, P. (2010).
Aggregating local descriptors into a compact image representation.
In IEEE Conference on Computer Vision and Pattern Recognition.
52. References
[Krizhevsky2012] Krizhevsky, A., Sutskever, I., & Hinton, G. E.
(2012). ImageNet Classification with Deep Convolutional
Neural Networks. In Advances in Neural Information Processing
Systems (NIPS).
[Simonyan2014] Simonyan, K., & Zisserman, A. (2014). Very
Deep Convolutional Networks for Large-Scale Image
Recognition. arXiv:1409.1556. Presented at the International
Conference on Learning Representations (ICLR), 2015.
[Szegedy2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., … Rabinovich, A. (2015). Going Deeper with
Convolutions. In IEEE Conference on Computer Vision and Pattern
Recognition.
[He2016] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep
Residual Learning for Image Recognition. In IEEE Conference on
Computer Vision and Pattern Recognition.
69. References
[Viola2001] Viola, P., & Jones, M. (2001). Rapid object detection
using a boosted cascade of simple features. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
[Dalal2005] Dalal, N., & Triggs, B. (2005). Histograms of
Oriented Gradients for Human Detection. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
[Felzenswalb2009] Felzenszwalb, P. F., Girshick, R. B., McAllester,
D., & Ramanan, D. (2009). Object detection with
discriminatively trained part-based models. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
[Girshick2014] Girshick, R., Donahue, J., Darrell, T., & Malik, J.
(2014). Rich feature hierarchies for accurate object detection
and semantic segmentation. In IEEE Conference on Computer
Vision and Pattern Recognition.
70. References
[Girshick2015] Girshick, R. (2015). Fast R-CNN. In IEEE International
Conference on Computer Vision (pp. 1440–1448).
[Ren2015] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster
R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks. In Advances in Neural Information Processing
Systems (NIPS).
[Redmon2015] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A.
(2015). You Only Look Once: Unified, Real-Time Object
Detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[Liu2016] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox
detector. In European Conference on Computer Vision.
71. References
[Law2018] Law, H., & Deng, J. (2018). CornerNet:
Detecting Objects as Paired Keypoints. In European
Conference on Computer Vision.
[Zhou2019] Zhou, X., Wang, D., & Krähenbühl, P. (2019).
Objects as Points. arXiv:1904.
[Duan2019] Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., &
Tian, Q. (2019). CenterNet: Keypoint triplets for object
detection. In IEEE International Conference on Computer
Vision.
91. References
[Thoma2016] Thoma, M. (2016). A Survey of Semantic
Segmentation. arXiv:1602.06541v2.
[He2004] He, X., Zemel, R. S., & Carreira-Perpiñán, M. Á.
(2004). Multiscale conditional random fields for image labeling.
In IEEE Conference on Computer Vision and Pattern Recognition.
[Shotton2009] Shotton, J., Winn, J., Rother, C., & Criminisi, A.
(2009). TextonBoost for image understanding: Multi-class
object recognition and segmentation by jointly modeling
texture, layout, and context. International Journal of Computer
Vision, 81(1), 2–23.
[Krahenbuhl2011] Krähenbühl, P., & Koltun, V. (2011). Efficient
Inference in Fully Connected CRFs with Gaussian Edge
Potentials. In Advances in Neural Information Processing Systems
(NIPS).
92. References
[Long2015] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully
Convolutional Networks for Semantic Segmentation. In IEEE
Conference on Computer Vision and Pattern Recognition.
[Zheng2015] Zheng, S., Jayasumana, S., Romera-Paredes, B.,
Vineet, V., Su, Z., Du, D., … Torr, P. H. S. (2015). Conditional
Random Fields as Recurrent Neural Networks. In IEEE
International Conference on Computer Vision.
[Noh2015] Noh, H., Hong, S., & Han, B. (2015). Learning
deconvolution network for semantic segmentation. In IEEE
International Conference on Computer Vision.
[Ronneberger2015] Ronneberger, O., Fischer, P., & Brox, T.
(2015). U-Net: Convolutional networks for biomedical image
segmentation. In International Conference on Medical Image
Computing and Computer-Assisted Intervention.
93. References
[Yu2016] Yu, F., & Koltun, V. (2016). Multi-Scale Context
Aggregation by Dilated Convolutions. In International
Conference on Learning Representations.
[Chen2017] Chen, L.-C., Papandreou, G., Schroff, F., &
Adam, H. (2017). Rethinking Atrous Convolution for
Semantic Image Segmentation. arXiv:1706.
[Zhao2017] Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017).
Pyramid Scene Parsing Network. In IEEE Conference on
Computer Vision and Pattern Recognition.
110. Building Rome in a Day [Agarwal2009]
Reconstructs an entire city from 150,000 Internet images in less than
one day on a 500-core cluster.
https://www.youtube.com/watch?v=sQegEro5Bfo
112. Building Rome on a Cloudless Day [Frahm2010]
Builds a dense 3D model from 3 million images in about one day on a
single PC (with GPUs).
Credit: [Frahm2010]
https://www.youtube.com/watch?v=PySBQ8Q_R8k
114. Visual SLAM
Uses the Structure from Motion mechanism to estimate the camera motion
and the 3D scene simultaneously, with applications such as augmented
reality (AR).
Simultaneous Localization And Mapping (SLAM)
Localization
Mapping
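
As a rough illustration of how localization and mapping share one objective, the sketch below computes a reprojection error for a toy pinhole camera. It is only a minimal sketch under stated assumptions, not the method of any system cited in these slides: the function names `project` and `reprojection_error`, the focal length of 500 pixels, and the pose parameterization (camera position plus a single yaw angle about the Y axis) are all hypothetical choices made for brevity.

```python
import math


def project(point_w, cam_pos, yaw, focal=500.0):
    """Project a 3D world point into a toy pinhole camera (assumed model).

    The camera sits at cam_pos, is rotated by `yaw` about the world Y axis,
    and looks along its own +Z axis. Returns image coordinates (u, v)
    relative to the principal point.
    """
    dx = point_w[0] - cam_pos[0]
    dy = point_w[1] - cam_pos[1]
    dz = point_w[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    # World -> camera: apply the transpose of the Y-axis rotation.
    xc = c * dx - s * dz
    yc = dy
    zc = s * dx + c * dz
    return (focal * xc / zc, focal * yc / zc)


def reprojection_error(points, pose, observations):
    """Sum of squared pixel differences between predicted and observed points."""
    cam_pos, yaw = pose
    total = 0.0
    for p, (u_obs, v_obs) in zip(points, observations):
        u, v = project(p, cam_pos, yaw)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Synthetic scene: three 3D points observed from a known camera pose.
points = [(0.0, 0.0, 5.0), (1.0, 0.5, 6.0), (-1.0, -0.5, 4.0)]
true_pose = ((0.0, 0.0, 0.0), 0.1)
observations = [project(p, *true_pose) for p in points]

# Localization: points fixed, search over the pose for the lowest error.
wrong_pose = ((0.2, 0.0, 0.0), 0.1)
print(reprojection_error(points, true_pose, observations))
print(reprojection_error(points, wrong_pose, observations))
```

Localization corresponds to holding `points` fixed and adjusting the pose to reduce this error, while mapping holds the poses fixed and adjusts the points. Doing both at once over many frames is the simultaneous problem named on the slide.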
139. References
[Snavely2006] Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo
tourism: exploring photo collections in 3D. In Conference on
Computer Graphics and Interactive Techniques (SIGGRAPH).
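[岡谷2010] Okatani, T. (2010). Computer Vision Advanced Guide 3
(コンピュータビジョン最先端ガイド3), Chapter 1: Bundle Adjustment.
Adcom-Media, pp. 1–32.
[古川2012] Furukawa, Y. (2012). Computer Vision Advanced Guide 5
(コンピュータビジョン最先端ガイド5), Chapter 2: 3D Reconstruction
Methods from Multiple Images. Adcom-Media, pp. 33–70.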
[Agarwal2009] Agarwal, S., Snavely, N., Simon, I., Seitz, S. M., &
Szeliski, R. (2009). Building Rome in a day. In International
Conference on Computer Vision (pp. 72–79).
[Frahm2010] Frahm, J., Fite-Georgel, P., Gallup, D., Johnson, T.,
Raguram, R., Wu, C., … Pollefeys, M. (2010). Building Rome on a
Cloudless Day. In European Conference on Computer Vision (pp.
368–381).
140. References
[Mur-Artal2015] Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015).
ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE
Transactions on Robotics, 31(5), 1147–1163.
[Rublee2011] Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011).
ORB: An efficient alternative to SIFT or SURF. In International
Conference on Computer Vision.
[Newcombe2011] Newcombe, R. A., Lovegrove, S. J., & Davison, A. J.
(2011). DTAM: Dense Tracking and Mapping in Real-Time. In
International Conference on Computer Vision.
[Engel2014] Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM:
Large-Scale Direct Monocular SLAM. In European Conference on
Computer Vision.
[Godard2017] Godard, C., Mac Aodha, O., & Brostow, G. J. (2017).
Unsupervised Monocular Depth Estimation with Left-Right
Consistency. In IEEE Conference on Computer Vision and Pattern Recognition.
141. References
[Tateno2017] Tateno, K., Tombari, F., Laina, I., & Navab, N. (2017).
CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In
IEEE Conference on Computer Vision and Pattern Recognition.
[Zhou2017] Zhou, T., Brown, M., Snavely, N., & Lowe, D. G. (2017).
Unsupervised learning of depth and ego-motion from video. In IEEE
Conference on Computer Vision and Pattern Recognition.
[Bloesch2018] Bloesch, M., Czarnowski, J., Clark, R., Leutenegger, S., &
Davison, A. J. (2018). CodeSLAM - Learning a Compact, Optimisable
Representation for Dense Visual SLAM. In IEEE Conference on Computer
Vision and Pattern Recognition.
[Tang2019] Tang, C., & Tan, P. (2019). BA-Net: Dense Bundle Adjustment
Network. In International Conference on Learning Representations.
[Gordon2019] Gordon, A., Li, H., Jonschkowski, R., & Angelova, A. (2019).
Depth from videos in the wild: Unsupervised monocular depth learning
from unknown cameras. In IEEE International Conference on Computer Vision.
148. References
[Agarwala2004] Agarwala, A., Dontcheva, M., Agrawala, M., Drucker, S.,
Colburn, A., Curless, B., … Cohen, M. (2004). Interactive digital
photomontage. In Conference on Computer Graphics and Interactive Techniques
(SIGGRAPH) (Vol. 23).
[Chen2009] Chen, T., Cheng, M.-M., Tan, P., Shamir, A., & Hu, S.-M. (2009).
Sketch2Photo: internet image montage. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH).
[Radford2016] Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial
Networks. In International Conference on Learning Representations.
[Gatys2016] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image Style
Transfer Using Convolutional Neural Networks. In IEEE Conference on
Computer Vision and Pattern Recognition.
[Isola2017] Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image
translation with conditional adversarial networks. In IEEE Conference on
Computer Vision and Pattern Recognition.
149. References
[Blanz1999] Blanz, V., & Vetter, T. (1999). A morphable model for the
synthesis of 3D faces. In Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH) (pp. 187–194).
[Hoiem2005] Hoiem, D., & Efros, A. A. (2005). Automatic photo pop-up.
In Conference on Computer Graphics and Interactive Techniques
(SIGGRAPH).
[Tran2018] Tran, L., & Liu, X. (2018). Nonlinear 3D Face Morphable
Model. In IEEE Conference on Computer Vision and Pattern Recognition.
[Kato2018] Kato, H., Ushiku, Y., & Harada, T. (2018). Neural 3D Mesh
Renderer. In IEEE Conference on Computer Vision and Pattern
Recognition.
[Saito2019] Saito, S., Huang, Z., Natsume, R., Morishima, S., Li, H., &
Kanazawa, A. (2019). PIFu: Pixel-aligned implicit function for
high-resolution clothed human digitization. In IEEE International Conference
on Computer Vision.