Computer Vision Crash Course

Computer vision techniques appear in many aspects of our daily lives, with tremendous impact. These slides introduce the basic concepts and applications of computer vision for a general audience.

Download link: https://uofi.box.com/shared/static/24vy7aule67o4g6djr83hzurf5a9lfp6.pptx


  1. 1. Computer Vision Crash Course Jia-Bin Huang Electrical and Computer Engineering Virginia Tech
  2. 2. Today’s agenda 14:00 - 15:20 • A little about me • Introduction to computer vision 15:20 - 15:40 • Coffee break 15:40 - 17:00 • Fundamentals of computer vision • Recent advances
  3. 3. What is this crash course about? • Learn about cool applications of computer vision • Understand the fundamentals • light, geometry, matching, alignment, grouping, recognition • Pointers to explore further
  4. 4. WARNING The following contains shameless self-promotion and may cause the audience to groan, scoff, or feel ill. Continue at your own risk
  5. 5. About me 7 months old!
  6. 6. National Chiao-Tung University B.S. in EE IIS, Academia Sinica Research Assistant UIUC Ph.D. in ECE 2016 UC, Merced Visiting Student Microsoft Research Research Intern 2012, 2013 Disney Research Research Intern 2014
  7. 7. Quick facts • Faculty members x 86 • IEEE Fellows x 31 • NAE Members x 4 • $34+ Million in annual research • Rank 19-21 (US News) Area of Focus • Computer Systems • Networking • VLSI and Design Automation • Software and Machine Intelligence
  8. 8. Vision and Learning Lab @ Virginia Tech Jinwoo Choi PhD student Badour AlBahar MS student Hao-Wei Yeh (Univ. Tokyo) Yen-Chen Lin (NTHU) Graduate students Visiting students (Summer 2017) Open positions for PhD students and visiting students! ? Alumni Former undergrad students • [Le, Zelun, JunYoung] Stanford x 3 • [Linjia, Sakshi, Danyang] UIUC x 3 • [Kevin] UC Berkeley x 1 • [Anarghya] Google x 1 Former student collaborators • [Shun Zhang] Assistant Prof. at Northwestern Polytechnical Univ. • [Chao Ma] Postdoc at University of Adelaide • [Dong Li] Research Scientist at Face++
  9. 9. What do we do? Vision/learning/graphics • Unsupervised learning • Deep generative/predictive modeling • Computational photography Applied, interdisciplinary research • Visual sensing for ecology and conservation • Interactive neurorehabilitation
  10. 10. Image Completion [SIGGRAPH14] - Revealing unseen pixels
  11. 11. Video Completion [SIGGRAPH Asia16] - Revealing temporally coherent pixels
  12. 12. Image super-resolution [CVPR15] - Revealing unseen high frequency details
  13. 13. Deep Joint Image Filtering [ECCV16] - Transferring structural details Depth upsampling Noise reduction Inverse halftoning Texture removal
  14. 14. Detecting migrating birds [CVPR16] Object tracking [ICCV15] Multi-face tracking [ECCV16] Visual Tracking - Locating moving objects across video frames
  15. 15. Weakly supervised localization [CVPR16] k-NN GraphUnlabeled images Negative miningPositive mining Unsupervised feature learning [ECCV16] Learning with weak supervision
  16. 16. What is Computer Vision? • Make computers understand images and videos. • Where is this? • What are they doing? • Why is this happening? • What is important? • What will I see?
  17. 17. Computer Vision and Nearby Fields Images (2D) Geometry (3D) Shape Photometry Appearance Digital Image Processing Computational Photography Computer Graphics Computer Vision Machine learning: Vision = Machine learning applied to visual data
  18. 18. Visual data on the Internet • Flickr • 10+ billion photographs • 60 million images uploaded a month • Facebook • 250 billion+ • 300 million a day • Instagram • 55 million a day • YouTube • 100 hours uploaded every minute 90% of net traffic will be visual! Mostly about cats
  19. 19. Too big for humans • Need automatic tools to access and analyze visual data! http://www.petittube.com/
  20. 20. Slide credit: Alexei A. Efros
  21. 21. Vision is Really Hard • Vision is an amazing feature of natural intelligence • Visual cortex occupies about 50% of Macaque brain • More human brain devoted to vision than anything else Is that a queen or a bishop?
  22. 22. Why is Computer Vision Hard?
  23. 23. Why is Computer Vision Hard?
  24. 24. What did you see? • Where was this picture taken? • How many people are there? • What are they doing? • What object is the person on the left standing on? • Why is this a funny picture?
  25. 25. Why is Computer Vision Hard?
  26. 26. Why is Computer Vision Hard?
  27. 27. Why is Computer Vision Hard?
  28. 28. Why is Computer Vision Hard?
  29. 29. Why is Computer Vision Hard?
  30. 30. Why is Computer Vision Hard?
  31. 31. Computer: okay, it’s a funny picture
  32. 32. Computer Vision Technology Can Better Our Lives: Safety, Health, Security, Comfort, Access, Fun
  33. 33. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 “In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a camera to a computer and get the machine to describe what it sees.” Crevier 1993, pg. 88
  34. 34. Half a century later, we're still working on it.
  35. 35. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 Gerald Sussman, MIT “You’ll notice that Sussman never worked in vision again!” – Berthold Horn
  36. 36. 1960’s: interpretation of synthetic worlds Larry Roberts “Father of Computer Vision” Larry Roberts PhD Thesis, MIT, 1963, Machine Perception of Three-Dimensional Solids Input image 2x2 gradient operator computed 3D model rendered from new viewpoint Slide credit: Steve Seitz
  37. 37. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  38. 38. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  39. 39. 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor Image credit: Rick Szeliski
  40. 40. Goodbye science
  41. 41. 1990’s: face recognition; statistical analysis in vogue Image credit: Rick Szeliski
  42. 42. 2000’s: broader recognition; large annotated datasets available; video processing starts Image credit: Rick Szeliski
  43. 43. 2010’s: resurgence of deep learning [AlexNet NIPS 2012] [DeepFace CVPR 2014] [DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]
  44. 44. 2020’s: autonomous vehicles
  45. 45. 2030’s: robot uprising?
  46. 46. 3-mins break
  47. 47. Examples of Computer Vision Applications • How is computer vision used today?
  48. 48. Face detection • Most digital cameras and smart phones detect faces (and more) • Canon, Sony, Fuji, … • For smart focus, exposure compensation, and cropping Slide credit: Steve Seitz
  49. 49. Face recognition Facebook face auto-tagging
  50. 50. Application: building a boss detector http://ahogrammer.com/2016/11/15/deep-learning-enables-you-to-hide-screen- when-your-boss-is-approaching/
  51. 51. Face Landmark Alignment - Face Morphing • TAAZ.com: virtual makeover • Face morphing example: Jia-Bin Huang and the Miss Daegu 2013 contestants
  52. 52. Face Landmark Alignment- Pose Estimation Pitch: ~ 15 degree Yaw: ~ +20, -20 degree Jia-Bin Huang What's the Best Pose for a Selfie?
  53. 53. Face Landmark Alignment- Augmented Reality SnapChat Lens
  54. 54. Face Landmark Alignment- Reenactment Face2Face CVPR 2016
  55. 55. Video-based face replacement
  56. 56. Hand tracking HandPose, CHI 2015
  57. 57. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  58. 58. Smile Detection Sony Cyber-shot® T70 Digital Still Camera Slide credit: Steve Seitz
  59. 59. Eye contact detection Gaze locking, UIST 2013
  60. 60. Vision-based Biometrics “How the Afghan Girl was Identified by Her Iris Patterns” Read the story wikipedia Slide credit: Steve Seitz
  61. 61. Vision-based Biometrics
  62. 62. Optical Character Recognition (OCR) Digit recognition, AT&T labs http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition • Technology to convert scanned docs to text • If you have a scanner, it probably came with OCR software Slide credit: Steve Seitz
  63. 63. Computer vision in sports Hawk-Eye: helping/improving referee decisions
  64. 64. Computer vision in sports SportVision: improving viewer experiences
  65. 65. Computer vision in sports Replay Technologies: improving viewer experiences
  66. 66. Computer vision in sports Play tracking
  67. 67. Computer vision in sports Second Spectrum: visual analytics
  68. 68. Visual recognition for photo organization Google photo
  69. 69. Matching street clothing photos in online shops Street2shop, ICCV 2015
  70. 70. 3D scanner from a mobile camera MobileFusion
  71. 71. Earth viewers (3D modeling) Image from Microsoft’s Virtual Earth (see also: Google Earth) Slide credit: Steve Seitz
  72. 72. 3D from thousands of images Building Rome in a Day: Agarwal et al. 2009
  73. 73. 3D from thousands of images [Furukawa et al. CVPR 2010]
  74. 74. Microsoft PhotoSynth: Photo Tourism
  75. 75. MS PhotoSynth in CSI
  76. 76. Indoor Scene Reconstruction Structured Indoor Modeling, ICCV2015
  77. 77. Tour into picture Adobe After Effects
  78. 78. First-person Hyperlapse Videos [Kopf et al. SIGGRAPH 2014]
  79. 79. 3D Time-lapse from Internet Photos 3D Time-lapse from Internet Photos, ICCV 2015
  80. 80. Special effects: matting and composition Kylie Minogue - Come Into My World
  81. 81. Style transfer A Neural Algorithm of Artistic Style [Gatys et al. 2015] • Source image (style) • Target image (content) • Output (deepart)
  82. 82. Prisma – Best app of the year
  83. 83. EverFilter
  84. 84. The Matrix movies, ESC Entertainment, XYZRGB, NRC Special effects: shape capture Slide credit: Steve Seitz
  85. 85. Pirates of the Caribbean, Industrial Light and Magic Special effects: motion capture Slide credit: Steve Seitz
  86. 86. Google cars Google in talks with Ford, Toyota and Volkswagen to realise driverless cars http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track- the-trick-that-makes-googles-self-driving-cars-work/370871/
  87. 87. Interactive Games: Kinect • Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o • Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg • 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A • Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
  88. 88. Vision in space Vision systems (JPL) used for several tasks • Panorama stitching • 3D terrain modeling • Obstacle detection, position tracking • For more, read “Computer Vision on Mars” by Matthies et al. NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
  89. 89. Industrial robots Vision-guided robots position nut runners on wheels http://www.automationworld.com/computer-vision-opportunity-or-threat
  90. 90. Mobile robots • NASA's Mars Spirit Rover • RoboCup: http://www.robocup.org/ • STAIR at Stanford, Saxena et al. 2008: http://www.youtube.com/watch?v=DF39Ygp53mQ
  91. 91. Medical imaging • 3D imaging: MRI, CT • Image-guided surgery: Grimson et al., MIT
  92. 92. Computer vision for healthcare BiliCam, Ubicomp 2014
  93. 93. Computer vision for healthcare Autism Screening
  94. 94. Computer vision for healthcare Video magnification
  95. 95. Computer vision for the mass Counting cells Predicting poverty
  96. 96. Current state of the art • Many of these are less than 5 years old • Very active and exciting research area! • To learn more about vision applications and companies – David Lowe maintains an excellent overview of vision companies • http://www.cs.ubc.ca/spider/lowe/vision.html
  97. 97. 20-mins coffee break
  98. 98. Fundamentals of Computer Vision • Light – What an image records • Matching – How to measure the similarity of two regions • Alignment – How to recover transformation parameters based on matched points • Geometry – How to relate world coordinates and image coordinates • Grouping – What points/regions/lines belong together? • Recognition – What similarities are important? Slide credit: Derek Hoiem
  99. 99. Light –What an image records
  100. 100. Image Formation
  101. 101. Digital camera A digital camera replaces film with a sensor array • Each cell in the array is light-sensitive diode that converts photons to electrons • Two common types: • Charge Coupled Device (CCD): larger yet slower, better quality • Complementary Metal Oxide Semiconductor (CMOS): high bandwidth, lower quality • http://electronics.howstuffworks.com/digital-camera.htm Slide by Steve Seitz
  102. 102. Sensor Array CMOS sensor CCD sensor
  103. 103. An image is a grid of numbers: each entry is the intensity measured at one pixel (figure: array of intensity values between 0 and 1)
  104. 104. Perception of Intensity from Ted Adelson • A is brighter? • B is brighter? • They are equally bright
  105. 105. Perception of Intensity from Ted Adelson
  106. 106. Digital Color Images
  107. 107. Color Image R G B
  108. 108. Image filtering • Linear filtering: function is a weighted sum/difference of pixel values • Really important! • Enhance images • Denoise, smooth, increase contrast, etc. • Extract information from images • Texture, edges, distinctive points, etc. • Detect patterns • Template matching
  109. 109. Example: box filter g[·,·] = (1/9) · [1 1 1; 1 1 1; 1 1 1] Slide credit: David Lowe (UBC)
  110. 110. Image filtering (slides 110-117, step-by-step animation): sliding the 3×3 box filter g over the input image f computes h[m,n] = Σ_{k,l} g[k,l] · f[m+k, n+l]; the first output row builds up as 0, 10, 20, 30, ... and the final output smooths the bright square. Credit: S. Seitz
  118. 118. Box Filter What does it do? • Replaces each pixel with an average of its neighborhood • Achieves a smoothing effect (removes sharp features) Slide credit: David Lowe (UBC)
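The sliding-window computation in the filtering slides, h[m,n] = Σ_{k,l} g[k,l] · f[m+k, n+l], can be sketched in a few lines of plain Python. This is an illustrative toy, not the course's demo code; a real pipeline would use NumPy or OpenCV's filter2D:

```python
def filter2d(f, g):
    """Cross-correlate image f with kernel g:
    h[m, n] = sum over (k, l) of g[k][l] * f[m + k][n + l].
    Border pixels are skipped, so the output is smaller than the input."""
    kh, kw = len(g), len(g[0])
    h = []
    for m in range(len(f) - kh + 1):
        row = []
        for n in range(len(f[0]) - kw + 1):
            row.append(sum(g[k][l] * f[m + k][n + l]
                           for k in range(kh) for l in range(kw)))
        h.append(row)
    return h

# 3x3 box filter: each output pixel is the average of 9 input pixels.
box = [[1 / 9] * 3 for _ in range(3)]

# A hard 0-to-90 edge, as in the slides; filtering turns it into a ramp.
f = [[0, 0, 90, 90],
     [0, 0, 90, 90],
     [0, 0, 90, 90],
     [0, 0, 90, 90]]
smoothed = filter2d(f, box)  # each row becomes approximately [30, 60]
```

The sharp step is replaced by intermediate values, which is exactly the smoothing effect the box-filter slide describes.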
  119. 119. Smoothing with box filter
  120. 120. Practice with linear filters - kernel [0 0 0; 0 1 0; 0 0 0]: Original → ? Source: D. Lowe
  121. 121. Practice with linear filters - kernel [0 0 0; 0 1 0; 0 0 0]: Filtered (no change) Source: D. Lowe
  122. 122. Practice with linear filters - kernel [0 0 0; 1 0 0; 0 0 0]: Original → ? Source: D. Lowe
  123. 123. Practice with linear filters - kernel [0 0 0; 1 0 0; 0 0 0]: Shifted left by 1 pixel Source: D. Lowe
  124. 124. Practice with linear filters - kernel [0 0 0; 0 2 0; 0 0 0] - (1/9) · [1 1 1; 1 1 1; 1 1 1]: Original → ? (Note that the filter sums to 1) Source: D. Lowe
  125. 125. Practice with linear filters - kernel [0 0 0; 0 2 0; 0 0 0] - (1/9) · [1 1 1; 1 1 1; 1 1 1]: Sharpening filter - accentuates differences with the local average Source: D. Lowe
  126. 126. Sharpening Source: D. Lowe
  127. 127. Other filters: Sobel, vertical edges (absolute value) kernel [-1 0 1; -2 0 2; -1 0 1]
  128. 128. Other filters: Sobel, horizontal edges (absolute value) kernel [-1 -2 -1; 0 0 0; 1 2 1]
  129. 129. Basic gradient filters • Horizontal gradient: [1 0 -1], or [0 0 0; 1 0 -1; 0 0 0] • Vertical gradient: [1; 0; -1], or [0 1 0; 0 0 0; 0 -1 0]
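To illustrate the edge filters above, here is a plain-Python sketch (the tiny test image is invented for illustration) showing that the vertical-edge Sobel kernel fires on a left-dark/right-bright step while the horizontal-edge kernel stays silent:

```python
def correlate(f, g):
    # h[m, n] = sum over (k, l) of g[k][l] * f[m + k][n + l], valid region only
    kh, kw = len(g), len(g[0])
    return [[sum(g[k][l] * f[m + k][n + l]
                 for k in range(kh) for l in range(kw))
             for n in range(len(f[0]) - kw + 1)]
            for m in range(len(f) - kh + 1)]

sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
sobel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges

img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]                              # dark | bright vertical step
gx = correlate(img, sobel_x)   # strong response at the step
gy = correlate(img, sobel_y)   # all zeros: there is no horizontal edge
```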
  130. 130. Demo - image filtering
  131. 131. Matching How to measure the similarity of two regions Alignment How to recover transformation parameters based on matched points
  132. 132. Correspondence and alignment • Correspondence: matching points, patches, edges, or regions across images ≈
  133. 133. Correspondence and alignment • Alignment: solving the transformation that makes two things match better
  134. 134. Example: fitting a 2D shape template Slide from Silvio Savarese
  135. 135. Example: fitting a 3D object model Slide from Silvio Savarese
  136. 136. Example: estimating the "fundamental matrix" that relates two views Slide from Silvio Savarese
  137. 137. Example: tracking points frame 0 frame 22 frame 49 x x x
  138. 138. Overview of Keypoint Matching 1. Find a set of distinctive key-points 2. Define a region around each keypoint 3. Extract and normalize the region content 4. Compute a local descriptor from the normalized region 5. Match local descriptors: accept if d(f_A, f_B) < T K. Grauman, B. Leibe
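Step 5, matching local descriptors with d(f_A, f_B) < T, is commonly implemented as nearest-neighbor search plus Lowe's ratio test (accept only matches that clearly beat the runner-up). A plain-Python sketch with invented 2-D descriptors:

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test: accept a match
    only if the best distance is clearly smaller than the second best."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(da, desc_b[j]))
        d1 = dist(da, desc_b[ranked[0]])
        d2 = dist(da, desc_b[ranked[1]])
        if d1 < ratio * d2:                 # unambiguous winner only
            matches.append((i, ranked[0]))
    return matches

a = [[1.0, 0.0], [0.0, 1.0]]                # descriptors from image A
b = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]    # descriptors from image B
matches = match_descriptors(a, b)           # [(0, 0), (1, 2)]
```

Descriptors with no clear winner are dropped, which trades a few correct matches for far fewer wrong ones.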
  139. 139. Goals for Keypoints Detect points that are repeatable and distinctive
  140. 140. Key trade-offs • Detection: more repeatable (robust detection, precise localization) vs. more points (robust to occlusion, works with less texture) • Description: more distinctive (minimize wrong matches) vs. more flexible (robust to expected variations, maximize correct matches)
  141. 141. Choosing interest points Where would you tell your friend to meet you?
  142. 142. Choosing interest points Where would you tell your friend to meet you?
  143. 143. Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients • Captures important texture information • Robust to small translations / affine deformations K. Grauman, B. Leibe
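The core idea of the SIFT descriptor, a histogram of gradient orientations weighted by gradient magnitude, can be sketched as below. This toy omits SIFT's 4×4 spatial grid, Gaussian weighting, interpolation, and normalization:

```python
import math

def orientation_histogram(patch, bins=8):
    """Bin each interior pixel's gradient direction into `bins` orientation
    bins, weighted by gradient magnitude (central differences)."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist

# A vertical step edge: all gradient energy falls in the 0-radian bin,
# so the histogram immediately reveals the patch's dominant orientation.
patch = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
hist = orientation_histogram(patch)
```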
  144. 144. Examples of recognized objects
  145. 145. Demo – interest point detection
  146. 146. Geometry How to relate world coordinates and image coordinates
  147. 147. Single-view Geometry How tall is this woman? Which ball is closer? How high is the camera? What is the camera rotation? What is the focal length of the camera?
  148. 148. Single-view Geometry Mapping between image and world coordinates • Pinhole camera model • Projective geometry • Vanishing points and lines
  149. 149. Image formation Let’s design a camera – Idea 1: put a piece of film in front of an object – Do we get a reasonable image? Slide credit: Steve Seitz
  150. 150. Pinhole camera Idea 2: add a barrier to block off most of the rays • This reduces blurring • The opening is known as the aperture Slide credit: Steve Seitz
  151. 151. Pinhole camera: f = focal length, c = center of the camera Figure from David Forsyth
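The pinhole model maps a 3D point (X, Y, Z) in camera coordinates to the image point (fX/Z, fY/Z). A minimal sketch (the focal length and points are made up for illustration):

```python
def project(point3d, f):
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z). Depth Z divides
    out, which is why absolute scale is ambiguous in a single image."""
    X, Y, Z = point3d
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    return (f * X / Z, f * Y / Z)

# Doubling the distance halves the projected size:
near = project((1.0, 2.0, 5.0), f=500)    # (100.0, 200.0)
far = project((1.0, 2.0, 10.0), f=500)    # (50.0, 100.0)
```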
  152. 152. Dimensionality Reduction Machine (3D to 2D): the 3D world is projected to a 2D image through a point of observation Figures © Stephen E. Palmer, 2002
  153. 153. Projection can be tricky… Slide source: Seitz
  154. 154. Projection can be tricky… Slide source: Seitz Making of 3D sidewalk art: http://www.youtube.com/watch?v=3SNYtd0Ayt0
  155. 155. Vanishing points and lines Parallel lines in the world intersect in the image at a “vanishing point”
  156. 156. Vanishing points and lines Vanishing point Vanishing line Vanishing point Vertical vanishing point (at infinity) Slide from Efros, Photo from Criminisi
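Vanishing points are easy to compute in homogeneous coordinates: the image line through two points is their cross product, and two lines meet at the cross product of the lines. A sketch with invented points on two converging "rails":

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    # homogeneous line through two image points given as (x, y, 1)
    return cross(p, q)

def intersection(l1, l2):
    # homogeneous intersection of two lines, dehomogenized (assumes w != 0)
    x, y, w = cross(l1, l2)
    return (x / w, y / w)

# Images of two parallel world lines converge at the vanishing point:
l1 = line_through((0, 10, 1), (4, 6, 1))   # the line x + y = 10
l2 = line_through((0, 0, 1), (4, 2, 1))    # the line y = x / 2
vp = intersection(l1, l2)                  # (20/3, 10/3)
```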
  157. 157. Comparing heights Vanishing Point Slide by Steve Seitz
  158. 158. Measuring height (figure: heights of 5.3, 2.8, and 3.3 units measured against the camera height using a vanishing-point construction) Slide by Steve Seitz
  159. 159. Image Stitching • Combine two or more overlapping images to make one larger image Slide credit: Vaibhav Vaish
  160. 160. Illustration Camera Center
  161. 161. RANSAC for Homography Initial Matched Points
  162. 162. Final Matched Points RANSAC for Homography
  163. 163. RANSAC for Homography
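RANSAC itself is model-agnostic: sample a minimal set, fit, count inliers, keep the best. The sketch below runs it on the simplest alignment model, a 2D translation fit from a single match, rather than a full homography (which needs four matches and a linear solve); the matches are invented for illustration:

```python
import random

def ransac_translation(matches, iters=200, tol=1.0, seed=0):
    """RANSAC for a 2D translation: repeatedly fit to one random match,
    count how many matches agree within `tol`, keep the best model."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (ax, ay), (bx, by) = rng.choice(matches)   # minimal sample: 1 match
        dx, dy = bx - ax, by - ay
        inliers = [((px, py), (qx, qy)) for (px, py), (qx, qy) in matches
                   if abs(qx - px - dx) < tol and abs(qy - py - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers

# Four correct matches shifted by (5, 0), plus two gross outliers:
matches = [((0, 0), (5, 0)), ((1, 2), (6, 2)), ((3, 1), (8, 1)),
           ((2, 4), (7, 4)), ((0, 0), (40, 9)), ((1, 1), (-3, 7))]
model, inliers = ransac_translation(matches)   # model (5, 0), 4 inliers
```

The outliers never gather support, so the translation fit from any correct match wins; swapping in a 4-point homography fit gives the algorithm used in the stitching slides.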
  164. 164. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  165. 165. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  166. 166. Details to make it look good • Choosing seams • Blending
  167. 167. Choosing seams • Better method: dynamic programming to find a seam along well-matched regions Illustration: http://en.wikipedia.org/wiki/File:Rochester_NY.jpg
  168. 168. Gain compensation • Simple gain adjustment • Compute average RGB intensity of each image in overlapping region • Normalize intensities by ratio of averages
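The simple gain adjustment above in code: a single gain estimated from the ratio of mean intensities in the overlap region (a one-channel toy; real stitchers estimate gains jointly across all pairwise overlaps):

```python
def gain_compensate(overlap_a, overlap_b):
    """Estimate one gain for image B as the ratio of average intensities
    in the region where A and B overlap."""
    mean = lambda px: sum(px) / len(px)
    return mean(overlap_a) / mean(overlap_b)

# Image B was exposed twice as bright in the shared region:
a_overlap = [10, 20, 30]                      # mean 20
b_overlap = [20, 40, 60]                      # mean 40
g = gain_compensate(a_overlap, b_overlap)     # 0.5
b_corrected = [g * v for v in b_overlap]      # [10.0, 20.0, 30.0]
```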
  169. 169. Multi-band Blending • Burt & Adelson 1983 • Blend each frequency band over a range proportional to its wavelength: low frequencies over a wide region, high frequencies over a narrow one
  170. 170. Blending comparison (IJCV 2007)
  171. 171. Depth from Stereo • Goal: recover depth by finding the image coordinate x' that corresponds to x (figure: cameras C and C' with focal length f, separated by baseline B, observe world point X at depth z)
  172. 172. Correspondence Problem • We have two images taken from cameras with different intrinsic and extrinsic parameters • How do we match a point x in the first image to a point in the second? How can we constrain our search?
  173. 173. Key idea: the epipolar constraint. Potential matches for x have to lie on the corresponding line l'. Potential matches for x' have to lie on the corresponding line l.
  174. 174. Basic stereo matching algorithm •If necessary, rectify the two stereo images to transform epipolar lines into scanlines •For each pixel x in the first image • Find corresponding epipolar scanline in the right image • Search the scanline and pick the best match x’ • Compute disparity x-x’ and set depth(x) = fB/(x-x’)
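The algorithm above in a toy plain-Python form: a single-pixel matching cost on one rectified scanline, then depth = fB/(x − x'). Real matchers compare windows rather than single pixels and handle occlusions; all values here are invented:

```python
def scanline_disparity(left_row, right_row, max_disp=5):
    """For each left pixel x, search the right scanline for the best match
    x' = x - d (minimum absolute difference) and record the disparity d."""
    disps = []
    for x, lv in enumerate(left_row):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - d < 0:
                break                      # match would fall off the image
            cost = abs(lv - right_row[x - d])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disps.append(best_d)
    return disps

def depth(disparity, f, baseline):
    # triangulation from the slide: depth = f * B / (x - x')
    return f * baseline / disparity

left = [10, 20, 30, 40, 50]
right = [30, 40, 50, 0, 0]     # the scene shifted 2 pixels to the left
disps = scanline_disparity(left, right)   # ends in [2, 2, 2]
z = depth(2, f=100, baseline=0.5)         # 25.0
```

Large disparities mean near objects and small disparities mean far ones, which is why the formula diverges as disparity goes to zero.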
  175. 175. 800 most confident matches among 10,000+ local features.
  176. 176. Epipolar lines
  177. 177. Keep only the matches that are "inliers" with respect to the "best" fundamental matrix
  178. 178. The Fundamental Matrix Song
  179. 179. 5-mins break
  180. 180. Recognition What similarities are important?
  181. 181. What do you see in this image? Can I put stuff in it? Forest Trees Bear Man Rabbit Grass Camera
  182. 182. Describe, predict, or interact with the object based on visual cues Is it alive? Is it dangerous? How fast does it run? Is it soft? Does it have a tail? Can I poke it?
  183. 183. Why do we care about categories? • From an object's category, we can make predictions about its behavior in the future, beyond what is immediately perceived. • Pointers to knowledge • Help to understand individual cases not previously encountered • Communication
  184. 184. Image categorization: Fine-grained recognition Visipedia Project
  185. 185. Place recognition Places Database [Zhou et al. NIPS 2014]
  186. 186. Image style recognition [Karayev et al. BMVC 2014]
  187. 187. Training phase: Training Images → Image Features → Classifier Training (using Training Labels) → Trained Classifier
  188. 188. Testing phase: Test Image → Image Features → Trained Classifier → Prediction (e.g., Outdoor)
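The two-phase pipeline can be made concrete with the smallest possible classifier, a nearest-centroid model. The "features" and labels below are invented for illustration (think: fraction of sky-colored vs. green pixels):

```python
def train(features, labels):
    """Training phase: summarize each class by the mean of its feature
    vectors (a nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(model, x):
    """Testing phase: assign the label of the nearest class centroid."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(model, key=lambda y: sqdist(model[y], x))

train_feats = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.8], [0.2, 0.7]]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
model = train(train_feats, train_labels)
label = predict(model, [0.75, 0.15])   # "outdoor"
```

Swapping in better features and a stronger classifier, with the same two-phase structure, is exactly the progression the following slides describe.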
  189. 189. Q: What are good features for… • recognizing a beach?
  190. 190. Q: What are good features for… • recognizing cloth fabric?
  191. 191. Q: What are good features for… • recognizing a mug?
  192. 192. What are the right features? Depend on what you want to know! • Object: shape • Local shape info, shading, shadows, texture • Scene : geometric layout • linear perspective, gradients, line segments • Material properties: albedo, feel, hardness • Color, texture • Action: motion • Optical flow, tracked points
  193. 193. • Image features: map images to feature space • Classifiers: map feature space to label space x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o
  194. 194. "Shallow" vs. "deep" architectures • Traditional recognition ("shallow"): Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class • Deep learning ("deep"): Image/Video Pixels → Layer 1 → ... → Layer N → Simple classifier → Object Class
  195. 195. Progress from Supervised Learning: ImageNet top-5 image-classification error fell steadily from AlexNet (2012) through ZF (2013), VGG (2014), GoogLeNet (2014), ResNet (2015), and GoogLeNet-v4 (2016)
  196. 196. Object Detection Search by Sliding Window Detector • May work well for rigid objects • Key idea: simple alignment for simple deformations Object or Background?
  197. 197. Object Detection Search by Parts-based model • Key idea: more flexible alignment for articulated objects • Defined by models of part appearance, geometry or spatial layout, and search algorithm
  198. 198. Vision as part of an intelligent system: Feature Extraction (texture, color, optical flow, stereo disparity, grouping, motion patterns) → 3D Scene Interpretation (surfaces, bits of objects, sense of depth, objects, agents and goals, shapes and properties, open paths, words) → Action (walk, touch, contemplate, smile, evade, read on, pick up, ...)
  199. 199. Well-Established (patch matching): Multi-view Geometry, Face Detection/Recognition, Object Tracking / Flow • Major Progress (pattern matching++): Category Detection, Human Pose, 3D Scene Layout • New Opportunities (interpretation/tasks): Vision for Robots, Life-long Learning, Entailment/Prediction Slide credit: Derek Hoiem
  200. 200. Scene Understanding = Objects + People + Layout + Interpretation within Task Context
  201. 201. What do I see?  Why is this happening? What is important? What will I see? How can we learn about the world through vision? How do we create/evaluate vision systems that adapt to useful tasks?
  202. 202. Important open problems • How can we interpret vision given structured plans of a scene?
  203. 203. Important open problems • Algorithms: from "works pretty well" to "perfect" • E.g., stereo: top of the wish list from Pixar's Michael Kass • Good directions: • Incorporate higher-level knowledge
  204. 204. Important open problems • Spatial understanding: what is it doing? Or how do I do it? Important questions: • What are good representations of space for navigation and interaction? What kind of details are important? • How can we combine single-image cues with multi-view cues?
  205. 205. Important open problems Object representation: what is it? Important questions: • How can we pose recognition so that it lets us deal with new objects? • What do we want to predict or infer, and to what extent does that rely on categorization? • How do we transfer knowledge of one type of object to another?
  206. 206. Important problems • Learning features and intermediate representations that transfer to new tasks Figure from http://www.datarobot.com/blog/a-primer-on-deep-learning/
  207. 207. Important open problems • Almost everything is still unsolved! • Robust 3D shape from multiple images • Recognize objects (only faces, and maybe cars, are really solved, thanks to tons of data) • Caption images/video • Predict intention • Object segmentation • Count objects in an image • Estimate pose • Recognize actions • ...
  208. 208. This course has provided fundamentals • What is computer vision? • Examples of computer vision applications • Basic principles of computer vision: • Image formation • Filtering • Correspondence and alignment • Geometry • Recognition.
  209. 209. How do you learn more? Explore!
  210. 210. Resources - where to learn more • Awesome Computer Vision, Awesome Deep Vision • A curated list of awesome computer vision resources • Computer Vision Taiwanese Group • > 3,000 members on Facebook • Videolectures and ML&CV Talks • Lectures, Keynotes, Panel Discussions • OpenCV - C++/Python/Java
  211. 211. Thank You! Image: www.clubtuzki.com
