5. DISCLAIMER
•I have
•published academically in these fields
•contributed code / algorithms
•built systems related to these fields
•But I do not consider myself an expert in many of these fields / topics
Source: http://www.inspirationde.com/
16. SMART DAILY
•Intelligent video summary
•Daily email on your past day’s activity
- Compressed domain analysis
- Very fast - totally I/O bound
- Detect events based on activity / sensible motion
19. RELATIONSHIP TO DATA SCIENCE?
•Rich information, lots of data (in terms of bits)
•Unstructured, usually without much context / semantics
•Difficult to process and query
•We are generating them every day
21. History of Computer Vision
Marvin Minsky, MIT
Turing award, 1969
“In 1966, Minsky hired a first-year
undergraduate student and
assigned him a problem to solve
over the summer:
connect a camera to a computer
and get the machine to describe
what it sees.”
Crevier 1993, pg. 88
23. 1960’s: interpretation of synthetic worlds
Larry Roberts
“Father of Computer Vision”
Larry Roberts PhD Thesis, MIT, 1963,
Machine Perception of Three-
Dimensional Solids
[Figure: input image → 2x2 gradient operator → computed 3D model rendered from a new viewpoint]
Slide credit: Steve Seitz
25. 1970’s: some progress on interpreting
selected images
The representation and matching of pictorial
structures
Fischler and Elschlager, 1973
36. ACKNOWLEDGEMENTS
•Many slides and materials borrowed from Jia-Bin Huang, Steve Seitz, Rich Szeliski, Andrew Zisserman, Larry Zitnick. I try to give credit whenever possible but may have a few omissions.
•Rights of illustration, pictures and other relevant materials belong to
their original creators or authors.
39. TOPICS
•Image formation and 2D image processing
•Epipolar geometry and stereo matching
•Structure from motion and tracking
•Stitching and computational photography
•Visual recognition (next talk)
40. REFERENCE BOOK
“Multiple View Geometry in Computer Vision”, Richard Hartley and Andrew Zisserman
•A good book to get started on camera
geometry
•More math heavy but very old school
43. Image formation
Let’s design a camera
• Idea 1: put a piece of film in front of an object
• Do we get a reasonable image?
44. Pinhole camera
Add a barrier to block off most of the rays
• This reduces blurring
• The opening is known as the aperture
• How does this transform the image?
45. Shrinking the aperture
Why not make the aperture as small as possible?
• Less light gets through
• Diffraction effects...
46. Adding a lens
A lens focuses light onto the film
• There is a specific distance at which objects are “in focus”
– other points project to a “circle of confusion” in the image
• Changing the shape of the lens changes this distance
47. Depth of field
Changing the aperture size affects depth of field
• A smaller aperture increases the range in which the object is
approximately in focus
[Flower images at f/5.6 and f/32, from Wikipedia: http://en.wikipedia.org/wiki/Depth_of_field]
48. Modeling projection
The coordinate system
• We will use the pin-hole model as an approximation
• Put the optical center (Center Of Projection) at the origin
• Put the image plane (Projection Plane) in front of the COP
– Why?
• The camera looks down the negative z axis
– we need this if we want right-handed coordinates
49. Modeling projection
Projection equations
• Compute intersection with PP of ray from (x,y,z) to COP
• Derived using similar triangles (on board)
• We get the projection by throwing out the last coordinate:
50. Homogeneous coordinates
Is this a linear transformation?
• no—division by z is nonlinear
Trick: add one more coordinate (homogeneous image coordinates, homogeneous scene coordinates)
Converting from homogeneous coordinates: divide by the last coordinate
51. Perspective Projection
Projection is a matrix multiply using homogeneous coordinates:
divide by third coordinate
This is known as perspective projection
• The matrix is the projection matrix
• Can also formulate as a 4x4 (today’s reading does this)
divide by fourth coordinate
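As a concrete sketch of the matrix form above (NumPy; the focal distance d = 1 and the sample point are assumed values, with the image plane in front of the COP so there is no sign flip):

```python
import numpy as np

d = 1.0  # assumed distance from the COP to the image plane

# 3x4 perspective projection matrix: the last row produces z/d,
# so dividing by the third coordinate yields (d*x/z, d*y/z)
P = np.array([[1.0, 0.0, 0.0,     0.0],
              [0.0, 1.0, 0.0,     0.0],
              [0.0, 0.0, 1.0 / d, 0.0]])

X = np.array([2.0, 4.0, 8.0, 1.0])  # homogeneous scene point (x, y, z, 1)

x_h = P @ X            # homogeneous image coordinates
x = x_h[:2] / x_h[2]   # divide by third coordinate: d*x/z = 0.25, d*y/z = 0.5
```

The division by the third coordinate is exactly the nonlinear step that homogeneous coordinates postpone, which is what lets the projection itself be a single matrix multiply.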
52. Projection equation
• The projection matrix models the cumulative effect of all parameters
• Useful to decompose into a series of operations
The projection equation $\mathbf{x} = \Pi \mathbf{X}$ decomposes as

$$
\begin{bmatrix} sx \\ sy \\ s \end{bmatrix}
=
\begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} -f s_x & 0 & x'_c \\ 0 & -f s_y & y'_c \\ 0 & 0 & 1 \end{bmatrix}}_{\text{intrinsics}}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\text{projection}}
\underbrace{\begin{bmatrix} R_{3\times3} & 0_{3\times1} \\ 0_{1\times3} & 1 \end{bmatrix}}_{\text{rotation}}
\underbrace{\begin{bmatrix} I_{3\times3} & T_{3\times1} \\ 0_{1\times3} & 1 \end{bmatrix}}_{\text{translation}}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

(the translation block contains the 3x3 identity matrix I).
Camera parameters
A camera is described by several parameters
• Translation T of the optical center from the origin of world coords
• Rotation R of the image plane
• focal length f, principal point (x’c, y’c), pixel size (sx, sy)
• blue parameters are called “extrinsics,” red are “intrinsics”
• The definitions of these parameters are not completely standardized
– especially intrinsics—varies from one book to another
54. Distortion
Radial distortion of the image
• Caused by imperfect lenses
• Deviations are most noticeable for rays that pass through the
edge of the lens
[Figure: no distortion, pin cushion, barrel]
56. Structure from Motion 20
Camera calibration
Determine camera parameters from known 3D
points or calibration object(s)
1. internal or intrinsic parameters such as focal
length, optical center, aspect ratio:
what kind of camera?
2. external or extrinsic (pose)
parameters:
where is the camera?
How can we do this?
CSE 576, Spring 2008
57. Camera matrix
Fold intrinsic calibration matrix K and extrinsic
pose parameters (R,t) together into a
camera matrix
M = K [R | t ]
(put 1 in lower r.h. corner for 11 d.o.f.)
58. Camera matrix calibration
Directly estimate 11 unknowns in the M matrix
using known 3D points (Xi,Yi,Zi) and measured
feature positions (ui,vi)
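A minimal sketch of this direct estimation (the classic DLT: each known 3D-2D pair gives two linear equations in M's 12 entries, solved up to scale by SVD; the camera matrix and point set here are made-up test data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth camera matrix (3x4), only for generating data
M_true = np.array([[800., 0., 320., 10.],
                   [0., 800., 240., 20.],
                   [0., 0., 1., 2.]])

# Known 3D points (homogeneous) and their measured projections (u_i, v_i)
X = np.hstack([rng.uniform(-1, 1, (12, 3)), np.ones((12, 1))])
x_h = (M_true @ X.T).T
uv = x_h[:, :2] / x_h[:, 2:3]

# Each point contributes two rows: u*(m3.X) = m1.X and v*(m3.X) = m2.X
A = []
for Xi, (u, v) in zip(X, uv):
    A.append(np.hstack([Xi, np.zeros(4), -u * Xi]))
    A.append(np.hstack([np.zeros(4), Xi, -v * Xi]))
A = np.array(A)

# The solution is the right singular vector with the smallest singular value
m = np.linalg.svd(A)[2][-1]
M_est = m.reshape(3, 4)
M_est /= M_est[2, 2]   # fix the overall scale (M only has 11 d.o.f.)
```

With noiseless data the recovered matrix matches `M_true` up to that scale; with real measurements you would solve the same system in a least-squares sense.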
59. Separate intrinsics / extrinsics
New feature measurement equations
Use non-linear minimization
Standard technique in photogrammetry, computer
vision, computer graphics
• [Tsai 87] – also estimates κ1 (freeware @ CMU)
http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html
• [Bogart 91] – View Correlation
60. Multi-plane calibration
Use several images of planar target held at
unknown orientations [Zhang 99]
• Compute plane homographies
• Solve for K^-T K^-1 from the H_k’s
– 1 plane if only f unknown
– 2 planes if (f,uc,vc) unknown
– 3+ planes for full K
• Code available from Zhang and OpenCV
65. These are the estimated extrinsic parameters (camera poses)
66. 360 degree field of view…
Basic approach
• Take a photo of a parabolic mirror with an orthographic lens (Nayar)
• Or buy a lens from a variety of omnicam manufacturers…
– See http://www.cis.upenn.edu/~kostas/omni.html
67. Tilt-shift
Tilt-shift images from Olivo Barbieri
and Photoshop imitations
http://www.northlight-images.co.uk/article_pages/tilt_and_shift_ts-e.html
72. 2D Image filtering
• Linear filtering is a weighted sum/difference of pixel values
• Enhance images
• Denoise, smooth, increase contrast, etc.
• Extract information from images
• Texture, edges, distinctive points, etc.
• Detect patterns
• Template matching
112. Stereo matching 76
Rectification
Project each image onto the same plane, parallel to the line between the camera centers
Resample lines (and shear/stretch) to place lines in
correspondence, and minimize distortion
[Loop and Zhang, CVPR’99]
115. Your basic stereo algorithm
For each epipolar line
For each pixel in the left image
• compare with every pixel on same epipolar line in right image
• pick pixel with minimum match cost
Improvement: match windows
• This should look familiar...
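The basic algorithm above, with the window-matching improvement, can be sketched as follows (a deliberately brute-force NumPy version; the shifted-ramp test images are assumptions for illustration):

```python
import numpy as np

def stereo_sad(left, right, max_disp, w=1):
    """Brute-force stereo: for every left pixel, try every disparity on the
    same epipolar line (image row) and keep the one with the minimum
    sum-of-absolute-differences over a (2w+1)x(2w+1) window."""
    H, W = left.shape
    L = np.pad(left.astype(float), w)   # pad so windows stay in bounds
    R = np.pad(right.astype(float), w)
    disp = np.zeros((H, W), dtype=int)
    for y in range(H):
        for x in range(W):
            costs = [np.abs(L[y:y+2*w+1, x:x+2*w+1] -
                            R[y:y+2*w+1, x-d:x-d+2*w+1]).sum()
                     for d in range(min(max_disp + 1, x + 1))]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic pair: a horizontal ramp shifted left by 3 pixels
left = np.tile(np.arange(16.), (8, 1))
right = np.tile(np.arange(3., 19.), (8, 1))
disp = stereo_sad(left, right, max_disp=5)  # interior disparities come out as 3
```

Real systems replace the inner loops with vectorized cost volumes and smarter aggregation (mean field, graph cuts), but the matching principle is this one.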
116. Depth Map Results
[Figure: input image and depth maps from sum of absolute differences, mean field, and graph cuts]
117. Active stereo with structured light
Project “structured” light patterns onto the object
• simplifies the correspondence problem
[Figure: camera 1 + camera 2 + projector; camera 1 + projector (Li Zhang’s one-shot stereo)]
118. Data Acquisition
• Our custom-built 3D
Scanner:
• 200-400 images captured with
hand-held camera
• Geometry scanned with
structured-light
• Images registered to geometry
• Precise & inexpensive
Chen et al., Light Field Mapping, SIGGRAPH 2002
120. Finding Paths through the World’s Photos
(Photo Tourism / Photosynth)
Snavely et al., Finding Paths through the World's Photos, SIGGRAPH 2008
121. Pose estimation
Once the internal camera parameters are known,
can compute camera pose
[Tsai87] [Bogart91]
Application: superimpose 3D graphics onto video
How do we initialize (R,t)?
122. Structure from motion
Given many points in correspondence across
several images, {(uij,vij)}, simultaneously compute
the 3D location xi and camera (or motion)
parameters (K, Rj, tj)
Two main variants: calibrated, and uncalibrated
(sometimes associated with Euclidean and
projective reconstructions)
123. Structure from motion
How many points do we need to match?
• 2 frames: (R,t) has 5 d.o.f., plus 3n point locations, against 4n point measurements, so 5 + 3n ≤ 4n ⇒ n ≥ 5
• k frames: 6(k–1) – 1 + 3n ≤ 2kn
• always want to use many more
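The counting argument can be checked mechanically (a small sketch; `min_points` is a hypothetical helper name):

```python
def min_points(k):
    """Smallest n with 6(k-1) - 1 + 3n <= 2kn for k frames and n points:
    6 d.o.f. per camera after fixing the first frame, minus 1 for the
    unrecoverable global scale, versus 2 measurements per point per frame."""
    n = 1
    while 6 * (k - 1) - 1 + 3 * n > 2 * k * n:
        n += 1
    return n

# k=2 reproduces the n >= 5 result from the slide; more frames need fewer points
counts = {k: min_points(k) for k in (2, 3, 4)}
```

In practice one uses far more correspondences than the minimum and solves in a least-squares sense, since individual feature measurements are noisy.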
124. Structure [from] Motion
Given a set of feature tracks,
estimate the 3D structure and 3D (camera)
motion.
Assumption: orthographic projection
[Tomasi & Kanade, IJCV 92]
125. Wei-Chao Chen (weichao.chen@gmail.com)
Tracking of Robust Features (SURFTrac)
* Ta, Chen, Gelfand, Pulli, IEEE CVPR 2009 (oral)
} Scale-space tracking + matching of robust features
} ~10 FPS on an N95 mobile phone (compared to 0.5 FPS)
126. Tracking of Robust Features (SURFTrac)
135. Image Stitching 99
Cutout-based de-ghosting
•Select only one image per
output pixel, using spatial
continuity
•Blend across seams using
gradient continuity
(“Poisson blending”)
[Agarwala et al., SG’2004]
Richard Szeliski
136. Cutout-based compositing
Photomontage [Agarwala et al., SG’2004]
• Interactively blend different images:
group portraits
139. Computational Photography 103
Seamless Poisson cloning
Given vector field v (the pasted gradient), find the value of f in the unknown region Ω that optimizes
$$\min_f \iint_\Omega \lVert \nabla f - \mathbf{v} \rVert^2 \quad \text{with } f = f^* \text{ (the background) on } \partial\Omega$$
[Figure: pasted gradient, mask, background, unknown region]
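A 1D analogue makes the optimization concrete (a minimal sketch, not the 2D solver: with both endpoints pinned to the background, the least-squares solution simply spreads the gradient mismatch evenly across the region):

```python
import numpy as np

def poisson_blend_1d(bg_left, bg_right, v):
    """Blend a 1D 'pasted gradient' v into a region whose boundary values
    (bg_left, bg_right) come from the background.  Minimizes
    sum_i (f[i+1] - f[i] - v[i])^2 subject to f[0]=bg_left, f[-1]=bg_right."""
    n = len(v)
    c = (bg_right - bg_left - v.sum()) / n   # spread the mismatch evenly
    steps = v + c
    return bg_left + np.concatenate([[0.0], np.cumsum(steps)])

v = np.array([1.0, -2.0, 1.0, 0.5])   # gradients copied from the source image
f = poisson_blend_1d(10.0, 12.0, v)   # endpoints land exactly on 10.0 and 12.0
```

In 2D there is no closed form like this; the same objective leads to a Poisson equation solved as a sparse linear system, but the behavior is identical: the seam mismatch is absorbed smoothly over the interior.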
142. Interactive Mobile Panorama
} Automatic capture based
on camera motion tracking
(2D)
} On-site interactive
evaluation of panorama
result
} High resolution images for
panorama stitching
Mobile Augmented Reality research at NRC Palo Alto
144. Interactive Mobile Panorama
Poisson blending to generate light and
color globally for the whole panorama
image
Poisson blending to keep details
and avoid blur
[Figure: Poisson blending vs. linear blending; original images and Poisson blending results]
Poisson blending to merge images with
very different lighting and color
Mobile Augmented Reality research at NRC Palo Alto
145. High Dynamic Range Imaging (HDR)
slides borrowed from
15-463: Computational Photography
Alexei Efros, CMU, Fall 2007,
Paul Debevec, and my talks
147. Problem: Dynamic Range
Typical cameras have limited dynamic range
What can we do?
Solution: merge multiple exposures
151. Tone Mapping
[Figure: real-world and ray-traced radiance span roughly 10^-6 to 10^6, while a display/printer covers only 0 to 255 (high dynamic range)]
How can we do this?
Linear scaling? Thresholding? Suggestions?
152. Simple Global Operator
Compression curve needs to
• Bring everything within range
• Leave dark areas alone
In other words
• Asymptote at 255
• Derivative of 1 at 0
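One curve satisfying both constraints is the Reinhard-style mapping y = 255*x/(x + 255) (an illustrative choice, not the slide's specific operator): its derivative at 0 is 1, so dark areas are left alone, and it asymptotes at 255:

```python
def tonemap(x):
    """Simple global tone-mapping operator: y = 255*x / (x + 255).
    dy/dx at 0 is 255*255/255^2 = 1; y approaches 255 as x grows."""
    return 255.0 * x / (x + 255.0)

for x in (1.0, 255.0, 1e6):
    print(x, tonemap(x))
```

Any monotone curve with these two properties (unit slope at the origin, horizontal asymptote at the display maximum) would serve as a simple global operator.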
163. Interactive Local Adjustment of Tonal Values
Dani Lischinski
Zeev Farbman
The Hebrew University
Matt Uyttendaele
Rick Szeliski
Microsoft Research
SIGGRAPH 2006
170. Constraint Propagation
Approximate constraints with a function whose smoothness is determined by the underlying image:
[Equation figure: objective = data term + smoothness term]
174. ACKNOWLEDGEMENTS
•Many slides and materials borrowed from Jia-Bin Huang, Silvio Savarese, Steve Seitz, Rich Szeliski, Andrew Zisserman, Larry Zitnick. I try to give credit whenever possible but may have a few omissions.
•Rights of illustration, pictures and other relevant materials belong to
their original creators or authors.
209. 2005 HOG (histograms of oriented gradients)
For every candidate bounding box:
1. Compute HOG features
2. Linear SVM classifier
3. Non-maximal suppression (NMS)
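A toy version of the HOG feature step (one cell only; the bin count and normalization here are simplified assumptions, not the exact Dalal-Triggs scheme):

```python
import numpy as np

def hog_cell(patch, n_bins=9):
    """Toy HOG cell, for illustration only: a histogram of gradient
    orientations weighted by gradient magnitude, L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))      # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

patch = np.tile(np.arange(8.), (8, 1))  # horizontal ramp: all gradients at 0 deg
h = hog_cell(patch)                     # energy lands in the first bin
```

A real detector concatenates many such cells with overlapping block normalization, then feeds the vector to the linear SVM in the pipeline above.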
212. 2008 DPM (Deformable parts model)
Object Detection with Discriminatively Trained Part Based Model,
Felzenszwalb, Girshick, McAllester and Ramanan, PAMI, 2010
213. 2008 DPM (Deformable parts model)
216. Why it worked
• Multiple components
• Deformable parts?
• Hard negative mining
• Good balance
“How important are ‘Deformable Parts’ in the Deformable Parts Model?”, Divvala, Efros, and Hebert, Parts and Attributes Workshop, ECCV, 2012
261. Neural networks are easily
fooled
Nguyen A, Yosinski J, Clune J. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In CVPR ’15, IEEE, 2015
269. SEARCH BY IMAGE EXAMPLES
•Still very much an open problem
•Personally, I won’t jump in and start a company yet
•Ignore me and dream big, if you wish
•Most commercial applications use a mixture of algorithms
•Similarity, face recognition, instance recognition, OCR/text
270. See the list of “things” it can search
to get an idea about how it is done.
271. Global scale of face recognition
Super hard!
Better luck in social networks
273. Google Image search result (2016)
My guess: from image to meta data
(text), then reissue text-based
search.
274. INSTANCE RECOGNITION
•Works, but the biggest problem is
speed.
•A common approach is the bag-of-words model used in text search
•Replace ‘words’ with ‘image
features’
•Put them in a bag (search
structure)
https://gilscvblog.com/2013/08/23/bag-of-words-models-for-visual-categorization/
275. SEARCH STRUCTURE
•Words are one-dimensional
•Use binary tree
• Features are high-dimensional
•k-d tree slow at higher
dimensions (we have 256!)
source: wikipedia.org
276. POSSIBLE SOLUTIONS
•Find approximate words
•e.g. approximate nearest
neighbour (ANN)
•Find lower dimensional space to
split the data
•e.g. scalable vocabulary tree
https://www.cs.umd.edu/~mount/ANN/
306. ACKNOWLEDGEMENTS
•Many slides and materials borrowed from Jia-Bin Huang, David Nister, Steve Seitz, Rich Szeliski, Andrew Zisserman, Larry Zitnick. I try to give credit whenever possible but may have a few omissions.
•Rights of illustration, pictures and other relevant materials belong to
their original creators or authors.
307. http://www.skywatch24.com
PART 3: GPU AND
COMPUTATION
| Wei-Chao Chen
Co-Founder, Skywatch Inc. | Adjunct Faculty, National Taiwan University
weichao.chen@skywatch24.com
309. PARALLEL PROCESSING & GPU
Many slides adapted from Wei-Chao Chen, for GPU
Programming course at NTU
310. Wei-Chao Chen (weichao.chen@gmail.com)
Parallel Computing Goals
} To solve your problem in less time
} Divide one big problem into smaller pieces
} Solve smaller problems concurrently
} Allows us to solve a bigger problem
} In order to parallelize a problem
} Identify dependencies in the problem
} Identify critical paths in the algorithm
} Modify dependencies to shorten the critical paths
314. Instruction-Level Parallelism
} Multiple instructions in a serial program get executed
simultaneously
} Superscalar, etc
A=A+1
B=B+1
C=C+A
C=C+B
…
(T=1: the dependent instructions fail to issue in the same cycle)
319. Amdahl’s Law
} Named after computer architect Gene Amdahl
} Speedup of a parallel computer is limited by the amount of
serial work
[Figure: a fixed serial portion plus parallelizable work shrinking with 2x, 4x, and many processors]
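Amdahl's law in one line (the 95% parallel fraction below is an example value):

```python
def speedup(p, n):
    """Amdahl's law: p = parallelizable fraction of the work,
    n = number of processors.  Speedup = 1 / ((1 - p) + p/n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel work, speedup can never exceed 1/(1-p) = 20x
for n in (2, 4, 1024):
    print(n, speedup(0.95, n))
```

This is why shortening the critical (serial) path matters more than adding processors once n is large.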
339. GPUs Today
} GPUs are becoming more programmable
} Unified & programmable shaders
} GPUs now support 32/64 bit floating point numbers
} Almost IEEE FP compliant except for some special values
} GPUs have higher memory bandwidth than CPUs
} Multiple memory banks driven by the need of high-performance
graphics
How to make it easier to
program on?
340. NVIDIA CUDA
} “Compute Unified Device Architecture”
} Or simply, “COMPUTE”
} “A General-Purpose Parallel Computing Environment”
} minimal set of C language extensions to harness GPU’s
computational resources
} CUDA toolset includes compiler, SDK and profiler, etc
} “nvcc hello_cuda.cu” vs. “gcc hello_world.c”
344. CUDA Workflow
} Get a CUDA-enabled GPU
} Write C/C++ like code (*.cu)
} Compile with CUDA compiler (nvcc)
} Generates PTX code (“Parallel Thread Execution”)
} Applications auto-magically run on GPUs
} Many many parallel threads
} The CUDA driver translates PTX code into HW instructions
345. CUDA Overview
} CUDA C/C++ language extensions
} Small sets of extensions for writing kernels - subroutines that run multi-threaded on GPUs
} CUDA Programming model abstraction
} For fine-grained data / thread parallelism, including
} Thread group hierarchy
} Shared memories
} Synchronization barriers
346. C/C++ Language Extensions
CPU (host):
int main() {
  …
  cudaMalloc(…)
  cudaMemcpy(…)
  …
  my_kernel<<<nblock, blocksize>>>(…)
  …
  cudaMemcpy(…)
  …
}
GPU (device):
__global__ void my_kernel(…) {
  …
  __shared__ float …
  … blockIdx…
  … threadIdx…
  int i = gpu_func(…)
  …
}
__device__ int gpu_func(…) {
  …
}
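The host/device split above can be mimicked on the CPU to show the indexing logic alone (plain Python, not CUDA; `launch` is a hypothetical stand-in for the `<<<…>>>` launch syntax):

```python
# CPU simulation of CUDA's grid/block indexing: every (block, thread) pair
# handles one element, at index i = blockIdx.x * blockDim.x + threadIdx.x.
def launch(kernel, n_blocks, block_size, *args):
    for block_idx in range(n_blocks):
        for thread_idx in range(block_size):
            kernel(block_idx, block_size, thread_idx, *args)

def add_kernel(block_idx, block_dim, thread_idx, a, b, c):
    i = block_idx * block_dim + thread_idx
    if i < len(c):          # bounds guard, just as in real CUDA kernels
        c[i] = a[i] + b[i]

a = list(range(10))
b = [10] * 10
c = [0] * 10
launch(add_kernel, 3, 4, a, b, c)   # 3 blocks x 4 threads cover 10 elements
```

On a real GPU the two loops run concurrently across SMs, which is why the kernel body may only depend on its own indices and the guard.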
354. CUDA Programming Model Abstraction
} Serial code runs on the host (CPU)
} Parallel code runs on the device (GPU)
host code
device code
kernel<<<nBlocks, nThreads>>>(…)
host code
…
360. Source: NVIDIA
Threads get scheduled
round-robin based on the
number of processors
available in the device
This also means you need
sufficient # of blocks, at
least as many as the # of
SMs, to fill the pipe
366. CUDA Programming Model Abstraction
} Shared memory
[Figure: Thread Block 1 paired with Shared Memory 1; Thread Block 2 paired with Shared Memory 2]
367. CUDA Programming Model Abstraction
} Synchronization Barrier
} SIMT threads launched in a unit called a warp
[Figure: warp #1 and warp #2, both writing]
368. CUDA Programming Model Abstraction
} Synchronization Barrier
} SIMT threads launched in a unit called a warp
} Problems occur when one warp reads from another before it’s finished
[Figure: warp #1 finished; warp #2 still writing]
370. CUDA Programming Model Abstraction
} Synchronization Barrier
} SIMT threads launched in a unit called a warp
} Problems occur when one warp reads from another before it’s finished
} __syncthreads() prevents the read-after-write hazard
} BTW, a warp doesn’t branch
} data-dependent conditional branch only
[Figure: warp #1 finished and waiting; warp #2 still writing]
371. Example: Matrix Multiplication
} C = A x B
} Naïve implementation:
} Read a row of A
} Read a column of B
} Dot product
} Slow!
} Lots of global
memory reads
Source: NVIDIA CUDA Programming Guide
377. Common Program Pattern
1. Load data from device to shared memory
2. Synchronize with all other threads in the same block
3. Process data in the shared memory
4. Synchronize again if necessary
5. Write results back to the device memory
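The steps above can be simulated serially in NumPy with a tiled matrix multiply (a sketch only: the tile size is arbitrary, and the two synchronization steps are implicit because the loops run serially):

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Matrix multiply following the shared-memory pattern: load a tile of
    A and B (step 1), process it locally (step 3), accumulate, and write
    the result back (step 5).  Steps 2 and 4 (barriers) are implicit here."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            acc = np.zeros((tile, tile))
            for k in range(0, n, tile):
                a = A[i:i+tile, k:k+tile]   # "load into shared memory"
                b = B[k:k+tile, j:j+tile]
                acc += a @ b                # "process data in shared memory"
            C[i:i+tile, j:j+tile] = acc     # "write results back to device"
    return C

rng = np.random.default_rng(1)
A, B = rng.random((8, 8)), rng.random((8, 8))
assert np.allclose(tiled_matmul(A, B), A @ B)
```

On a GPU each tile is loaded once into fast shared memory and reused by a whole thread block, which is exactly what cures the "lots of global memory reads" problem of the naive version.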
381. Simplified Graphics Pipeline (1990s)
Input Processor → Do Geometry Stuff → Do Pixel Stuff → Accumulate Pixel Result
(the accumulate stage acts as a sorting stage: Z-buffer, transparency)
382. Making It Faster (Mid 1990s)
Input Processor → parallel “Do Geometry Stuff” units → parallel “Do Pixel Stuff” units → Accumulate Pixel Result
383. Add Framebuffer Access (Late 1990s)
Input Processor → parallel geometry units → parallel pixel units → Accumulate Pixel Result
FBI (framebuffer interface) and MUX connect the pipeline to multiple memory banks
384. Add Programmability (Early 2000s)
Front End → Do Geometry Stuff (Geometry Shader ALU) → Do Pixel Stuff (Pixel Shader ALU) → Raster Operations
FBI and MUX connect to multiple memory banks
385. Add Programmability (Early 2000s)
These two shader stages look similar:
1. Get internal data
2. Get external data
3. Process data
4. Output data
386. Unified Shader (Mid 2000s)
Front End → THE SHADER (with buffer & MUX) → Raster Operations, plus FBI/MUX to multiple memory banks
Resource booking is important here
- deadlock
- throughput
387. Scaling Up Again
Front End → multiple copies of THE SHADER (with buffer & glue) → Raster Operations, plus FBI/MUX to multiple memory banks
389. John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable Parallel Programming with CUDA. Queue, 6(2):40–53, 2008.
390. Processors (same figure, Nickolls et al. 2008)
391. Sorting & Distribution (same figure)
392. Raster Operations & Memory I/O (same figure)
394. Is GPU SIMD?
} Suppose we have this program:
c[i] = a[i] + b[i]
if (c[i]>pi)
path-A
else
path-B
…
} What happens at the if... statement?
395. Divergence
} Some say yay, some say nay
c[i] = a[i] + b[i]
if (c[i]>pi)
path-A
else
path-B
…
[Figure: the two proposed execution schedules (“yay” vs. “nay”) for the divergent branch]
399. Multiple Programs – GPUs?
} SIMD-like processing with insufficient resources + multiple
programs
} Many people assume this is how GPUs work. But this isn’t the
case.
time
401. SIMT (T for Threads)
} First appeared in NVIDIA Tesla GPU Architecture, 2006
} Jump to different program upon stall, bubble, etc
} BTW, this is not exactly what really happened inside of
GPUs either, but close enough.
402. Summary: GPU Programming
} Easy to start coding
} Relatively simple programming model
} Extension of C++ language
} Difficult to master
} Memory access tend to be the bottleneck
} Optimisation still an art form
} Portability across GPUs still hard
} Use libraries / frameworks whenever possible.
404. ASK STUDENTS TO DO IT (!)
For a less corrupt lifestyle, go online and shop around
https://developer.nvidia.com/gpu-accelerated-libraries
405. MATLAB
•Supports GPUs through Parallel Computing
Toolbox
•Supports multiple processors and distributed
servers as well
•Use parfor for parallel for loops
•Use gpuArray to create an array on GPU
•Use built-in functions with gpuArray args
Look for: “Parallel Computing with MATLAB” presentation on mathworks.com to get started
407. CUDA BLAS LIBRARY
•BLAS - Basic Linear Algebra Subprograms
•The library MATLAB is built on
•Processor vendors implement their BLAS library
• e.g., Intel MKL (Math Kernel Library)
•cuBLAS - CUDA version, very fast
• No need to write your own, unless you are researching the topic
409. NVIDIA THRUST LIBRARY
•A little like C++ STL Library for CUDA
•Very few lines of code for vector manipulation
•Fast implementation of parallel primitives
• reduce
• scan
• sort
Source: https://developer.nvidia.com/thrust
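The sequential definitions of these primitives clarify what Thrust parallelizes (Python stdlib equivalents, shown for semantics only):

```python
from functools import reduce
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5]

# These have one-line sequential definitions; the point of Thrust is that
# the same semantics run as massively parallel GPU algorithms.
total = reduce(operator.add, data)    # reduce: combine everything into one value
prefix = list(accumulate(data))       # inclusive scan: running partial sums
ordered = sorted(data)                # sort
```

`reduce` and `scan` look inherently serial, but both have well-known logarithmic-depth parallel formulations, which is what makes them the workhorse primitives of GPU computing.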
411. NVIDIA CUDNN
•VERY fast, very hard to beat
•Convolution, array and tensor transformations, etc.
•Supports popular deep learning frameworks
•Caffe, TensorFlow, Torch, CNTK, etc
•Basically, you don’t have to use this directly — just get started with
one of the deep learning frameworks above (e.g., Caffe)
https://developer.nvidia.com/deep-learning-courses
412. DEEP LEARNING
GETTING STARTED ADVICE
•Borrow (steal if you must) a modern GPU
•Use Caffe for your deep learning projects
•http://caffe.berkeleyvision.org/
•Browse through the Caffe Model Zoo and try out the existing (pre-
trained) models (AlexNet, R-CNN and GoogLeNet are free to use)
•http://caffe.berkeleyvision.org/model_zoo.html