Unlocking Biologically-Inspired Computer Vision: a High-Throughput Approach
David Cox, Harvard | James DiCarlo, Nicolas Pinto, MIT
Abstract:
The study of biological vision and the creation of artificial vision systems are naturally intertwined – exploration of the neuronal substrates of visual processing provides clues and inspiration for artificial systems, and artificial systems, in turn, serve as important generators of new ideas and working hypotheses. However, while systems neuroscience has provided inspiration for some of the "broad-stroke" properties of the visual system, much is still unknown. Even for those qualitative properties that most biologically-inspired models share, experimental data currently provide little constraint on their key parameters. Consequently, it is difficult to truly evaluate a set of computational ideas, since the performance of a model depends strongly on its particular instantiation – the size of the pooling kernels, the number of units per layer, exponents in normalization operations, etc.
To pave a way forward, we have developed a high-throughput approach to more expansively explore the possible range of biologically-inspired models, including models of larger, more realistic scale, leveraging recent advances in commodity stream processing hardware - particularly, high-end NVIDIA GPUs. In analogy to high-throughput screening approaches in molecular biology and genetics, we generated and trained thousands of potential network architectures and parameter instantiations, and "screened" the visual representations produced by these models using an object recognition task. From these candidate models, the most promising were selected for further analysis. We have shown that this approach can yield significant, reproducible gains in performance on a basic object recognition task, and that it can offer insight into which computational ideas are most important for achieving this performance.
In this talk, I'll also highlight how the application of flexible programming tools, such as high-level scripting and template metaprogramming, can enable large performance gains, while managing complexity for the developer. As the scale of available computational power continues to expand, our approach holds great potential both for accelerating progress in artificial vision, and for generating new, experimentally-testable hypotheses for the study of biological vision.
More on this research:
http://pinto.scripts.mit.edu/Research/Research
http://pinto.scripts.mit.edu/Research/Monster16GPU
http://www.rowland.harvard.edu/rjf/cox/Projects/projects/computation.html
More on IAP09 CUDA@MIT 6.963 at http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009
IAP09 CUDA@MIT 6.963 - Guest Lecture: Unlocking Biologically-Inspired Computer Vision: a High-Throughput Approach (David Cox, Harvard | Jim DiCarlo, Nicolas Pinto, MIT)
1. A High-Throughput Approach to
Discovering Good Forms of Visual
Representation
David Cox
The Rowland Institute at
Harvard
Nicolas Pinto
Jim DiCarlo
MIT BCS
4. Goals
1) "Building Brains": a concrete example of real-world experiments fundamentally enabled by stream processing hardware
2) Tricks of the trade: some high-level highlights of how we leverage CUDA to achieve our goals
56. How are things done normally?
Usual Formula:
1) One grad student
2) One Model (size limited by runtime)
3) Performance numbers on a few
standard test sets
4) yay. we. rock.
5) One Ph.D.
61. Why is this not optimal?
• Lots of parameters – can’t explore easily
• Big models are paralyzingly slow to run
• Advice from my friend:
“Don’t run anything that takes longer than a
week to complete, because it will just crash
halfway through anyways (or you’ll discover
a bug) and you’ll never finish your Ph.D.”
68. Doing things a little bit differently
1) One grad student
2) Hundreds of thousands of BIG models
3) Performance numbers on a few standard test sets*
4) yay. we. rock.
5) One Ph.D.?
82. Pipeline: Biology-Inspired Vision
Generate Random Models → Unsupervised Learning (Video) → Test with “screening” task → Skim off best models → Validate on other tasks
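The pipeline above can be sketched as a simple generate/screen/skim loop. The following Python sketch is purely illustrative: every function name is hypothetical, and the scoring function is a random stub standing in for the real unsupervised training and screening task.

```python
import random

def generate_random_model(rng):
    """Draw one model instantiation from a hypothetical parameter space."""
    return {
        "n_filters": rng.choice([16, 32, 64, 128]),
        "kernel_size": rng.choice([3, 5, 7, 9]),
        "norm_neighborhood": rng.choice([3, 5, 9]),
        "learning_rate": 10 ** rng.uniform(-4, -1),
    }

def screen(model, rng):
    """Stand-in for unsupervised training on video followed by a quick
    object-recognition 'screening' score (higher is better)."""
    return rng.random()  # placeholder score, NOT a real evaluation

def high_throughput_search(n_candidates=1000, keep=5, seed=0):
    rng = random.Random(seed)
    candidates = [generate_random_model(rng) for _ in range(n_candidates)]
    scored = [(screen(m, rng), m) for m in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # "Skim off" the best models for validation on other tasks
    return [m for _, m in scored[:keep]]

best = high_throughput_search()
```

The point of the structure is that `screen` is cheap relative to full validation, so thousands of candidates can be triaged before any expensive analysis.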
86. Need to break all of the implicit rules
1. Test tens to hundreds of thousands
of instantiations of biologically-inspired
hierarchical models
2. Use more realistic inputs (e.g. large
quantities of video)
3. Test models which begin to approach
the scale of natural systems
95. A Match Made in Heaven
Brains are parallel; GPUs are parallel.
Multiple scales of parallelism:
“Embarrassingly” parallel: video frames, regions
Fine-grained: independent “neurons” operating on overlapping inputs
99. A Match Made in Heaven
Images in, images out: image processing is particularly well-suited to GPUs.
Excellent arithmetic intensity: very natural to load image patches into shared memory.
Data: 2D/3D locality.
109. These things are REALLY fast

Platform | Performance (gflops) | Development Time (hours)
Matlab   | 0.3                  | 0.5
C/SSE    | 9.0                  | 10.0
PS3      | 110.0                | 30.0
GT200    | 330.0                | 10.0
111. Pipeline
Generate Random Models → Unsupervised Learning (Video) → Test with “screening” task → Skim off best models → Validate on other tasks
112. Read-out
[Figure: schematic of a three-layer hierarchical model (input → L1 → L2 → L3 → read-out). Each layer has its own parameters: number of filters, kernel size, threshold/saturation, normalization strength, normalization neighborhood, learning rate, trace, “Temp. Adv.”, “Auto-reset”, ...]
118. A Broad Parametric Model

Normalize:
  Ni = Inputi / norm(Inputneighborhood)

Compute Filter Responses:
  Ri = Fi ⊗ N
  Ri < thresh: Ri = thresh
  Ri > sat: Ri = sat

Determine a “Winning Filter”:
  Ri’ = (∑k Tk * Hk) * Ri
  winner: max(Ri’)

Update Filter:
  Fwinning = Fwinning + learning rate * N

• Optimize “coverage” (filters span the range of observed inputs)
• Privilege movement of filters in certain directions using temporal information
• Expand dimensionality greatly, then scale back as layers progress
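One learning step of the parametric model above (normalize, filter, pick a winner, update the winner) can be sketched in NumPy. Shapes, the simplified trace weighting, and all names here are assumptions for illustration, not the authors' actual implementation; in particular the trace term (∑ Tk * Hk) is reduced to a single per-filter weight.

```python
import numpy as np

def learning_step(inp, filters, trace, thresh=0.0, sat=1.0, lr=0.05):
    # Normalize: N_i = Input_i / norm(Input_neighborhood)
    N = inp / (np.linalg.norm(inp) + 1e-8)
    # Compute filter responses R_i = F_i . N, clipped to [thresh, sat]
    R = np.clip(filters @ N, thresh, sat)
    # Determine a "winning filter": bias responses with a (simplified)
    # trace term, then take the maximum
    R_prime = trace * R
    winner = int(np.argmax(R_prime))
    # Update the winning filter toward the normalized input
    filters[winner] += lr * N
    return winner

rng = np.random.default_rng(0)
n_filters, dim = 8, 16
filters = rng.standard_normal((n_filters, dim))
trace = np.ones(n_filters)
w = learning_step(rng.standard_normal(dim), filters, trace)
```

Run repeatedly over video frames, the winning filters drift toward the normalized inputs, which is what lets filters "span the range of observed inputs."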
123. Dealing with Parametric Complexity
Casting a wide net comes with its own challenges:
• Complexity
• Kernels must perform under widely varying conditions
• The best kernel for a 3x3 convolution may not be the best kernel for a 17x17 one; more complex operations are even hairier
129. Meta-programming
Leave the grunt-programming to the computer:
• Dynamically compile specialized versions of the same kernel for different conditions
• Smooth over syntactic ugliness: unroll loops, index un-indexable registers
• Dynamic, empirical run-time tuning
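A minimal sketch of the metaprogramming idea: generate a specialized, fully-unrolled CUDA-style kernel source string for each filter size at run time, instead of hand-writing one kernel per configuration. The template, function names, and indexing scheme here are illustrative assumptions, not the authors' actual code; a real pipeline would hand the generated source to a run-time compiler.

```python
# Hypothetical kernel template; {{ }} escape literal C braces for str.format.
KERNEL_TEMPLATE = """__global__ void conv{size}(const float *in, const float *f, float *out, int w) {{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
{unrolled_body}
    out[x] = acc;
}}
"""

def specialize_kernel(size):
    """Emit kernel source with the size x size filter loop fully unrolled,
    so the compiler sees only constant indices (registers stay indexable)."""
    lines = []
    for i in range(size):
        for j in range(size):
            lines.append(f"    acc += in[(x + {i}) * w + {j}] * f[{i * size + j}];")
    return KERNEL_TEMPLATE.format(size=size, unrolled_body="\n".join(lines))

src3 = specialize_kernel(3)    # 9 unrolled multiply-adds
src17 = specialize_kernel(17)  # 289 unrolled multiply-adds
```

Empirical run-time tuning then becomes a loop over generated variants (block sizes, unroll factors), timing each and keeping the fastest.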
142. Screening
A quick object recognition test to find promising models
Generate Random Models → Unsupervised Learning (Video) → Test with “screening” task → Skim off best models → Validate on other tasks
151. Validation
See how we do on other test sets
Generate Random Models → Unsupervised Learning (Video) → Test with “screening” task → Skim off best models → Validate on other tasks
167. Summary
• GPUs allow us to engage a qualitatively different pace of experimentation
• CUDA provides an unprecedented performance/effort ratio
• A new laboratory for studying visual representation
170. Shameless Recruitment
Now with Extra NVIDIA Goodness!
http://www.rowland.harvard.edu/rjf/cox
174. Acknowledgments
Cox Lab @
The Rowland Institute at Harvard
• Davide Zoccolan
• Nadja Oertelt
DiCarlo Lab @ MIT
• Jim DiCarlo
• Nicolas Pinto
181. The field does well on the ‘101
[Figure: bar chart of performance (% correct, 0–100) for five state-of-the-art systems on Caltech 101.]
[1] Wang et al. 2006
[2] Grauman and Darrell 2005
[3] Mutch and Lowe 2006
[4] Lazebnik et al. 2006
[5] Zhang et al. 2006
205. Just one bad database?
Caltech 256
Face databases: ORL, Yale, AR, CVL
See us @ ECCV 08, Faces in the Wild Workshop
206. Face Databases
[Figure: ORL, performance (% correct) with 4 and 8 training examples; V1-like model vs: 1. pixel space, 2. Savvides et al. 2007, 3.–4. Noushath et al. 2007]
207. Face Databases
[Figure: AR, performance (% correct) with 5 and 8 training examples; V1-like model vs: 1. pixel space, 2. Liang et al. 2007, 3. Zhang et al. 2007]
208. Face Databases
[Figure: Yale, performance (% correct) with 4 and 8 training examples; chance = 1/15 = 6.67%; V1-like model vs: 1. pixel space, 2.–5. Noushath et al. 2006, 6. Ben et al. 2006, 7. Wang et al. 2007]
209. Face Databases
[Figure: CVL, performance (% correct) with 2 training examples (frontal only) and 3 training examples (all); chance = 1/114 = 0.88%; V1-like model vs: 1. pixel space, 2. Goel et al. 2005, 3. Gokberk et al. 2002]
210. Face Databases
[Figure: CAS-PEAL, 4 training examples, performance (% correct) for faces facing down, forward, and up; V1-like model vs: 1. pixel space, 2.–5. Cao et al. 2004]