SlideShare uma empresa Scribd logo
1 de 99
Baixar para ler offline
NIPS’07 tutorial
            (preliminary)
Visual Recognition in Primates
        and Machines

  Tomaso Poggio (with Thomas Serre)
            McGovern Institute for Brain Research
       Center for Biological and Computational Learning
          Department of Brain & Cognitive Sciences
            Massachusetts Institute of Technology
                  Cambridge, MA 02139 USA
Motivation for studying vision:
      trying to understand how the brain works




• Old dream of all
  philosophers and more
  recently of AI:
  – understand how the
    brain works
  – make intelligent
    machines
This tutorial:
     using a class of models to summarize/interpret
                  experimental results




• Models are cartoons of reality eg Bohr’s model of the
  hydrogen atom
• All models are “wrong”
• Some models can be useful summaries of data and some
  can be a good starting point for more complete theories
1.   Problem of visual recognition, visual cortex
2.   Historical background
3.   Neurons and areas in the visual system
4.   Data and feedforward hierarchical models
5.   What is next?
The problem: recognition in natural images
 (e.g., “is there an animal in the image?”)
How does visual cortex solve this problem?
 How can computers solve this problem?




                                           dorsal
                                          stream:
                                          “where”


                                           ventral
                                           stream:
                                            “what”



                                Desimone & Ungerleider 1989
A “feedforward” version of the problem:
          rapid categorization




           SHOW RSVP
             MOVIE
            Movie courtesy of Jim DiCarlo




                           Biederman 1972; Potter 1975; Thorpe et al 1996
A model of the ventral stream which is also an algorithm
                          *Modified from (Gross, 1998)




Riesenhuber & Poggio 1999, 2000;
Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005;
Serre Oliva Poggio 2007                                  [software available online]
…solves the problem
            (if mask forces feedforward processing)…


                                          Model 82%
                                                          human-
                                                        observers (n
                                                         = 24) 80%
• d’~ standardized error                              Human 80%
rate
• the higher the d’, the
better the perf.




                                             Serre Oliva & Poggio 2007
1.   Problem of visual recognition, visual cortex
2.   Historical background
3.   Neurons and areas in the visual system
4.   Data and feedforward hierarchical models
5.   What is next?
Ka
                                                        na




                                                    …
                                                          de
                                                                 19
                                                                   74

                                                    Tu




                                 1990
                                                      rk
                                                           &
                                                               Pe
                                                                  nt




                                                                                                                                                                          Car detection
                                                                     lan

                                                                                                                                                                                                                Face detection
                                                                        d




                                                                                                 Digit recognition
                                                                            19
                                                                              91

                                                                                                                                                                                          Face identification



                                                                                                                                                   Pedestrian detection
                                                    Br
                                                       un




                                                                                                                     Multi-class / multi-objects
                                                         ell
                                                            i&
                                                                  Po




*Best CVPR’07 paper 10 yrs ago
                                                    Su              gg
                                                       ng             io
                                                                            19
                                                            &                 93
                                                                Po
                                                                  gg
                                                                     io
                                                                          19
                                                     Be                      94
                                                         im
                                                             er




                                 1995
                                                        Pe & P
                                                            ro
                                                              na ogg
                                                                   an io 1
                                                     Os               d
                                                         un              co 995
                                                             a              lle
                                                                &              ag
                                                                   Gi              ue
                                                    Le                ro              s1
                                                       Cu                si               99
                                                                            19               6-
                                                           n                   97               no
                                                              et
                                                                 al.               *              w
                                                    M                19
                                                                         98
                                                     oh
                                                         an                  ;S
                                                              Pa                 ch
                                                                  pa                ne
                                                                     ge                ide
                                                                         or                rm
                                                                            gio               an
                                                                                u                 &
                                                     Sc                            &                Ka
                                                         hn                          Po                 na
                                                             eid                         gg                 de
                                                     Vi          er                         io
                                                        o          m                           19               19
                                                                                                                  98
                                                    Be la &           an                          99
                                                       lon J               &                         ;A              ;R
                                                            g     on          Ka                         m               ow
                                                        al ie & es                na                       it a            ley
                                                            20 M 20                  de                        nd              Ba
                                                               02 a 01                   20                        Ge             luj
                                                                        lik                                                          a
                                                                            20
                                                                                            00                        m                &
                                                                                02                                      an               Ka
                                                                                   ;A                                       19              na
                                                    Fe                                rg                                      99
                                                                                                                                                                                                                                     personal historical perspective




                                                      rg                                                                                      de
                                                          us                             aw                                                      19
                                 2000 … Many more




                                                    To                                       al                                                    98
                                                       rra et                                   &
                                                           lba al                                 Ro
                                                                     2                                th
                                                                et 00                                    20
                                                                    al 3                                     02
 the past few years…




                                                                       20                                       ;U
                                                                                                                                                                                                                                 Object recognition for computer vision:




                                                                           04
 excellent algorithms in




                                                                                                                   llm
                                                     …




                                                                                                                       an
Examples: Learning Object Detection:
           Finding Frontal Faces

• Training Database
• 1000+ Real, 3000+ VIRTUAL
• 50,0000+ Non-Face Pattern




                                     Sung & Poggio 1995
~10 year old CBCL computer vision work: pedestrian
    detection system in Mercedes test car now
          becoming a product (MobilEye)
Hu
               be
                 l       &
                             W
             Hu




1960
                              ies
               be                 e
                 l       &
                                      l1
                                        95
                             W            9
             Hu               ies
               be                 el1
                 l       &            96
                             W          2
                              ies
                                  el
             Gr                      19
                  os                   65
                     s   et
                            a   l1




1970
                                  96
                                     9
             Ze
               ki
                     19
                        73
                                                                            V1 cat


                                                                            IT-STS



             Hu
                                                                            V1 monkey




               be
                 l       &
                             W
                              ies
                                  e
                                  l1




1980
                                                                            Exstrastriate cortex




             Un                      97
          Sc
                ge
                    rle                 7
             hw         ide
          De artz            r&
             sim e               M
                                    Ish
                 on t al                ki
                     e        19
                        et       83 n 19
                            al               82
                               19               ;P
                                  84              er
                                                    re
             Sc                                       tt
                                                         Ro



1990
                hil                                        lls
                   ler
                        &                                      et
             Ko             Le                                    al
                ba
                    ta         e                                     19
                                                                       82
                       ke        19
          Lo                         91
            go              &
               th              Ta
                  et             na
                     is              ka
                         et              19
                            al             94
                                19
                                   95
                                                                                                      Historical perspective




   … Much
                                                                                                   Object recognition in cortex:




   past 10 yrs
   progress in the
Some personal history:
      First step in developing a model:
learning to recognize 3D objects in IT cortex


             Examples of Visual Stimuli




                                          Poggio & Edelman 1990
An idea for a module for view-invariant
                  identification

                                        Prediction:
                                        neurons become
                                        view-tuned
                       VIEW-
                                        through learning
                    INVARIANT,
Architecture that     OBJECT-           Regularization
accounts for         SPECIFIC           Network (GRBF)
invariances to 3D                       with Gaussian kernels
effects (>1 view
                       UNIT
needed to learn!)




                           View Angle
                                         Poggio & Edelman 1990
Learning to Recognize 3D Objects in IT
                 Cortex


                             Examples of Visual Stimuli

After human psychophysics
(Buelthoff, Edelman, Tarr,
Sinha, …), which supports
models based on view-tuned
units...


… physiology!



                                  Logothetis Pauls & Poggio 1995
Recording Sites in Anterior IT




 LUN
         LAT

         STS
                                …neurons tuned to
                              faces are intermingled
IOS                                 nearby….
         AMTS
                LAT
                STS    Ho=0

                AMTS




                                Logothetis, Pauls & Poggio 1995
Neurons tuned to object views as
      predicted by model




                      Logothetis Pauls & Poggio 1995
A “View-Tuned” IT Cell
                Target Views
                 -168       o      -120      o      -108         o    -96       o    -84       o        -72     o    -60      o     -48      o     -36        o    -24    o     -12    o     0   o




                            -168            -120            -108              -96            -84              -72            -60            -48              -36         -24           -12           0




                  12    o          24   o          36   o            48   o         60   o         72    o          84   o         96   o         108    o         120   o     132 o         168 o




                            12              24              36                48             60                72             84            96               108         120




                Distractors
60 spikes/sec




                 800 msec




                                                                                                                                                   Logothetis Pauls & Poggio 1995
But also view-invariant object-specific neurons
       (5 of them over 1000 recordings)




                                Logothetis Pauls & Poggio 1995
View-tuned cells:
scale invariance (one training view only) motivates present model

                              Scale Invariant Responses of an IT Neuron




      76                      1.0 deg    76                   1.75 deg   76                   2.5 deg      76              3.25 deg
                               (x 0.4)                         (x 0.7)                         (x 1.0)                      (x 1.3)




                                                                         Spikes/sec
     Spikes/sec




                                         Spikes/sec




                                                                                                           /
                                                                                                           ik
          0                                0                                  0                             0
                   0   1000 2000 3000       0          1000 2000 3000          0      1000 2000 3000        0      1000 2000 3000
                        Time (msec)                     Time (msec)                    Time (msec)                  Time (msec)




      76                      4.0 deg    76                   4.75 deg    76                  5.5 deg      76              6.25 deg
                               (x 1.6)                         (x 1.9)                         (x 2.2)                      (x 2.5)
      Spikes/sec




                                          Spikes/sec




                                                                                                            /
                                                                            /




                                                                                                            S ik
                                                                            S ik




          0                                0                                  0                             0
                   0   1000 2000 3000           0      1000 2000 3000            0    1000 2000 3000        0      1000 2000 3000
                        Time (msec)                     Time (msec)                    Time (msec)                  Time (msec)
                                                                                                         Logothetis Pauls & Poggio 1995
From “HMAX” to the model now…




                                    Riesenhuber & Poggio 1999, 2000;
                                Serre Kouh Cadieu Knoblich Kreiman &
                                 Poggio 2005; Serre Oliva Poggio 2007
1.   Problem of visual recognition, visual cortex
2.   Historical background
3.   Neurons and areas in the visual system
4.   Data and feedforward hierarchical models
5.   What is next?
Neural Circuits




     Source: Modified from Jody Culham’s web slides
Neuron basics

                         INPUT= pulses or
                         graded potentials


                          COMPUTATION
                          = Analog

spikes


                          OUTPUT =
                          Chemical
Some numbers


• Human Brain
  – 1011-1012 neurons         (1 million flies ☺)
  – 1014- 1015 synapses


• Neuron
  – Fundamental space dimension:
     • fine dendrites : 0.1 µ diameter; lipid bilayer membrane : 5 nm
       thick; specific proteins : pumps, channels, receptors,
       enzymes
  – Fundamental time length : 1 msec
The cerebral cortex



                         Human           Macaque
Thickness                3 – 4 mm        1 – 2 mm
Total surface area       ~1600 cm2       ~160 cm2
(both sides)             (~50cm diam)    (~15cm diam)

Neurons /mm²             ~10⁵/ mm2       ~ 10⁵/ mm2
Total cortical neurons   ~2 x 1010       ~2 x 109

Visual cortex            300 – 500 cm2   80+cm2
Visual Neurons           ~4 x 109        ~109 neurons
Gross Brain Anatomy




A large percentage of the cortex devoted
 to vision
The Visual System




                    [Van Essen & Anderson, 1990]
V1: hierarchy of simple and complex cells

LGN-type Simple      Complex
  cells   cells        cells




                                      (Hubel & Wiesel 1959)
V1: Orientation selectivity




     Hubel & Wiesel
        movie
V1: Retinotopy
(Thorpe and Fabre-Thorpe, 2001)
Beyond V1: A gradual increase in RF
               size




Reproduced from [Kobatake & Tanaka, 1994]   Reproduced from [Rolls, 2004]
Beyond V1: A gradual increase in the
complexity of the preferred stimulus




                   Reproduced from (Kobatake & Tanaka, 1994)
AIT: Face cells




                  Reproduced from (Desimone et al. 1984)
AIT: Immediate recognition




                                                      categorization



                                                   identification




                                 Hung Kreiman Poggio & DiCarlo 2005
     See also Oram & Perrett 1992; Tovee et al 1993; Celebrini et al 1993;
                 Ringach et al 1997; Rolls et al 1999; Keysers et al 2001
1.   Problem of visual recognition, visual cortex
2.   Historical background
3.   Neurons and areas in the visual system
4.   Data and feedforward hierarchical models
5.   What is next?
The ventral stream




Source: Lennie, Maunsell, Movshon
We consider feedforward architecture
                only




                        (Thorpe and Fabre-Thorpe, 2001)
Our present model of the ventral stream:
feedforward, accounting only for “immediate
               recognition”

 • It is in the family of “Hubel-Wiesel” models (Hubel &
   Wiesel, 1959; Fukushima, 1980; Oram & Perrett, 1993, Wallis & Rolls, 1997; Riesenhuber &
   Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing and Koerner, 2003; LeCun
   et al 1998; Amit & Mascaro 2003; Deco & Rolls 2006…)

 • As a biological model of object recognition in the
   ventral stream it is perhaps the most quantitative
   and faithful to known biology (though many
   details/facts are unknown or still to be
   incorporated)
Two key computations


Unit types   Pooling   Computation     Operation


                       Selectivity /   Gaussian-
 Simple                 template        tuning /
                        matching        and-like




                                       Soft-max /
Complex                 Invariance
                                        or-like
Gaussian-like tuning
operation (and-like)

 Simple units




  Max-like operation (or-like)

  Complex units
Gaussian tuning

Gaussian tuning in       Gaussian tuning in IT
V1 for orientation        around 3D views




 Hubel & Wiesel 1958   Logothetis Pauls & Poggio 1995
Max-like operation

Max-like behavior in V4         Max-like behavior in V1




   Gawne & Martin 2002    Lampl Ferster Poggio & Riesenhuber 2004
                            see also Finn Prieber & Ferster 2007
Biophys. implementation


• Max and Gaussian-like tuning
  can be approximated with
  same canonical circuit using
  shunting inhibition




  (Knoblich Koch Poggio in prep; Kouh & Poggio 2007; Knoblich Bouvrie Poggio 2007)
Can be implemented by
shunting inhibition (Grossberg
1973, Reichardt et al. 1983,
Carandini and Heeger, 1994)
and spike threshold variability
(Anderson et al. 2000, Miller
and Troyer, 2002)
Adelson and Bergen (see also
Hassenstein and Reichardt,
1956)




 Of the same form as model
 of MT (Rust et al., Nature
 Neuroscience, 2007
• Task-specific circuits     (from IT
  to PFC)
   – Supervised learning: ~ Gaussian
     RBF




• Generic, overcomplete
  dictionary of reusable shape
  components (from V1 to IT)
  provide unique representation
   – Unsupervised learning (from
     ~10,000 natural images) during a
     developmental-like stage
S2 units


•       Features of moderate complexity (n~1,000
        types)
•       Combination of V1-like complex units at
        different orientations




    •   Synaptic weights w                         stronger
                                                   facilitation
        learned from natural
        images

    •   5-10 subunits chosen
        at random from all
        possible afferents                         stronger
        (~100-1,000)                               suppression
Nature Neuroscience - 10, 1313 - 1321 (2007) / Published online: 16 September 2007 | doi:10.1038/nn1975
Neurons in monkey visual area V2 encode combinations of orientations
                  Akiyuki Anzai, Xinmiao Peng & David C Van Essen
C2 units



•   Same selectivity as S2 units but
    increased tolerance to position and
    size of preferred stimulus
•   Local pooling over S2 units with
    same selectivity but slightly
    different positions and scales
•   A prediction to be tested: S2 units
    in V2 and C2 units in V4?
A loose hierarchy



• Bypass routes along with main routes:
• From V2 to TEO (bypassing V4) (Morel & Bullier 1990; Baizer et al 1991; Distler et al 1991;
 Weller & Steele 1992; Nakamura et al 1993; Buffalo et al 2005)
• From V4 to TE (bypassing TEO) (Desimone et al 1980; Saleem et al 1992)

• “Replication” of simpler selectivities from lower
  to higher areas
• Richer dictionary of features with various
  levels of selectivity and invariance
Comparison w| neural data
• V1:
  • Simple and complex cells tuning (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
  • MAX operation in subset of complex cells (Lampl et al 2004)
• V4:
  • Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999)
  • MAX operation (Gawne et al 2002)
  • Two-spot interaction (Freiwald et al 2005)
  • Tuning for boundary conformation (Pasupathy & Connor 2001, Cadieu et al., 2007)
  • Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
• IT:
  • Tuning and invariance properties (Logothetis et al 1995)
  • Differential role of IT and PFC in categorization (Freedman et al 2001, 2002, 2003)
  • Read out data (Hung Kreiman Poggio & DiCarlo 2005)
  • Pseudo-average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo 2007)
• Human:
  • Rapid categorization (Serre Oliva Poggio 2007)
  • Face processing (fMRI + psychophysics) (Riesenhuber et al 2004; Jiang et al 2006)



                                                  (Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005)
Comparison w| V4


   Tuning for
 curvature and
   boundary
conformations?




                                    Pasupathy & Connor 2001
No parameter fitting!


  V4 neuron tuned to
                                  Most similar model C2 unit
boundary conformations




Pasupathy & Connor 1999
                                             ρ = 0.78

                      Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
J Neurophysiol 98: 1733-1750, 2007. First published June 27, 2007


   A Model of V4 Shape Selectivity and Invariance
Charles Cadieu, Minjoon Kouh, Anitha Pasupathy, Charles E. Connor, Maximilian
                      Riesenhuber and Tomaso Poggio
Prediction: Response of the pair is predicted to fall
= resp (pair) –resp (reference)
                                  between the responses elicited by the stimuli alone

                                                                     Reference (fixed)
                                         V4 neurons
                                    (with attention directed away                      C2 Probe (varying)
                                                                                          units
                                         from receptive field)




                                                         = response(probe) –response(reference)
                                        (Reynolds et al 1999)
                                                                    (Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005)
Agreement w| IT Readout data

Hung Kreiman Poggio DiCarlo 2005




                          Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
Remarks


• The stage that includes (V4-PIT)-AIT-PFC
  represents a learning network of the Gaussian
  RBF type that is known (from learning theory) to
  generalize well
• In the theory the stage between IT and ‘’PFC” is
  a linear classifier – like the one used in the read-
  out experiments
• The inputs to IT are a large dictionary of
  selective and invariant features
Rapid categorization




  SHOW ANIMAL /
   NON_ANIMAL
     MOVIE
Database collected by Oliva & Torralba
Rapid categorization task (with mask to test
                   feedforward model)

                                   Image
                                                       Interval
                                                       Image-Mask

                                                                      Mask
                                                                      1/f noise
   20 ms


                 30 ms ISI


                                      80 ms                        Animal present
                                                                      or not ?
Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005
…solves the problem (when mask forces
         feedforward processing)…


                            Model 82%
                                            human-
                                          observers (n
                                           = 24) 80%
• d’~ standardized error                Human 80%
rate
• the higher the d’, the
better the perf.




                               Serre Oliva & Poggio 2007
Further comparisons


• Image-by-image correlation:
  –   Heads:       ρ=0.71
  –   Close-body: ρ=0.84
  –   Medium-body: ρ=0.71
  –   Far-body:    ρ=0.60



• Model predicts level of performance on rotated
  images (90 deg and inversion)



                                Serre Oliva & Poggio PNAS 2007
The street scene project




                    Source: Bileschi & Wolf
The StreetScenes Database




 3,547 Images, all taken with the same camera, of the same type of scene, and
      hand labeled with the same objects, using the same labeling rules.
      Object         car    pedestrian   bicycle   building   tree   road   sky

# Labeled Examples   5799     1449        209       5067      4932   3400   2562



http://cbcl.mit.edu/software-datasets/streetscenes/
Examples
Examples
Examples
Examples
• HoG:
                          (Dalal & Triggs 2005)

                          • Part-based system:
                          (Leibe et al 2004)

                          • Local patch correlation:
                          (Torralba et al 2004)




Serre Wolf Bileschi Riesenhuber & Poggio PAMI 2007
Serre Wolf Bileschi Riesenhuber & Poggio PAMI 2007
1.   Problem of visual recognition, visual cortex
2.   Historical background
3.   Neurons and areas in the visual system
4.   Data and feedforward hierarchical models
5.   What is next?
What is next


• A challenge for physiology: disprove basic
  aspects of the architecture
• Extensions to color and stereo
• More sophisticated unsupervised,
  developmental learning in V1, V4, PIT: how?
• Extension to time and videos
• Extending the simulation to integrate-and-fire
  neurons (~ 1 billion) and realistic synapses:
  towards the neural code
What is next


• A challenge for physiology: disprove basic
  aspects of the architecture
• Extensions to color and stereo
• More sophisticated unsupervised,
  developmental learning in V1, V4, PIT: how?
• Extension to time and videos
• Extending the simulation to integrate-and-fire
  neurons (~ 1 billion) and realistic synapses:
  towards the neural code
Layers of cortical processing units

• Task-specific circuits     (from IT
  to PFC)
   – Supervised learning




• Generic dictionary of shape
  components (from V1 to IT)
   – Unsupervised learning during a
     developmental-like stage learning
     dictionaries of “templates” at
     different S levels
Learning the invariance from temporal
              continuity

            w| T. Masquelier & S. Thorpe (CNRS, France)




✦ Simple cells learn
correlation in space                           SHOW MOVIE
(at the same time)
✦ Complex cells learn
correlation in time


                 Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Einhauser
                            et al 2002; Wiskott & Sejnowski 2002; Spratling 2005
What is next


• A challenge for physiology: disprove basic
  aspects of the architecture
• Extensions to color and stereo
• More sophisticated unsupervised,
  developmental learning in V1, V4, PIT: how?
• Extension to time and videos
• Extending the simulation to integrate-and-fire
  neurons (~ 1 billion) and realistic synapses:
  towards the neural code
The problem
     Training Videos               Testing videos




   bend         jack        jump



   pjump        run         walk




   side        wave1       wave2

*each video~4s, 50~100 frames
                                     Dataset from (Blank et al, 2005)
Previous work:
  recognizing biological motion
using a model of the dorsal stream




            Adapted from (Giese & Poggio, 2003)




                    See also (Casile & Giese 2005; Sigala et al, 2005)
Multi-class recognition accuracy


                           Baseline            Our system
    KTH Human               81.3 %                91.6 %
    UCSD Mice               75.6 %                79.0 %
   Weiz. Human              86.7 %                96.3 %
      Average               81.2 %                89.6 %




HH. Jhuang, T. Serre, L. Wolf* and T. Poggio, ICCV, 2007    * chances: 10%~20%
What is next:
     beyond feedforward models: limitations


            IT
                                                     Psychophysics
                                     V4




                               Reynolds Chelazzi &
                                 Desimone 1999
                                                      Serre Oliva Poggio 2007

Zoccolan Kouh Poggio DiCarlo 2007
What is next:
             beyond feedforward models:
                     limitations




• Recognition in clutter is increasingly difficult
• Need for attentional bottleneck (Wolfe, 1994)
  perhaps in V4 (see Gallant and Desimone and
  models by Walther + Serre)
• Notice: this is a “novel” justification for the need
  of attention!
Limitations: beyond 50 ms:
  model not good enough




                 no mask condition

                 80 ms SOA (ISI=60 ms)
                 model

                 50 ms SOA (ISI=30 ms)



                 20 ms SOA (ISI=0 ms)
                  (Serre, Oliva and Poggio, PNAS, 2007)
Ongoing work….
detection acuracy



                                                  Attention and
                                                     cortical
                                                   feedbacks

                       num. items

                    Face detection:
                    scanning vs. attention   ✦ Model implementation of
                                             Wolfe’s guided search (1994)
                                             ✦ Parallel (feature-based top-
                                             down attention) and serial
                                             (spatial attention) to suppress
                                             clutter (Tsostos et al)
                                                Chikkerur Serre Walther Koch & Poggio in prep
Example Results
What is next



• Image inference: attentional or Bayesian
  models?
• Why hierarchies? Beyond a model towards a
  theory
• Against hierarchies and the ventral stream:
  subcortical pathways
What is next: image inference,
backprojections and attentional mechanisms



 • Normal recognition by humans (for long times) is
   much better

 • Normal vision is much more than categorization
   or identification: image
   understanding/inference/parsing
Attention-based models with high-level
             specialized routines




•   Feedforward model + backprojections implementing featural and the
    spatial attention may improve recognition performance
•   Backprojections also access/route information in/from lower areas to
    specific task-dependent routines in PFC (?). Open questions:
      -- Which biophysical mechanisms for routing/gating?
      -- Nature of routines in higher areas (eg PFC)?
Bayesian models


 Analysis-by-synthesis models, eg probabilistic inference in
 the ventral stream: neurons represent conditional probabilities
 of the bottom-up sensory inputs given the top-down
 hypothesis and converge to globally consistent values




Lee and Mumford, 2003; Dean,2005 ;Rao, 2004; Hawkins, 2004; Ullman, 2007, Hinton, 2005
What is next



• Image inference: attentional or Bayesian
  models?
• Why hierarchies? Beyond a model, towards a
  theory
• Against hierarchies and the ventral stream:
  subcortical pathways (Bar et al., 2006, …)
Notices of the American Mathematical Society (AMS), Vol. 50, No. 5,
                                      537-544, 2003.
                     The Mathematics of Learning: Dealing with Data
                                        Tomaso Poggio and Steve Smale




How then do the learning machines described in the theory compare with brains?

  One of the most obvious differences is the ability of people and animals to
learn from very few examples.
  A comparison with real brains offers another, related, challenge to learning theory. The “learning algorithms”
we have described in this paper correspond to one-layer architectures. Are
                                                        hierarchical architectures
with more layers justifiable in terms of learning theory?

  Why hierarchies? For instance, the lowest levels of the hierarchy may represent a dictionary of features
that can be shared across multiple classification tasks.

    There may also be the more fundamental issue of sample complexity. Thus our ability of learning from
just a few examples, and its limitations, may be related to the hierarchical architecture of cortex.
Formalizing the hierarchy: towards a theory
Smale, S., T. Poggio, A.
Caponnetto, and J. Bouvrie.
Derived Distance: towards a
mathematical theory of
visual cortex, CBCL Paper,
Massachusetts Institute of
Technology, Cambridge,
MA, November, 2007.
From a model to a theory: math results on
    unsupervised learning of invariances and of a dictionary of shapes
                         from image sequences




…




                                             Caponetto, Smale and Poggio, in preparation
(Obvious) caution remark!!!




There is still much to do before we understand
                      vision…
                 and the brain!
Collaborators


                       T. Serre
Model                Comparison w| humans   Computer vision

   A. Oliva             A. Oliva             • S. Bileschi
   C. Cadieu         Action recognition      • L. Wolf
   U. Knoblich          H. Jhuang           Learning invariances

   M. Kouh           Attention               •T. Masquelier
   G. Kreiman           S. Chikkerur         •S. Thorpe
   M. Riesenhuber       C. Koch

                        D. Walther

Mais conteúdo relacionado

Mais de zukun

Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featureszukun
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...zukun
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learningzukun
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...zukun
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningzukun
 

Mais de zukun (20)

Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learning
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learning
 

Último

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 

Último (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 

NIPS2007: visual recognition in primates and machines

  • 1. NIPS’07 tutorial (preliminary) Visual Recognition in Primates and Machines Tomaso Poggio (with Thomas Serre) McGovern Institute for Brain Research Center for Biological and Computational Learning Department of Brain & Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 USA
  • 2. Motivation for studying vision: trying to understand how the brain works • Old dream of all philosophers and more recently of AI: – understand how the brain works – make intelligent machines
  • 3. This tutorial: using a class of models to summarize/interpret experimental results • Models are cartoons of reality eg Bohr’s model of the hydrogen atom • All models are “wrong” • Some models can be useful summaries of data and some can be a good starting point for more complete theories
  • 4. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?
  • 5. The problem: recognition in natural images (e.g., “is there an animal in the image?”)
  • 6. How does visual cortex solve this problem? How can computers solve this problem? dorsal stream: “where” ventral stream: “what” Desimone & Ungerleider 1989
  • 7. A “feedforward” version of the problem: rapid categorization SHOW RSVP MOVIE Movie courtesy of Jim DiCarlo Biederman 1972; Potter 1975; Thorpe et al 1996
  • 8. A model of the ventral stream which is also an algorithm *Modified from (Gross, 1998) Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007 [software available online]
  • 9. …solves the problem (if mask forces feedforward processing)… Model 82% human- observers (n = 24) 80% • d’~ standardized error Human 80% rate • the higher the d’, the better the perf. Serre Oliva & Poggio 2007
  • 10. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?
  • 11. Ka na … de 19 74 Tu 1990 rk & Pe nt Car detection lan Face detection d Digit recognition 19 91 Face identification Pedestrian detection Br un Multi-class / multi-objects ell i& Po *Best CVPR’07 paper 10 yrs ago Su gg ng io 19 & 93 Po gg io 19 Be 94 im er 1995 Pe & P ro na ogg an io 1 Os d un co 995 a lle & ag Gi ue Le ro s1 Cu si 99 19 6- n 97 no et al. * w M 19 98 oh an ;S Pa ch pa ne ge ide or rm gio an u & Sc & Ka hn Po na eid gg de Vi er io o m 19 19 98 Be la & an 99 lon J & ;A ;R g on Ka m ow al ie & es na it a ley 20 M 20 de nd Ba 02 a 01 20 Ge luj lik a 20 00 m & 02 an Ka ;A 19 na Fe rg 99 personal historical perspective rg de us aw 19 2000 … Many more To al 98 rra et & lba al Ro 2 th et 00 20 al 3 02 the past few years… 20 ;U Object recognition for computer vision: 04 excellent algorithms in llm … an
  • 12. Examples: Learning Object Detection: Finding Frontal Faces • Training Database • 1000+ Real, 3000+ VIRTUAL • 50,0000+ Non-Face Pattern Sung & Poggio 1995
  • 13. ~10 year old CBCL computer vision work: pedestrian detection system in Mercedes test car now becoming a product (MobilEye)
  • 14. Hu be l & W Hu 1960 ies be e l & l1 95 W 9 Hu ies be el1 l & 96 W 2 ies el Gr 19 os 65 s et a l1 1970 96 9 Ze ki 19 73 V1 cat IT-STS Hu V1 monkey be l & W ies e l1 1980 Exstrastriate cortex Un 97 Sc ge rle 7 hw ide De artz r& sim e M Ish on t al ki e 19 et 83 n 19 al 82 19 ;P 84 er re Sc tt Ro 1990 hil lls ler & et Ko Le al ba ta e 19 82 ke 19 Lo 91 go & th Ta et na is ka et 19 al 94 19 95 Historical perspective … Much Object recognition in cortex: past 10 yrs progress in the
  • 15. Some personal history: First step in developing a model: learning to recognize 3D objects in IT cortex Examples of Visual Stimuli Poggio & Edelman 1990
  • 16. An idea for a module for view-invariant identification Prediction: neurons become view-tuned VIEW- through learning INVARIANT, Architecture that OBJECT- Regularization accounts for SPECIFIC Network (GRBF) invariances to 3D with Gaussian kernels effects (>1 view UNIT needed to learn!) View Angle Poggio & Edelman 1990
  • 17. Learning to Recognize 3D Objects in IT Cortex Examples of Visual Stimuli After human psychophysics (Buelthoff, Edelman, Tarr, Sinha, …), which supports models based on view-tuned units... … physiology! Logothetis Pauls & Poggio 1995
  • 18. Recording Sites in Anterior IT LUN LAT STS …neurons tuned to faces are intermingled IOS nearby…. AMTS LAT STS Ho=0 AMTS Logothetis, Pauls & Poggio 1995
  • 19. Neurons tuned to object views as predicted by model Logothetis Pauls & Poggio 1995
  • 20. A “View-Tuned” IT Cell Target Views -168 o -120 o -108 o -96 o -84 o -72 o -60 o -48 o -36 o -24 o -12 o 0 o -168 -120 -108 -96 -84 -72 -60 -48 -36 -24 -12 0 12 o 24 o 36 o 48 o 60 o 72 o 84 o 96 o 108 o 120 o 132 o 168 o 12 24 36 48 60 72 84 96 108 120 Distractors 60 spikes/sec 800 msec Logothetis Pauls & Poggio 1995
  • 21. But also view-invariant object-specific neurons (5 of them over 1000 recordings) Logothetis Pauls & Poggio 1995
  • 22. View-tuned cells: scale invariance (one training view only) motivates present model Scale Invariant Responses of an IT Neuron 76 1.0 deg 76 1.75 deg 76 2.5 deg 76 3.25 deg (x 0.4) (x 0.7) (x 1.0) (x 1.3) Spikes/sec Spikes/sec Spikes/sec / ik 0 0 0 0 0 1000 2000 3000 0 1000 2000 3000 0 1000 2000 3000 0 1000 2000 3000 Time (msec) Time (msec) Time (msec) Time (msec) 76 4.0 deg 76 4.75 deg 76 5.5 deg 76 6.25 deg (x 1.6) (x 1.9) (x 2.2) (x 2.5) Spikes/sec Spikes/sec / / S ik S ik 0 0 0 0 0 1000 2000 3000 0 1000 2000 3000 0 1000 2000 3000 0 1000 2000 3000 Time (msec) Time (msec) Time (msec) Time (msec) Logothetis Pauls & Poggio 1995
  • 23. From “HMAX” to the model now… Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007
  • 24. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?
  • 25. Neural Circuits Source: Modified from Jody Culham’s web slides
  • 26. Neuron basics INPUT= pulses or graded potentials COMPUTATION = Analog spikes OUTPUT = Chemical
  • 27. Some numbers • Human Brain – 1011-1012 neurons (1 million flies ☺) – 1014- 1015 synapses • Neuron – Fundamental space dimension: • fine dendrites : 0.1 µ diameter; lipid bilayer membrane : 5 nm thick; specific proteins : pumps, channels, receptors, enzymes – Fundamental time length : 1 msec
  • 28. The cerebral cortex Human Macaque Thickness 3 – 4 mm 1 – 2 mm Total surface area ~1600 cm2 ~160 cm2 (both sides) (~50cm diam) (~15cm diam) Neurons /mm² ~10⁵/ mm2 ~ 10⁵/ mm2 Total cortical neurons ~2 x 1010 ~2 x 109 Visual cortex 300 – 500 cm2 80+cm2 Visual Neurons ~4 x 109 ~109 neurons
  • 29. Gross Brain Anatomy A large percentage of the cortex devoted to vision
  • 30. The Visual System [Van Essen & Anderson, 1990]
  • 31. V1: hierarchy of simple and complex cells LGN-type Simple Complex cells cells cells (Hubel & Wiesel 1959)
  • 32. V1: Orientation selectivity Hubel & Wiesel movie
  • 35. Beyond V1: A gradual increase in RF size Reproduced from [Kobatake & Tanaka, 1994] Reproduced from [Rolls, 2004]
  • 36. Beyond V1: A gradual increase in the complexity of the preferred stimulus Reproduced from (Kobatake & Tanaka, 1994)
  • 37. AIT: Face cells Reproduced from (Desimone et al. 1984)
  • 38. AIT: Immediate recognition categorization identification Hung Kreiman Poggio & DiCarlo 2005 See also Oram & Perrett 1992; Tovee et al 1993; Celebrini et al 1993; Ringach et al 1997; Rolls et al 1999; Keysers et al 2001
  • 39. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?
  • 40. The ventral stream Source: Lennie, Maunsell, Movshon
  • 41. We consider feedforward architecture only (Thorpe and Fabre-Thorpe, 2001)
  • 42. Our present model of the ventral stream: feedforward, accounting only for “immediate recognition” • It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959; Fukushima, 1980; Oram & Perrett, 1993, Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing and Koerner, 2003; LeCun et al 1998; Amit & Mascaro 2003; Deco & Rolls 2006…) • As a biological model of object recognition in the ventral stream it is perhaps the most quantitative and faithful to known biology (though many details/facts are unknown or still to be incorporated)
  • 43. Two key computations Unit types Pooling Computation Operation Selectivity / Gaussian- Simple template tuning / matching and-like Soft-max / Complex Invariance or-like
  • 44. Gaussian-like tuning operation (and-like) Simple units Max-like operation (or-like) Complex units
  • 45. Gaussian tuning Gaussian tuning in Gaussian tuning in IT V1 for orientation around 3D views Hubel & Wiesel 1958 Logothetis Pauls & Poggio 1995
  • 46. Max-like operation Max-like behavior in V4 Max-like behavior in V1 Gawne & Martin 2002 Lampl Ferster Poggio & Riesenhuber 2004 see also Finn Prieber & Ferster 2007
  • 47. Biophys. implementation • Max and Gaussian-like tuning can be approximated with same canonical circuit using shunting inhibition (Knoblich Koch Poggio in prep; Kouh & Poggio 2007; Knoblich Bouvrie Poggio 2007)
  • 48. Can be implemented by shunting inhibition (Grossberg 1973, Reichardt et al. 1983, Carandini and Heeger, 1994) and spike threshold variability (Anderson et al. 2000, Miller and Troyer, 2002) Adelson and Bergen (see also Hassenstein and Reichardt, 1956) Of the same form as model of MT (Rust et al., Nature Neuroscience, 2007
  • 49. • Task-specific circuits (from IT to PFC) – Supervised learning: ~ Gaussian RBF • Generic, overcomplete dictionary of reusable shape components (from V1 to IT) provide unique representation – Unsupervised learning (from ~10,000 natural images) during a developmental-like stage
  • 50. S2 units • Features of moderate complexity (n~1,000 types) • Combination of V1-like complex units at different orientations • Synaptic weights w stronger facilitation learned from natural images • 5-10 subunits chosen at random from all possible afferents stronger (~100-1,000) suppression
  • 51. Nature Neuroscience - 10, 1313 - 1321 (2007) / Published online: 16 September 2007 | doi:10.1038/nn1975 Neurons in monkey visual area V2 encode combinations of orientations Akiyuki Anzai, Xinmiao Peng & David C Van Essen
  • 52. C2 units • Same selectivity as S2 units but increased tolerance to position and size of preferred stimulus • Local pooling over S2 units with same selectivity but slightly different positions and scales • A prediction to be tested: S2 units in V2 and C2 units in V4?
  • 53. A loose hierarchy • Bypass routes along with main routes: • From V2 to TEO (bypassing V4) (Morel & Bullier 1990; Baizer et al 1991; Distler et al 1991; Weller & Steele 1992; Nakamura et al 1993; Buffalo et al 2005) • From V4 to TE (bypassing TEO) (Desimone et al 1980; Saleem et al 1992) • “Replication” of simpler selectivities from lower to higher areas • Richer dictionary of features with various levels of selectivity and invariance
  • 54. Comparison w| neural data • V1: • Simple and complex cells tuning (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982) • MAX operation in subset of complex cells (Lampl et al 2004) • V4: • Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999) • MAX operation (Gawne et al 2002) • Two-spot interaction (Freiwald et al 2005) • Tuning for boundary conformation (Pasupathy & Connor 2001, Cadieu et al., 2007) • Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996) • IT: • Tuning and invariance properties (Logothetis et al 1995) • Differential role of IT and PFC in categorization (Freedman et al 2001, 2002, 2003) • Read out data (Hung Kreiman Poggio & DiCarlo 2005) • Pseudo-average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo 2007) • Human: • Rapid categorization (Serre Oliva Poggio 2007) • Face processing (fMRI + psychophysics) (Riesenhuber et al 2004; Jiang et al 2006) (Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005)
  • 55. Comparison w| V4 Tuning for curvature and boundary conformations? Pasupathy & Connor 2001
  • 56. No parameter fitting! V4 neuron tuned to Most similar model C2 unit boundary conformations Pasupathy & Connor 1999 ρ = 0.78 Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  • 57. J Neurophysiol 98: 1733-1750, 2007. First published June 27, 2007 A Model of V4 Shape Selectivity and Invariance Charles Cadieu, Minjoon Kouh, Anitha Pasupathy, Charles E. Connor, Maximilian Riesenhuber and Tomaso Poggio
  • 58. Prediction: Response of the pair is predicted to fall = resp (pair) –resp (reference) between the responses elicited by the stimuli alone Reference (fixed) V4 neurons (with attention directed away C2 Probe (varying) units from receptive field) = response(probe) –response(reference) (Reynolds et al 1999) (Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005)
  • 59. Agreement w| IT Readout data Hung Kreiman Poggio DiCarlo 2005 Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  • 60. Remarks • The stage that includes (V4-PIT)-AIT-PFC represents a learning network of the Gaussian RBF type that is known (from learning theory) to generalize well • In the theory the stage between IT and ‘’PFC” is a linear classifier – like the one used in the read- out experiments • The inputs to IT are a large dictionary of selective and invariant features
  • 61. Rapid categorization SHOW ANIMAL / NON_ANIMAL MOVIE
  • 62. Database collected by Oliva & Torralba
  • 63. Rapid categorization task (with mask to test feedforward model) Image Interval Image-Mask Mask 1/f noise 20 ms 30 ms ISI 80 ms Animal present or not ? Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005
  • 64. …solves the problem (when mask forces feedforward processing)… Model 82% human- observers (n = 24) 80% • d’~ standardized error Human 80% rate • the higher the d’, the better the perf. Serre Oliva & Poggio 2007
  • 65. Further comparisons • Image-by-image correlation: – Heads: ρ=0.71 – Close-body: ρ=0.84 – Medium-body: ρ=0.71 – Far-body: ρ=0.60 • Model predicts level of performance on rotated images (90 deg and inversion) Serre Oliva & Poggio PNAS 2007
  • 66. The street scene project Source: Bileschi & Wolf
  • 67. The StreetScenes Database 3,547 Images, all taken with the same camera, of the same type of scene, and hand labeled with the same objects, using the same labeling rules. Object car pedestrian bicycle building tree road sky # Labeled Examples 5799 1449 209 5067 4932 3400 2562 http://cbcl.mit.edu/software-datasets/streetscenes/
  • 72. • HoG: (Dalal & Triggs 2005) • Part-based system: (Leibe et al 2004) • Local patch correlation: (Torralba et al 2004) Serre Wolf Bileschi Riesenhuber & Poggio PAMI 2007
  • 73. Serre Wolf Bileschi Riesenhuber & Poggio PAMI 2007
  • 74. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?
  • 75. What is next • A challenge for physiology: disprove basic aspects of the architecture • Extensions to color and stereo • More sophisticated unsupervised, developmental learning in V1, V4, PIT: how? • Extension to time and videos • Extending the simulation to integrate-and-fire neurons (~ 1 billion) and realistic synapses: towards the neural code
  • 76. What is next • A challenge for physiology: disprove basic aspects of the architecture • Extensions to color and stereo • More sophisticated unsupervised, developmental learning in V1, V4, PIT: how? • Extension to time and videos • Extending the simulation to integrate-and-fire neurons (~ 1 billion) and realistic synapses: towards the neural code
  • 77. Layers of cortical processing units • Task-specific circuits (from IT to PFC) – Supervised learning • Generic dictionary of shape components (from V1 to IT) – Unsupervised learning during a developmental-like stage learning dictionaries of “templates” at different S levels
  • 78. Learning the invariance from temporal continuity w| T. Masquelier & S. Thorpe (CNRS, France) ✦ Simple cells learn correlation in space SHOW MOVIE (at the same time) ✦ Complex cells learn correlation in time Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Einhauser et al 2002; Wiskott & Sejnowski 2002; Spratling 2005
  • 79. What is next • A challenge for physiology: disprove basic aspects of the architecture • Extensions to color and stereo • More sophisticated unsupervised, developmental learning in V1, V4, PIT: how? • Extension to time and videos • Extending the simulation to integrate-and-fire neurons (~ 1 billion) and realistic synapses: towards the neural code
  • 80. The problem Training Videos Testing videos bend jack jump pjump run walk side wave1 wave2 *each video~4s, 50~100 frames Dataset from (Blank et al, 2005)
  • 81. Previous work: recognizing biological motion using a model of the dorsal stream Adapted from (Giese & Poggio, 2003) See also (Casile & Giese 2005; Sigala et al, 2005)
  • 82. Multi-class recognition accuracy Baseline Our system KTH Human 81.3 % 91.6 % UCSD Mice 75.6 % 79.0 % Weiz. Human 86.7 % 96.3 % Average 81.2 % 89.6 % HH. Jhuang, T. Serre, L. Wolf* and T. Poggio, ICCV, 2007 * chances: 10%~20%
  • 83. What is next: beyond feedforward models: limitations IT Psychophysics V4 Reynolds Chelazzi & Desimone 1999 Serre Oliva Poggio 2007 Zoccolan Kouh Poggio DiCarlo 2007
  • 84. What is next: beyond feedforward models: limitations • Recognition in clutter is increasingly difficult • Need for attentional bottleneck (Wolfe, 1994) perhaps in V4 (see Gallant and Desimone and models by Walther + Serre) • Notice: this is a “novel” justification for the need of attention!
  • 85. Limitations: beyond 50 ms: model not good enough no mask condition 80 ms SOA (ISI=60 ms) model 50 ms SOA (ISI=30 ms) 20 ms SOA (ISI=0 ms) (Serre, Oliva and Poggio, PNAS, 2007)
  • 87. detection acuracy Attention and cortical feedbacks num. items Face detection: scanning vs. attention ✦ Model implementation of Wolfe’s guided search (1994) ✦ Parallel (feature-based top- down attention) and serial (spatial attention) to suppress clutter (Tsostos et al) Chikkerur Serre Walther Koch & Poggio in prep
  • 89. What is next • Image inference: attentional or Bayesian models? • Why hierarchies? Beyond a model towards a theory • Against hierarchies and the ventral stream: subcortical pathways
  • 90. What is next: image inference, backprojections and attentional mechanisms • Normal recognition by humans (for long times) is much better • Normal vision is much more than categorization or identification: image understanding/inference/parsing
  • 91. Attention-based models with high-level specialized routines • Feedforward model + backprojections implementing featural and the spatial attention may improve recognition performance • Backprojections also access/route information in/from lower areas to specific task-dependent routines in PFC (?). Open questions: -- Which biophysical mechanisms for routing/gating? -- Nature of routines in higher areas (eg PFC)?
  • 92. Bayesian models Analysis-by-synthesis models, eg probabilistic inference in the ventral stream: neurons represent conditional probabilities of the bottom-up sensory inputs given the top-down hypothesis and converge to globally consistent values Lee and Mumford, 2003; Dean,2005 ;Rao, 2004; Hawkins, 2004; Ullman, 2007, Hinton, 2005
  • 93. What is next • Image inference: attentional or Bayesian models? • Why hierarchies? Beyond a model, towards a theory • Against hierarchies and the ventral stream: subcortical pathways (Bar et al., 2006, …)
  • 94. Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, 537-544, 2003. The Mathematics of Learning: Dealing with Data Tomaso Poggio and Steve Smale How then do the learning machines described in the theory compare with brains? One of the most obvious differences is the ability of people and animals to learn from very few examples. A comparison with real brains offers another, related, challenge to learning theory. The “learning algorithms” we have described in this paper correspond to one-layer architectures. Are hierarchical architectures with more layers justifiable in terms of learning theory? Why hierarchies? For instance, the lowest levels of the hierarchy may represent a dictionary of features that can be shared across multiple classification tasks. There may also be the more fundamental issue of sample complexity. Thus our ability of learning from just a few examples, and its limitations, may be related to the hierarchical architecture of cortex.
  • 95. Formalizing the hierarchy: towards a theory
  • 96. Smale, S., T. Poggio, A. Caponnetto, and J. Bouvrie. Derived Distance: towards a mathematical theory of visual cortex, CBCL Paper, Massachusetts Institute of Technology, Cambridge, MA, November, 2007.
  • 97. From a model to a theory: math results on unsupervised learning of invariances and of a dictionary of shapes from image sequences … Caponetto, Smale and Poggio, in preparation
  • 98. (Obvious) caution remark!!! There is still much to do before we understand vision… and the brain!
  • 99. Collaborators T. Serre Model Comparison w| humans Computer vision A. Oliva A. Oliva • S. Bileschi C. Cadieu Action recognition • L. Wolf U. Knoblich H. Jhuang Learning invariances M. Kouh Attention •T. Masquelier G. Kreiman S. Chikkerur •S. Thorpe M. Riesenhuber C. Koch D. Walther