Mechanisms of bottom-up and top-down processing in visual perception
Thomas Serre


McGovern Institute for Brain Research
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
The problem:
recognition in natural scenes
Rapid recognition:
    human behavior
     Gist of the scene at 7 images/s (about 143 ms per image) from an
     unpredictable random sequence of images
         No time for eye movements
         No top-down processing / expectations
     Feedforward processing:
         Coarse / base image representation

Potter 1971, 1975; see also Biederman 1972; Thorpe 1996. Movie courtesy of Jim DiCarlo
Outline
1. Rapid recognition and feedforward processing:
  Loose hierarchy of image fragments
  “Clutter problem”

2. Beyond feedforward processing:
  Top-down cortical feedback and attention to solve the “clutter problem”
  Predicting human eye movements
Object recognition in the
visual cortex

 Ventral visual stream

 Hierarchical architecture:
   Latencies
   Anatomy
   Function

source: Jim DiCarlo
Object recognition in the
visual cortex




                                        Nobel prize 1981
Hubel & Wiesel 1959, 1962, 1965, 1968
Object recognition in the
visual cortex

 Gradual increase in complexity of preferred stimulus

Kobatake & Tanaka 1994
see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
Object recognition in the
visual cortex

 Parallel increase in invariance properties (position and scale) of neurons

Kobatake & Tanaka 1994
see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
[Figure: model layers shown alongside areas of the visual cortex]

 Task: animal vs. non-animal

 Model layers            RF sizes        Num. units
 classification units                    10^0
 S4                      7°              10^2
 C3                      7°              10^3
 C2b                     7°              10^3
 S3                      1.2° - 3.2°     10^4
 S2b                     0.9° - 4.4°     10^7
 C2                      1.1° - 3.0°     10^5
 S2                      0.6° - 2.4°     10^7
 C1                      0.4° - 1.6°     10^4
 S1                      0.2° - 1.1°     10^6

 Increase in complexity (number of subunits), RF size and invariance
 Learning: task-dependent learning (supervised); task-independent learning (unsupervised)
 Legend: simple cells, complex cells; tuning, MAX; main routes, bypass routes
 Pathways: dorsal stream ('where' pathway); ventral stream ('what' pathway)
 Areas labeled: V1, V2, V3, V4, PIT, AIT, TE, TF, TG, 35, 36, STP, FST, PO, V3A, MT, MSTc, MSTp, DP, VIP, LIP, 7a, PP, TPO, PGa, IPa, TEa, TEm, rostral STS, PG cortex, prefrontal cortex (8, 11, 12, 13, 45, 46)

Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
[Same figure, with annotation]

 Large-scale (10^8 units), spans several areas of the visual cortex

Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
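
The legend of the model figure above lists two operations, tuning and MAX. In models of this family, simple (S) units are usually described as tuned to a preferred pattern, and complex (C) units as taking the MAX over S units that share a pattern but differ in position. The following is a minimal sketch of that idea, not the implementation from Serre et al. 2005. The Gaussian form of the tuning function, the single cross-shaped template, the 8 x 8 image size and all function names (s_layer, c_unit, place) are assumptions made for illustration.

```python
import numpy as np

def s_layer(image, template, sigma=1.0):
    """Tuning: apply one template-tuned unit at every valid image position.

    Each response is a Gaussian of the squared distance between the local
    patch and the template (assumed form), so it equals 1.0 only when the
    patch matches the template exactly.
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    rows = image.shape[0] - th + 1
    cols = image.shape[1] - tw + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            diff = image[i:i + th, j:j + tw] - template
            out[i, j] = np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))
    return out

def c_unit(s_map):
    """MAX: pool over all S units that share the same template."""
    return float(np.max(s_map))

def place(template, top, left, shape=(8, 8)):
    """Hypothetical helper: draw the template on a blank image."""
    canvas = np.zeros(shape)
    th, tw = template.shape
    canvas[top:top + th, left:left + tw] = template
    return canvas

# A cross-shaped pattern shown at two different positions.
template = np.array([[0, 1, 0],
                     [1, 1, 1],
                     [0, 1, 0]], dtype=float)
s_a = s_layer(place(template, 1, 1), template)
s_b = s_layer(place(template, 4, 3), template)
c_a, c_b = c_unit(s_a), c_unit(s_b)

# The S unit at position (1, 1) responds strongly only to the first image,
# while the C unit gives the same response to both.
print("S unit at (1, 1):", s_a[1, 1], s_b[1, 1])
print("C unit:", c_a, c_b)
```

In this toy case the S unit is selective but position-specific, and the C unit keeps the selectivity while discarding position. Alternating the two stages is one way to obtain the joint increase in complexity and invariance described on the preceding slides.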
                                                                                                                                                                                       10 3
                                                                                                                                                       C3                 7
                                          TPO PGa IPa TEa TEm
         PG Cortex




                                                                                                                                                                                              task-independent learning
                                                                                                                                                               Combination of
                                                                                                                      AIT
                                                                                                                                                                              o           3
                                                                                                                                                                     7      10
                                                                                                                                                       C2b




                                                                                                                                                                                                    Unsupervised
                                                                                                                                                               forward 10    and
                                                                                                                                                                          o       o       4
                                                                                                                                                                  1.2 - 3.2
                                                                                                                                                       S3
                                                                                                                                                               reverse
                                                                                                                PIT
       VIP LIP 7a PP MSTcMSTp
  DP                                                              FST                                                                                                             o
                                                                                                                                                                          o
                                                                                                                       TF                                                                 7
                                                                                                                                                                  0.9 - 4.4 10
                                                                                                                                                               engineering
                                                                                                                                                       S2b

                                                                                                                                                                                  o
                                                                                                                                                                          o
                                                                                                                                                                                       10 5
                                                                                                                                                                       1.1 - 3.0
                                                                                                                                                       C2

                                                                                                                                                                                  o
                                                                                                                                                                          o
                                                                                                                                                                                       10 7
                                                                                                                                                                       0.6 - 2.4
                                                                                                            V4
                                                           PO                V3A           MT
                                                                                                                                                       S2

                                                                                                                                                                                  o
                                                                                                                                                                          o
                                                                                                                                                                                       10 4
                                                                                                                                                                       0.4 - 1.6
                                                                                                           V3
                                                                                                                                                       C1
                                                                                                     V2

                                                                                                                                                                                  o
                                                                                                                                                                       0.2o- 1.1       10 6
                                                                                                     V1
                                                                                                                                                       S1



                            dorsal stream                                                             ventral stream
                           'where' pathway                                                            'what' pathway

                                                                                                                                                             Simple cells
                                                                                                                                                             Complex cells
                                                                                                                                                                                  Main routes
                                                                                                                                                             Tuning
Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005                                                                                                             MAX                  Bypass routes
Model layers            RF sizes        Num. units
classification units                    10^0    (animal vs. non-animal)
S4                      7°              10^2
C3                      7°              10^3
C2b                     7°              10^3
S3                      1.2° - 3.2°     10^4
S2b                     0.9° - 4.4°     10^7
C2                      1.1° - 3.0°     10^5
S2                      0.6° - 2.4°     10^7
C1                      0.4° - 1.6°     10^4
S1                      0.2° - 1.1°     10^6

[Figure: the model layers above shown next to a diagram of cortical areas in the dorsal stream ('where' pathway) and the ventral stream ('what' pathway), from V1 up to prefrontal cortex. Complexity (number of subunits), RF size and invariance increase up the hierarchy. Upper stages: supervised, task-dependent learning; lower layers: unsupervised, task-independent learning. Legend: simple cells (tuning), complex cells (MAX); main routes, bypass routes.]

    Large-scale (10^8 units), spans several areas of the visual cortex
    Combination of forward and reverse engineering
    Shown to be consistent with many experimental data across areas of visual cortex (V1, V2, V4, MT and IT)

Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
Two functional classes of cells
  Simple cells                                      Complex cells
    Template matching                                 Invariance
    Gaussian-like tuning                              max-like operation
    ~ "AND"                                           ~ "OR"

              Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
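
The two operations on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 1-D signals, the toy template and the `sigma` value are all assumptions chosen for illustration. A simple cell is modeled as Gaussian-like tuning around a stored template, and a complex cell as a max over simple cells that share the template but look at different positions.

```python
import math

def simple_cell(patch, template, sigma=1.0):
    """Gaussian-like tuning: response is 1.0 when patch equals template
    and falls off with squared distance (sigma is an assumed width)."""
    sq_dist = sum((p - t) ** 2 for p, t in zip(patch, template))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def complex_cell(simple_responses):
    """Max-like operation: pool over simple cells with the same template."""
    return max(simple_responses)

def c_response(signal, template, sigma=1.0):
    """Apply the same simple cell at every position of a 1-D signal,
    then pool the responses with a max (position invariance)."""
    n = len(template)
    s_responses = [simple_cell(signal[i:i + n], template, sigma)
                   for i in range(len(signal) - n + 1)]
    return complex_cell(s_responses)

# Hypothetical toy inputs: the same feature at two different positions.
template = [0.0, 1.0, 0.0]
signal_a = [0.0, 1.0, 0.0, 0.0, 0.0]
signal_b = [0.0, 0.0, 0.0, 1.0, 0.0]

resp_a = c_response(signal_a, template)
resp_b = c_response(signal_b, template)
off_target = simple_cell([0.0, 0.0, 0.0], template)
```

In this toy case `resp_a` and `resp_b` are equal, because the max keeps the best-matching position wherever the feature occurs, while a single simple cell looking at a non-matching patch (`off_target`) responds more weakly. That is the sense in which tuning gives selectivity and the max gives invariance.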
[Figure repeated: model layers (S1 to S4 and classification units) with RF sizes and number of units, shown next to the diagram of the dorsal stream ('where' pathway) and the ventral stream ('what' pathway).]
                                                           PO                V3A           MT
                                                                                                                                                       S2

                                                                                                                                                                                  o
                                                                                                                                                                          o
                                                                                                                                                                                        10 4
                                                                                                                                                                       0.4 - 1.6
                                                                                                           V3
                                                                                                                                                       C1
                                                                                                     V2

                                                                                                                                                                                  o
                                                                                                                                                                       0.2o- 1.1        10 6
                                                                                                     V1
                                                                                                                                                       S1



                            dorsal stream                                                             ventral stream
                           'where' pathway                                                            'what' pathway

                                                                                                                                                             Simple cells
                                                                                                                                                             Complex cells
                                                                                                                                                                                  Main routes
                                                                                                                                                             Tuning
Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005                                                                                                             MAX                  Bypass routes
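
The figure legend lists two operations, tuning and MAX, next to simple and complex cells. The following is a minimal sketch of how such a pair of operations could work on a 1-D signal, assuming Gaussian template tuning for simple units and a max over simple units for complex units. The function names, the template and the value of sigma are illustrative assumptions, not the model's actual parameters.

```python
import math

def s_unit(patch, template, sigma=1.0):
    """Tuning: Gaussian-like response, equal to 1.0 when patch == template."""
    dist2 = sum((p - t) ** 2 for p, t in zip(patch, template))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

def s_layer(signal, template, sigma=1.0):
    """Apply the same tuned unit at every position of the signal."""
    n = len(template)
    return [s_unit(signal[i:i + n], template, sigma)
            for i in range(len(signal) - n + 1)]

def c_unit(s_responses):
    """MAX: keep only the strongest afferent, discarding its position."""
    return max(s_responses)

# Hypothetical toy input: the same fragment at two different positions.
template = [0.0, 1.0, 0.0]
left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

c_left = c_unit(s_layer(left, template))
c_right = c_unit(s_layer(right, template))
print(c_left, c_right)
```

In this toy case the simple-unit responses differ between the two inputs because they are position-specific, while the complex-unit response is the same for both, which is the kind of position tolerance the MAX operation is meant to provide.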
Hierarchy of image fragments

         Unsupervised learning of frequent image fragments during development
         Reusable fragments shared across categories
         Large redundant vocabulary for implicit geometry

[Figure: hierarchy of image fragments from V1 up to IT, read out by a linear perceptron into category-selective units]
see also Ullman et al 2002
Model vs. IT

[Figure: classification performance (0 to 1) for IT and for the model.
 TRAIN: size 3.4°, position center
 Test conditions (size, position): 3.4°, center; 1.7°, center; 6.8°, center; 3.4°, 2° horz.; 3.4°, 4° horz.]

Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005
Is this model sufficient to explain performance in rapid categorization tasks?

[Figure: trial sequence: image (20 ms), image-mask interval (30 ms ISI), mask (1/f noise, 80 ms); task: animal present or not?]
Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005
Rapid categorization

[Figure: example stimuli: animals, natural distractors and artificial distractors, at four viewing distances: head, close-body, medium-body, far-body]

[Figure: performance (d') by viewing distance (head, close-body, medium-body, far-body); model (82% correct) vs. human observers (80% correct)]
Serre Oliva & Poggio 2007
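
Performance on this slide is reported as d'. As a reminder of how that index is usually obtained, here is a short sketch of the standard signal-detection formula d' = z(hit rate) - z(false-alarm rate). The rates used in the example are hypothetical and are not taken from the experiment.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: difference of z-transformed hit and false-alarm rates.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates for an animal / no-animal task.
example = d_prime(0.8, 0.2)
print(round(example, 3))
```

Unlike percent correct, d' separates sensitivity from response bias, which is presumably why it is used to compare model and observers here.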
“Clutter effect”

     Limitation of feedforward model compatible with reduced selectivity in V4 (Reynolds et al 1999) and IT in the presence of clutter (Zoccolan et al 2005, 2007; Rolls et al 2003)

[Figure: recording site in monkey’s IT; panels: model, IT neurons, fMRI]
Meyers Freiwald Embark Kreiman Serre Poggio in prep
Summary I

Rapid categorization seems compatible with a model based on a feedforward hierarchy of image fragments
Consistent with psychophysics, the key limitation of the architecture is recognition in clutter
How does the visual system overcome this limitation?
Outline
1.Rapid recognition and feedforward processing:
  Loose hierarchy of image fragments
  “Clutter problem”



2.Beyond feedforward processing:
 Top-down cortical feedback and attention to solve the “clutter problem”
 Predicting human eye movements
Spatial attention solves the “clutter problem”
see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others

[Figure: scene separated into foreground and background]

             Problem: How to know where to attend?

Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone. Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4. Science 22 April 2005: Vol. 308, no. 5721, pp. 529 - 534

           Answer: Parallel feature-based attention
Parallel feature-based attention modulation

[Figure: two panels of normalized spike activity (0 to 2) vs. time from fixation (0 to 200 ms)]
Serial spatial attention modulation

       Test for serial (spatial) selection
         SACCADE: RF stimulus is target of saccade
          vs.
         SACCADE: RF stimulus is not target of saccade

[Figure: normalized spike activity (0 to 2) vs. time from fixation (0 to 200 ms); curves: attend within RF, attend away from RF; schematic showing FIX and RF]

Fig. 4. Illustration of the saccade enhancement analysis. We compared neuronal measures when the monkey made a saccade to an RF stimulus versus a saccade away from the RF. In this dis-
Attention as Bayesian inference

[Figure: graphical model mapped onto cortical areas]
         O: object (PFC), with object priors
         F^i: features (IT)
         L: location (FEF/LIP), with location priors
         F^i_l: features at locations (V4/PIT), plate labeled N
         I: image (V2)
         feature-based attention: top-down from PFC and IT to V4/PIT
         spatial attention: from FEF/LIP to V4/PIT

                                                                      Chikkerur Serre & Poggio in prep
see also Rao 2005; Lee & Mumford 2003
Attention as Bayesian inference

     belief propagation:

       m_{LIP \to V4} = P(L)
       m_{IT \to V4}  = P(F^i | O)
       m_{V4 \to IT}  = \sum_{L} \sum_{F^i_l} P(F^i_l | F^i, L) P(L) P(I | F^i_l)
       m_{V4 \to LIP} = \sum_{F^i} \sum_{F^i_l} P(F^i_l | F^i, L) P(F^i | O) P(I | F^i_l)

     feature-based attention: Where is object O?
     spatial attention: What is at location L?

[Figure: graphical model with nodes O (PFC), F^i (IT), L (FEF/LIP), F^i_l (V4, plate labeled N), I (V2)]
                                                                           Chikkerur Serre & Poggio in prep
see also Rao 2005; Lee & Mumford 2003
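
To make the two V4 messages concrete, the sketch below evaluates them on a tiny discrete example. Everything about the example is an assumption made for illustration: a single binary feature F, three locations, a variable F_l with one extra "absent" state, and made-up probability tables. Only the two sums follow the formulas on the slide.

```python
def v4_messages(cpt, p_loc, p_feat, lik):
    """Compute m_{V4->IT} (indexed by F) and m_{V4->LIP} (indexed by L).

    cpt[f][l][k] = P(F_l = k | F = f, L = l)
    p_loc[l]     = P(L = l)        (message from LIP)
    p_feat[f]    = P(F = f | O)    (message from IT)
    lik[k]       = P(I | F_l = k)  (bottom-up evidence)
    """
    n_f, n_l, n_k = len(p_feat), len(p_loc), len(lik)
    m_v4_to_it = [
        sum(cpt[f][l][k] * p_loc[l] * lik[k]
            for l in range(n_l) for k in range(n_k))
        for f in range(n_f)
    ]
    m_v4_to_lip = [
        sum(cpt[f][l][k] * p_feat[f] * lik[k]
            for f in range(n_f) for k in range(n_k))
        for l in range(n_l)
    ]
    return m_v4_to_it, m_v4_to_lip

# Hypothetical toy tables: 3 locations, F_l state 3 means "feature absent".
K = 3
absent = [0.0, 0.0, 0.0, 1.0]

def present_at(l):
    return [0.7 if k == l else 0.1 for k in range(K + 1)]

cpt = [[absent for _ in range(K)],          # F = 0: feature not in the scene
       [present_at(l) for l in range(K)]]   # F = 1: feature mostly at L
p_loc = [1.0 / K] * K                       # flat location prior
p_feat = [0.2, 0.8]                         # target object favors the feature
lik = [0.1, 0.1, 0.9, 0.3]                  # image evidence peaks at location 2

m_it, m_lip = v4_messages(cpt, p_loc, p_feat, lik)
unnorm = [p * m for p, m in zip(p_loc, m_lip)]
posterior_loc = [u / sum(unnorm) for u in unnorm]
best_location = posterior_loc.index(max(posterior_loc))
print(m_it, m_lip, best_location)
```

With these tables the location posterior peaks where the image evidence for the expected feature is strongest, which is the "Where is object O?" reading of feature-based attention; fixing L and reading out m_it corresponds to "What is at location L?".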
Model performance improves with attention

[Figure: performance (d’), axis 0 to 3, for model and humans; labels: mask, no mask; no attention, one shift of attention]
                                          Chikkerur Serre & Poggio in prep
Agreement with neurophysiology data
Feature-based attention:
  Differential modulation for preferred vs. non-preferred stimulus (Bichot et al 2005)
Spatial attention:
  Gain modulation on neuron’s tuning curves (McAdams & Maunsell 1999)
  Competitive mechanisms in V2 and V4 (Reynolds et al 1999)
  Improved readout in clutter (being tested in collaboration with the Desimone lab)
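
As an illustration of what gain modulation of a tuning curve means, the sketch below scales a Gaussian orientation tuning curve by a constant factor. The curve shape, its parameters and the gain value are hypothetical and chosen only for the example; they are not fitted to any of the cited data.

```python
import math

def tuning_curve(theta_deg, preferred_deg=90.0, width_deg=30.0, peak=1.0):
    """Hypothetical Gaussian orientation tuning curve (arbitrary units)."""
    return peak * math.exp(-((theta_deg - preferred_deg) ** 2)
                           / (2.0 * width_deg ** 2))

def with_attention(response, gain=1.3):
    """Multiplicative gain: every response is scaled by the same factor."""
    return gain * response

orientations = [30.0, 60.0, 90.0, 120.0, 150.0]
unattended = [tuning_curve(t) for t in orientations]
attended = [with_attention(r) for r in unattended]
ratios = [a / u for a, u in zip(attended, unattended)]
print(ratios)
```

Under a pure gain change the ratio of attended to unattended response is the same at every orientation, so the preferred orientation and the tuning width are unchanged.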
IT readout improves with attention

         train readout classifier on isolated object

[Figure: average rank (axis 7 to 9) vs. time (0 to 2000 ms), with cue and transient change marked; curves: attention on object, attention away from object, object not shown; n=34]
Zhang Meyers Serre Bichot Desimone Poggio in prep
Could these attentional
mechanisms also explain
search strategies in
complex natural images?
Matching human eye
movements
 Dataset:
   100 street-scene images with cars &
   pedestrians and 20 without

 Experiment
   8 participants asked to count the number of
   cars/pedestrians
   Blocks/randomized presentations
   Each image presented twice

 Eye movements recorded using
 an infra-red eye tracker
 Eye movements as proxy for
 attention
                                                 Chikkerur Tan Serre & Poggio in sub
Matching human eye
movements

                 Car search
                 Pedestrian search




                 Chikkerur Tan Serre & Poggio in sub
Matching human eye movements

[Figure: fraction of fixations (25% to 100%) vs. % image covered by saliency maps (10% to 30%); summary measure: area under ROC curve]
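
The curve on this slide plots the fraction of fixations captured against the percentage of the image covered by the saliency map, summarized by the area under the curve. The sketch below shows one way such a score could be computed. The procedure (ranking pixels by saliency and sweeping a coverage threshold), the function name and the toy data are assumptions for illustration, not necessarily the analysis used in the study.

```python
def fixation_roc(saliency, fixations):
    """Return curve points (coverage, fraction of fixations) and the area.

    saliency:  2-D list of saliency values
    fixations: list of (row, col) fixation locations inside the map
    """
    cells = sorted(
        ((r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))),
        key=lambda rc: saliency[rc[0]][rc[1]],
        reverse=True,  # most salient pixels are covered first
    )
    counts = {}
    for rc in fixations:
        counts[rc] = counts.get(rc, 0) + 1

    points = [(0.0, 0.0)]
    hits = 0
    for n, rc in enumerate(cells, start=1):
        hits += counts.get(rc, 0)
        points.append((n / len(cells), hits / len(fixations)))

    # Trapezoidal rule over consecutive curve points.
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, area

# Hypothetical 2x2 saliency map and four fixations.
toy_map = [[0.9, 0.1],
           [0.4, 0.2]]
toy_fixations = [(0, 0), (0, 0), (1, 0), (0, 1)]
curve, roc_area = fixation_roc(toy_map, toy_fixations)
print(curve, roc_area)
```

An area of 0.5 corresponds to a saliency map that is no better than chance at predicting fixations, and values closer to 1 mean that a small, highly salient part of the image already captures most fixations.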
Results
                1
   ROC area

              0.75


               0.5


              0.25


                0
                       car       pedestrian

     Humans          Bottom-up    Top-down (feature-based)


                                              Chikkerur Tan Serre & Poggio in sub

Mechanisms of bottom-up and top-down processing in visual perception

  • 1. Mechanisms of bottom-up and top-down processing in visual perception Thomas Serre McGovern Institute for Brain Research Department of Brain & Cognitive Sciences Massachusetts Institute of Technology
  • 6. Rapid recognition: human behavior. Gist of the scene at 7 images/s from unpredictable random sequence of images: no time for eye movements, no top-down / expectations. Feedforward processing: coarse / base image representation. Potter 1971, 1975; see also Biederman 1972; Thorpe 1996 (movie courtesy of Jim DiCarlo)
  • 11. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem” 2.Beyond feedforward processing: Top-down cortical feedback and attention to solve the “clutter problem” Predicting human eye movements
  • 17. Object recognition in the visual cortex. Ventral visual stream. Hierarchical architecture: latencies, anatomy, function. Source: Jim DiCarlo
  • 18. Object recognition in the visual cortex Nobel prize 1981 Hubel & Wiesel 1959, 1962, 1965, 1968
  • 19. Object recognition in the visual cortex gradual increase in complexity of preferred stimulus Kobatake & Tanaka 1994 see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
  • 20. Object recognition in the visual cortex Parallel increase in invariance properties (position and scale) of neurons Kobatake & Tanaka 1994 see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
  • 24. Model [Figure: hierarchical model of the visual cortex. Model layers S1, C1, S2, C2, S2b, C2b, S3, C3, S4 and classification units are mapped onto areas from V1 to prefrontal cortex along the ventral ('what') and dorsal ('where') streams, with RF sizes and number of units per layer. Simple cells: tuning; complex cells: MAX; main routes and bypass routes. Learning: unsupervised, task-independent in the cortical layers; supervised, task-dependent at the classification stage (animal vs. non-animal).] Increase in complexity (number of subunits), RF size and invariance. Large-scale (10^8 units), spans several areas of the visual cortex. Combination of forward and reverse engineering. Shown to be consistent with many experimental data across areas of visual cortex (V1, V2, V4, MT and IT). Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  • 25. Two functional classes of cells. Simple cells: template matching, Gaussian-like tuning, ~ “AND”. Complex cells: invariance, max-like operation, ~ “OR”. Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
  • 26. Model [Figure: hierarchical model diagram, repeated from the earlier Model slide] Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  • 34. Hierarchy of image fragments (V1 to IT). Unsupervised learning of frequent image fragments during development. Reusable fragments shared across categories. Large redundant vocabulary for implicit geometry. Category-selective units: linear perceptron. See also Ullman et al 2002
  • 35. Model vs. IT [Figure: classification performance (0 to 1) for IT and Model across six size/position conditions: 3.4° center (TRAIN), 3.4° center, 1.7° center, 6.8° center, 3.4° 2° horz., 3.4° 4° horz.] Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005. Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005
  • 36. Is this model sufficient to explain performance in rapid categorization tasks? Paradigm: image (20 ms), image-mask interval (30 ms ISI), mask (1/f noise, 80 ms). Task: animal present or not? Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005
  • 38. Rapid categorization Head Close-body Medium-body Far-body Animals Natural distractors Artificial distractors Serre Oliva & Poggio 2007
  • 40. Rapid categorization Head Close-body Medium-body Far-body Animals Natural distractors Serre Oliva & Poggio 2007
  • 41. Rapid categorization [Figure: performance (d', axis 1.0 to 2.6) by condition (head, close-body, medium-body, far-body); animals vs. natural distractors] Model (82% correct); human observers (80% correct). Serre Oliva & Poggio 2007
  • 43. “Clutter effect” Limitation of feedforward model compatible with reduced selectivity in V4 (Reynolds et al 1999) and IT in the presence of clutter (Zoccolan et al 2005, 2007; Rolls et al 2003) [Figure labels: recording site in monkey’s IT; Model; IT neurons; fMRI] Meyers Freiwald Embark Kreiman Serre Poggio in prep
  • 44. Summary I: Rapid categorization seems compatible with model based on feedforward hierarchy of image fragments. Consistent with psychophysics, key limitation of architecture is recognition in clutter. How does the visual system overcome such limitation?
  • 45. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem” 2.Beyond feedforward processing: Top-down cortical feedback and attention to solve the “clutter problem” Predicting human eye movements
  • 50. Spatial attention solves the “clutter problem” see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997; and many others background foreground X X XX Problem: How to know where to attend?
  • 51. Spatial attention solves X X XX the “clutter problem” see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997; and many others Science 22 April 2005: Vol. 308. no. 5721, pp. 529 - 534 Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4 Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone
  • 52. Spatial attention solves X X XX the “clutter problem” see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997; and many others Science 22 April 2005: Vol. 308. no. 5721, pp. 529 - 534 Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4 Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone Answer: Parallel feature-based attention
  • 53. Parallel feature-based attention modulation [Figure: normalized spike activity vs. time from fixation (ms), two panels]
  • 54. Serial spatial attention modulation. Test for serial (spatial) selection: RF stimulus is target of saccade vs. RF stimulus is not target of saccade. [Figure: normalized spike activity vs. time from fixation (ms); attend within RF vs. attend away from RF] Fig. 4. Illustration of the saccade enhancement analysis. We compared neuronal measures when the monkey made a saccade to an RF stimulus versus a saccade away from the RF. In this dis-
  • 55. Attention as Bayesian inference [Figure: hierarchy PFC, IT, V4/PIT, V2] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 56. Attention as Bayesian inference [Figure: hierarchy PFC, IT, V4/PIT, V2; feature-based attention labeled] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 57. Attention as Bayesian inference [Figure: hierarchy PFC, IT, FEF/LIP, V4/PIT, V2; feature-based attention and spatial attention labeled] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 58. Attention as Bayesian inference [Figure: graphical model over the hierarchy: O (PFC; object priors, feature-based attention), $F^i$ (IT), L (FEF/LIP; location priors, spatial attention), $F^i_l$ (V4/PIT), I (V2); N features] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 59. Attention as Bayesian inference [Figure: graphical model: O (PFC), $F^i$ (IT), L (LIP), $F^i_l$ (V4), I (V2); N features] Chikkerur Serre & Poggio in prep
  • 60. Attention as Bayesian inference: feature-based attention. Where is object O? Belief propagation: $m_{LIP \to V4} = P(L)$; $m_{IT \to V4} = P(F^i \mid O)$; $m_{V4 \to IT} = \sum_{L} \sum_{F^i_l} P(F^i_l \mid F^i, L)\, P(L)\, P(I \mid F^i_l)$; $m_{V4 \to LIP} = \sum_{F^i} \sum_{F^i_l} P(F^i_l \mid F^i, L)\, P(F^i \mid O)\, P(I \mid F^i_l)$. [Figure: graphical model: O (PFC), $F^i$ (IT), L (FEF/LIP), $F^i_l$ (V4), I (V2); N features] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 61. Attention as Bayesian inference: spatial attention. What is at location L? Belief propagation: $m_{LIP \to V4} = P(L)$; $m_{IT \to V4} = P(F^i \mid O)$; $m_{V4 \to IT} = \sum_{L} \sum_{F^i_l} P(F^i_l \mid F^i, L)\, P(L)\, P(I \mid F^i_l)$; $m_{V4 \to LIP} = \sum_{F^i} \sum_{F^i_l} P(F^i_l \mid F^i, L)\, P(F^i \mid O)\, P(I \mid F^i_l)$. [Figure: graphical model: O (PFC), $F^i$ (IT), L (FEF/LIP), $F^i_l$ (V4), I (V2); N features] Chikkerur Serre & Poggio in prep; see also Rao 2005; Lee & Mumford 2003
  • 62. Model performance improves with attention [Figure: performance (d’), axis 0 to 3, for “no attention” vs. “one shift of attention”; model vs. humans] Chikkerur Serre & Poggio in prep
  • 66. Model performance improves with attention [Figure: performance (d’), axis 0 to 3, for “no attention” vs. “one shift of attention”; model vs. humans; “mask” and “no mask” conditions] Chikkerur Serre & Poggio in prep
  • 67. Agreement with neurophysiology data. Feature-based attention: differential modulation for preferred vs. non-preferred stimulus (Bichot et al ’05). Spatial attention: gain modulation on neurons’ tuning curves (McAdams & Maunsell ’99). Competitive mechanisms in V2 and V4 (Reynolds et al ’99). Improved readout in clutter (being tested in collaboration with the Desimone lab).
  • 68. IT readout improves with attention: train readout classifier on isolated object [Figure: display with fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  • 69. IT readout improves with attention [Figure: display with fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  • 72. IT readout improves with attention [Figure: average rank (axis 7 to 9) vs. time (0 to 2000 ms), with cue and transient change marked; conditions: attention on object, attention away from object, object not shown; n=34] Zhang Meyers Serre Bichot Desimone Poggio in prep
  • 74. Could these attentional mechanisms also explain search strategies in complex natural images?
  • 75. Matching human eye movements. Dataset: 100 street-scene images with cars & pedestrians and 20 without. Experiment: 8 participants asked to count the number of cars/pedestrians; blocks/randomized presentations; each image presented twice; eye movements recorded using an infra-red eye tracker. Eye movements as proxy for attention. Chikkerur Tan Serre & Poggio in sub
  • 76. Matching human eye movements Car search Pedestrian search Chikkerur Tan Serre & Poggio in sub
  • 78. Attention as Bayesian inference [Figure: graphical model: O (PFC), $F^i$ (IT), L (FEF/LIP), $F^i_l$ (V4), I (V2); N features] Chikkerur Serre & Poggio in prep
  • 80. Matching human eye movements [Figure: fraction of fixations (25% to 100%) vs. % of image covered by saliency maps (10% to 30%)]
  • 81. Matching human eye movements [Figure: fraction of fixations (25% to 100%) vs. % of image covered by saliency maps (10% to 30%); the area under the ROC curve is highlighted]
  • 82. Results [Figure: ROC area (axis 0 to 1) for car and pedestrian search; humans vs. bottom-up vs. top-down (feature-based)] Chikkerur Tan Serre & Poggio in sub
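
One way to turn the curve on the slides above (fraction of fixations captured vs. % of the image covered by the saliency map) into a single ROC-area score is sketched below in Python. This is a minimal sketch under stated assumptions, not the analysis code behind these results: the function name `fixation_roc_area`, the nested-list map format and the tie handling are all hypothetical choices made for illustration.

```python
def fixation_roc_area(saliency, fixations):
    """Area under the 'fraction of fixations vs. fraction of image covered' curve.

    saliency  -- 2D list of floats (hypothetical format: rows of pixel saliency)
    fixations -- list of (row, col) pixel coordinates of human fixations
    """
    # All saliency values, most salient first: thresholding at the k-th value
    # covers the k most salient pixels of the image.
    values = sorted((v for row in saliency for v in row), reverse=True)
    n_pixels = len(values)
    fix_values = [saliency[r][c] for r, c in fixations]

    xs, ys = [0.0], [0.0]
    for k, threshold in enumerate(values, start=1):
        # Tied saliency values share one threshold: emit a point only at the
        # last pixel of each run of ties.
        if k < n_pixels and values[k] == threshold:
            continue
        xs.append(k / n_pixels)  # fraction of the image covered
        ys.append(sum(v >= threshold for v in fix_values) / len(fix_values))

    # Trapezoidal integration of the curve.
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))


if __name__ == "__main__":
    toy_map = [[4.0, 3.0], [2.0, 1.0]]  # made-up 2x2 saliency map
    print(fixation_roc_area(toy_map, [(0, 0)]))  # fixation on the most salient pixel
    print(fixation_roc_area(toy_map, [(1, 1)]))  # fixation on the least salient pixel
```

With this definition a flat map scores 0.5 (chance), and a map that ranks fixated pixels above the rest scores higher.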

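Several plots above report performance as d′. As a reminder of the definition, d′ is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. A minimal Python sketch follows; the function name `d_prime` and the example rates are illustrative assumptions, not values taken from the experiments.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    # Hypothetical observer: 80% hits on animal images, 20% false alarms.
    print(round(d_prime(0.8, 0.2), 2))  # 1.68
```

An observer at chance (hit rate equal to false-alarm rate) gets d′ = 0.
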
Editor's Notes

  1. Thank you very much, Charles, for inviting me. I am delighted to be here and to enjoy weather that we could never hope for in the spring in Boston...
  2. Here is the problem I am trying to solve: you give me an image and I tell you, for instance, whether or not it contains an animal. Object recognition is a very hard computational problem. The reason is that even though all of these are images of a giraffe, they look quite different at the pixel level. Objects in the real world, and these animal images in particular, can vary drastically in appearance, shape and texture. In particular, changes in position and scale can create very large changes in the pattern of activity that they elicit on the retina... Think about it: even a small shift in position of 2 deg of visual angle corresponds to shifting the image on the retina by more than 120 photoreceptors! This is an extremely difficult task, and today no artificial computer vision system can do it as robustly and accurately as the primate visual system. As primates, however, we are extremely good at solving this task despite all these variations...
  3. A classical paradigm that has been used extensively to study object recognition and visual perception is what I would call the rapid recognition paradigm. Here I am flashing images in rapid succession. This paradigm is called RSVP (rapid serial visual presentation) and was introduced by Molly Potter in the 70’s. Images are presented at a rate of 7/s. At this speed you probably don’t get every detail in the image, but at the very least you are able to build a coarse description of the scene. For instance, most of you should be able to recognize and perhaps memorize objects in these images... These types of tasks do not necessarily reflect natural everyday vision, where the visual world moves continuously and you are free to move your eyes and shift your attention. However, they are able to isolate the first 100-150 ms of visual processing, during which a base representation for images is formed before more complex visual routines can come into play...
  6. In this talk I will argue that this base representation corresponds to the activation of a hierarchy of image fragments following a single feedforward sweep through the visual system. This bottom-up feedforward sweep rapidly activates specific sub-populations of neurons in the ventral stream of the visual cortex that are tuned to image fragments with different levels of selectivity and invariance. I will show you that, consistent with human psychophysics, a key limitation of this architecture is that it is susceptible to clutter. While it does relatively well on images that contain a single object and limited clutter (like the ones I just showed you), we found that performance decreases significantly as the amount of clutter increases.
  8. In the second part of my talk I will argue that the way the visual system solves this clutter problem is via cortical feedback and shifts of attention. I will outline an integrated model of object recognition and attention. I will show that the object recognition performance of the model increases significantly when used in conjunction with attentional mechanisms. Using eye movements as a proxy for attention, I will show that the resulting model can account for a significant fraction of human eye movements during search tasks in complex natural images.
  9. We have implemented a computational model (shown on the right) that implements this set of principles. The Van Essen diagram is on the left. We do not try to account for the whole visual cortex, only the ventral stream of the visual cortex... The model is hierarchical, with only feedforward connections.
  12. Computational considerations suggest that you need two types of operations, and therefore two functional classes of cells, to explain those data. By analogy with Hubel & Wiesel's hierarchical model of processing in the visual cortex, we have called these two classes of cells simple and complex. The scheme that I am going to describe essentially extends their proposal from striate to extra-striate visual areas. We have assumed that these two types of functional units implement two types of computations or mathematical operations: a Gaussian-like (bell-shaped) tuning and a max-like operation. The Gaussian tuning was motivated by a learning algorithm called the Radial Basis Function network, while the max operation was motivated by the standard scanning approach in computer vision and by theoretical arguments from signal processing. The goal of the simple units is to increase the complexity of the representation, in this example by pooling together the activity of afferent units with different orientations via this Gaussian-like tuning. This Gaussian tuning is ubiquitous in the visual cortex, from orientation tuning in V1 to tuning for complex objects around certain poses in IT. The complex units pool together afferent units with the same preferred stimulus (e.g. a vertical bar) but slightly different positions and scales. At the complex unit level we thus build some tolerance to the exact position and scale of the stimulus within the receptive field of the unit.
  14. Emphasize: after training, no data fitting. Mention Charles. The model builds simple-to-complex cell hierarchies. It mimics as closely as possible the tuning properties of neurons in various areas of the ventral stream. It builds on earlier work in the lab by Max Riesenhuber.
  15. -- I would argue that a key aspect of this model is the learning of a large dictionary of reusable features (I would call them shape components) from V1 to IT. These features represent a basic vocabulary of shape components that can be used to represent any visual input. They correspond to image patches that appear with high probability in the natural world. We argue that this dictionary is learned in an unsupervised manner during a developmental period. -- In this model, the goal of the ventral stream of the visual cortex from V1 to IT is to build a good representation for images, i.e. a representation that is compact and invariant with respect to 2D transformations such as translation and scale. -- With a good image representation, learning a new image category is relatively easy. We speculate that this can be done from a handful of labeled examples by training task-specific circuits running from IT to the PFC. We showed that this worked well on multiple object categories from standard computer vision databases.
  22. For the sake of time, I am only going to show you that you can simulate a neurophysiology experiment with this model. You can record from a population of randomly selected model neurons and perform exactly the same analysis as in a real experiment. For the bar plot shown here, we performed exactly the same readout experiment as in the study by Hung et al. What is shown is the classification performance when training at a specific position and scale and evaluating how well the classifier generalizes to positions and scales not presented during training. This measures the built-in invariance inherited from the response properties of the population of neurons, and you can see that the fit is quite good.
  23. In parallel we have used this model in real-world computer vision applications. For instance we have developed a computer vision system for the automatic parsing of street scene images. Here are examples of automatic parsing by the system overlaid over the original images. The colors and bounding boxes indicate predictions from the model (eg green for trees etc). The computer vision system shown here is based exclusively on the response properties
  24. More recently we have extended the approach for the recognition of human actions such as running, walking, jogging, jumping, waving etc... In all cases we have shown that the resulting biologically motivated computer vision systems were performing on par or better than state-of-the-art computer vision systems.
  25. The goal of the model was not to explain natural everyday vision, when you are free to move your eyes and shift your attention, but rather what is often called rapid recognition or immediate recognition. This corresponds to the first 100-150 ms of visual processing (when an image is briefly presented), i.e. when the visual system is forced to operate in a feedforward mode, before eye movements and shifts of attention. Here is an example on the left. I flash an image for a couple of ms; you probably don’t have time to get every fine detail of the image, but most people are able to say whether it contains an animal or not. Here we divided our dataset into 4 subcategories: head... Overall, both the model and humans reach about 80% on this very difficult task, and you can see that they agree quite well in terms of how they perform on these 4 subcategories...
  27. We have seen that in the model and in the visual cortex, when two stimuli fall within the receptive field of a neuron, the two stimuli “compete”, that is they reduce the selectivity of the neurons. I just showed you that at the psychophysical level, the amount of clutter in an image largely determines the performance of the model and of human observers during rapid categorization tasks.
  32. Using eye movements as a correlate of attention. The assumption is that attention gets to an item just before the eyes move, so if the eyes move we can assume that attention was there just before.
  33. Here is the original model: we added back-projections to account for these attentional modulations. We assume that feature-based attention acts through a cascade of top-down connections through the ventral stream, originating in the PFC, where a template of the target object is held in memory, and running all the way down to V4 and possibly lower areas. We also assume a spatial attention modulation originating from the parietal cortex (here I am assuming LIP, based on limited experimental evidence). These attentional mechanisms can be cast in a probabilistic Bayesian framework whereby the parietal cortex represents location variables and the ventral stream represents feature variables. These are our image fragments. Variables for the target object are encoded in higher areas such as PFC... This framework is inspired by an earlier model by Rao to explain spatial attention and is a special case of the computational model of the visual cortex described by David Mumford, which most of you probably know...
  36. Here, the way we implemented that is via belief propagation in polytrees (the messages are shown for the simplified case of a single feature, for clarity). Within this framework, spatial attention can be described as a series of messages from L to Fil to Fi to O, while feature-based attention goes the opposite way. Thus the model makes specific predictions about the timing of activity in visual areas of the ventral stream and in the parietal cortex, depending on the task at hand. Obviously, I am leaving a lot of details open, unfortunately...
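The message passing mentioned in these notes can be illustrated with a minimal sketch for a single binary feature. This is not the implementation from the talk. It assumes a small polytree O -> F, (F, L) -> Fl, Fl -> image, and it reads Fil as a location-specific copy of feature Fi, which is an interpretation. The function names, the `eps` parameter, and the extra "absent" state of Fl are hypothetical choices made for the example.

```python
import numpy as np


def normalize(v):
    """Scale a non-negative vector so that it sums to one."""
    v = np.asarray(v, dtype=float)
    return v / v.sum()


def location_cpt(n_loc, eps=0.05):
    """Assumed table P(Fl | F, L), shape (2, n_loc, n_loc + 1).

    Fl takes one of n_loc positions, or the extra state n_loc ("absent").
    Requires n_loc >= 2.
    """
    cpt = np.zeros((2, n_loc, n_loc + 1))
    cpt[0, :, n_loc] = 1.0                     # F absent: Fl is "absent"
    cpt[1, :, :n_loc] = eps / (n_loc - 1)      # F present: small spill-over
    idx = np.arange(n_loc)
    cpt[1, idx, idx] = 1.0 - eps               # F present: mostly at location L
    return cpt


def posteriors(p_obj, p_feat_given_obj, p_loc, evidence, eps=0.05):
    """Exact belief propagation on the assumed polytree.

    p_obj:            prior over objects, shape (n_obj,)
    p_feat_given_obj: P(F | O), shape (n_obj, 2), columns = absent, present
    p_loc:            prior over locations, shape (n_loc,)
    evidence:         image likelihood for each Fl state, shape (n_loc + 1,)
    Returns (P(O | image), P(L | image)).
    """
    p_obj = np.asarray(p_obj, dtype=float)
    p_feat_given_obj = np.asarray(p_feat_given_obj, dtype=float)
    p_loc = np.asarray(p_loc, dtype=float)
    evidence = np.asarray(evidence, dtype=float)

    cpt = location_cpt(len(p_loc), eps)
    lam = cpt @ evidence                  # image evidence summed over Fl: (2, n_loc)

    # Feature-based attention direction: O -> F -> Fl -> L
    pi_feat = p_obj @ p_feat_given_obj    # top-down message from O to F
    lam_to_loc = pi_feat @ lam            # message arriving at L

    # Spatial attention direction: L -> Fl -> F -> O
    lam_to_feat = lam @ p_loc             # message arriving at F
    lam_to_obj = p_feat_given_obj @ lam_to_feat   # message arriving at O

    return normalize(p_obj * lam_to_obj), normalize(p_loc * lam_to_loc)


if __name__ == "__main__":
    # Toy numbers, invented for the example: object 0 usually has the feature.
    p_feat_given_obj = [[0.1, 0.9], [0.9, 0.1]]
    evidence = [0.1, 0.1, 0.9, 0.1, 0.2]  # strong response at location 2

    # Feature-based attention: the target is known, the location is not.
    _, where = posteriors([1.0, 0.0], p_feat_given_obj, [0.25] * 4, evidence)
    print("P(L | image), attending to object 0:", where)

    # Spatial attention: the location is known, the object is not.
    what, _ = posteriors([0.5, 0.5], p_feat_given_obj, [0.0, 0.0, 1.0, 0.0], evidence)
    print("P(O | image), attending to location 2:", what)
```

In this sketch the two forms of attention differ only in which prior is sharpened: a peaked object prior yields a location posterior (where is the target?), and a peaked location prior yields an object posterior (what is at this location?).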
  37. We have implemented the approach in the context of our animal search model; it mostly improves performance in the medium and far conditions.
  42. Unlike artificial search arrays, where arbitrary objects are simply placed at random on a display, natural scenes are highly structured. This point has been made by Antonio Torralba and Aude Oliva, who also noted that global features could provide a good representation of the gist of the scene, which is sufficient to associate contextual information from the visual scene with actual object locations. Here, for instance, you would expect people to be mostly in these darker regions...