Modeling Mutual Context of Object
and Human Pose in Human-Object
      Interaction Activities
         Bangpeng Yao and Li Fei-Fei
Computer Science Department, Stanford University
   {bangpeng,feifeili}@cs.stanford.edu




Human-Object Interaction




Applications:
 • Robots interacting with objects
 • Automatic sports commentary: “Kobe is dunking the ball.”
 • Medical care
Human-Object Interaction
Holistic image based classification (Previous talk: Grouplet)
 • [Image labels: playing bassoon, playing saxophone]
 • Grouplet is a generic feature for structured objects, or interactions of groups of objects.
 • Results on Caltech101:

     Method                     Accuracy
     Berg & Malik, 2005         48%
     Grauman & Darrell, 2005    59%
     Gehler & Nowozin, 2009     77%
     Ours                       62%

Vs.

Detailed understanding and reasoning
 • [Image labels: playing saxophone; HOI activity: Tennis Forehand]
Human-Object Interaction
Holistic image based classification

Detailed understanding and reasoning
  • Human pose estimation (e.g. head, torso)
  • Object detection (e.g. tennis racket)

HOI activity: Tennis Forehand
Outline
 • Background and Intuition
 • Mutual Context of Object and Human Pose
    Model Representation
    Model Learning
    Model Inference
 • Experiments
 • Conclusion



Human pose estimation & Object detection


Human pose estimation is challenging:
 • Difficult part appearance
 • Self-occlusion
 • Image region looks like a body part

                •   Felzenszwalb & Huttenlocher, 2005
                •   Ren et al, 2005
                •   Ramanan, 2006
                •   Ferrari et al, 2008
                •   Yang & Mori, 2008
                •   Andriluka et al, 2009
                •   Eichner & Ferrari, 2009
Human pose estimation & Object detection

Object detection facilitates human pose estimation, given that the object is detected.
Human pose estimation & Object detection


Object detection is challenging:
 • Small, low-resolution, partially occluded objects
 • Image region similar to the detection target

                •   Viola & Jones, 2001
                •   Lampert et al, 2008
                •   Divvala et al, 2009
                •   Vedaldi et al, 2009
Human pose estimation & Object detection

Human pose estimation facilitates object detection, given that the pose is estimated.
Human pose estimation & Object detection

Mutual Context
Context in Computer Vision
Previous work: use context cues to facilitate object detection.
 • Helpful, but detection with context only moderately outperforms detection without context (by about 3-4%).
 [Bar chart: detection performance with context vs. without context]

•   Hoiem et al, 2006        •   Murphy et al, 2003           •   Viola & Jones, 2001
•   Rabinovich et al, 2007   •   Shotton et al, 2006          •   Lampert et al, 2008
•   Oliva & Torralba, 2007   •   Harzallah et al, 2009
•   Heitz & Koller, 2008     •   Li, Socher & Fei-Fei, 2009
•   Desai et al, 2009        •   Marszalek et al, 2009
•   Divvala et al, 2009      •   Bao & Savarese, 2010
Context in Computer Vision
Previous work: use context cues to facilitate object detection.
 • Helpful, but detection with context only moderately outperforms detection without context (by about 3-4%).
 [Bar chart: detection performance with context vs. without context]

Our approach: two challenging tasks serve as mutual context for each other.
 [Figure: example results with mutual context vs. without context]

•   Hoiem et al, 2006        •   Murphy et al, 2003
•   Rabinovich et al, 2007   •   Shotton et al, 2006
•   Oliva & Torralba, 2007   •   Harzallah et al, 2009
•   Heitz & Koller, 2008     •   Li, Socher & Fei-Fei, 2009
•   Desai et al, 2009        •   Marszalek et al, 2009
•   Divvala et al, 2009      •   Bao & Savarese, 2010
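Mutual Context Model Representation
 • A: Activity, e.g. tennis forehand, croquet shot, volleyball smash.
 • O: Object, e.g. tennis racket, croquet mallet, volleyball.
 • H: Human pose. Intra-class variations:
     • More than one H for each A;
     • Unobserved during training.
 • P: Body parts $P_1, P_2, \ldots, P_N$. $l_P$: location; $\theta_P$: orientation; $s_P$: scale.
 • f: Image evidence ($f_O$ for the object; $f_1, f_2, \ldots, f_N$ for the body parts). Shape context. [Belongie et al, 2002]

[Graphical model with nodes A (activity), O (object), H (human pose), body parts $P_1, \ldots, P_N$, and image evidence $f_O, f_1, \ldots, f_N$]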
Mutual Context Model Representation

Markov Random Field: $\Psi = \sum_{e \in E} w_e \phi_e$, where $w_e$ is the clique weight and $\phi_e$ is the clique potential.

 • $\phi_e(A, O)$, $\phi_e(A, H)$, $\phi_e(O, H)$: Frequency of co-occurrence between A, O, and H.
 • $\phi_e(O, P_n)$, $\phi_e(H, P_n)$, $\phi_e(P_m, P_n)$: Spatial relationship among object and body parts:
     $\mathrm{bin}(l_O - l_{P_n}) \cdot \mathrm{bin}(\theta_O - \theta_{P_n}) \cdot \mathcal{N}(s_O / s_{P_n})$
     (location, orientation, and size terms, in that order)
 • Learn structural connectivity among the body parts and the object.
 • $\phi_e(O, f_O)$ and $\phi_e(P_n, f_{P_n})$: Discriminative part detection scores.
     Shape context + AdaBoost [Andriluka et al, 2009] [Belongie et al, 2002] [Viola & Jones, 2001]
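The model score on this slide is a weighted sum of clique potentials. A minimal Python sketch of that sum, assuming the potentials have already been evaluated for one candidate configuration; the edge names and all numeric values below are hypothetical placeholders, not values from the talk:

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]

def model_score(weights: Dict[Edge, float], potentials: Dict[Edge, float]) -> float:
    """Return Psi = sum over edges e of w_e * phi_e."""
    return sum(weights[e] * potentials[e] for e in potentials)

# Hypothetical potential values for one (A, H, O, P_1..P_N) configuration.
potentials = {("A", "O"): 0.8, ("A", "H"): 0.5, ("O", "P1"): 0.25, ("P1", "f1"): 1.5}
# Hypothetical clique weights, one per edge.
weights = {("A", "O"): 1.0, ("A", "H"): 2.0, ("O", "P1"): 4.0, ("P1", "f1"): 0.5}

score = model_score(weights, potentials)
```

Keeping the weights and potentials in separate dictionaries keyed by edge mirrors the split between clique weight and clique potential in the formula.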
Model Learning
Model: $\Psi = \sum_{e \in E} w_e \phi_e$
Input: images of each activity (e.g. cricket shot, cricket bowling).

Goals:
 • Hidden human poses: hidden variables
 • Structural connectivity: structure learning
 • Potential parameters and potential weights: parameter estimation
Model Learning
Approach: [Figure: croquet shot]
Model Learning
Approach: Hill-climbing

$\max_{E} \left[ \sum_{e \in E} w_e \phi_e - \frac{|E|^2}{2\sigma^2} \right]$

The first term is the joint density of the model; the second term is a Gaussian prior on the number of edges.
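The hill-climbing search over edge sets can be sketched as follows. This is a simplified illustration, not the authors' implementation: `fit` is a hypothetical callable standing in for the model term $\sum_{e \in E} w_e \phi_e$, the neighborhood (toggle one edge at a time) is an assumption, and the toy gains are made up.

```python
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[str, str]

def penalized_score(edges: FrozenSet[Edge],
                    fit: Callable[[FrozenSet[Edge]], float],
                    sigma: float) -> float:
    """Model fit minus the Gaussian penalty |E|^2 / (2 sigma^2) on the edge count."""
    return fit(edges) - len(edges) ** 2 / (2.0 * sigma ** 2)

def hill_climb(candidates: List[Edge],
               fit: Callable[[FrozenSet[Edge]], float],
               sigma: float) -> Tuple[FrozenSet[Edge], float]:
    """Greedily add or remove single edges while the penalized score improves."""
    current: FrozenSet[Edge] = frozenset()
    best = penalized_score(current, fit, sigma)
    improved = True
    while improved:
        improved = False
        for e in candidates:
            neighbor = current ^ {e}  # toggle edge e (add if absent, drop if present)
            s = penalized_score(neighbor, fit, sigma)
            if s > best + 1e-12:
                current, best, improved = neighbor, s, True
    return current, best

# Toy stand-in for the model term: each edge contributes a fixed, made-up gain.
gains = {("O", "P1"): 3.0, ("P1", "P2"): 2.0, ("O", "P2"): 0.1}
toy_fit = lambda edges: sum(gains[e] for e in edges)

edges, best = hill_climb(list(gains), toy_fit, sigma=1.0)
```

With these toy gains the weak edge is rejected because its contribution does not cover the growth of the quadratic penalty.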
Model Learning
Approach:
 • Maximum likelihood: $\phi_e(A, O)$, $\phi_e(A, H)$, $\phi_e(O, H)$, $\phi_e(H, P_n)$, $\phi_e(O, P_n)$, $\phi_e(P_m, P_n)$
 • Standard AdaBoost: $\phi_e(O, f_O)$, $\phi_e(P_n, f_{P_n})$
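For the co-occurrence potentials, a maximum likelihood estimate reduces to counting. A minimal sketch, assuming the potential is the relative frequency of each (activity, object) pair in the training labels; the exact normalization is an assumption, and the label names and counts are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]

def cooccurrence_potential(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Relative frequency of each observed pair (assumed form of the potential)."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical training labels: one (activity, object) pair per image.
labels = [
    ("tennis_forehand", "tennis_racket"),
    ("tennis_forehand", "tennis_racket"),
    ("croquet_shot", "croquet_mallet"),
    ("volleyball_smash", "volleyball"),
]
phi_AO = cooccurrence_potential(labels)
```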
Model Learning
Approach: Max-margin learning

$\min_{\mathbf{w}, \xi} \; \frac{1}{2} \sum_r \|\mathbf{w}_r\|_2^2 + \beta \sum_i \xi_i$
s.t. $\forall i, r$ where $y(r) \neq y(c_i)$: $\mathbf{w}_{c_i} \cdot \mathbf{x}_i - \mathbf{w}_r \cdot \mathbf{x}_i \geq 1 - \xi_i$;
     $\forall i$: $\xi_i \geq 0$.

Notations
 • $\mathbf{x}_i$: Potential values of the i-th image.
 • $\mathbf{w}_r$: Potential weights of the r-th pose.
 • $y(r)$: Activity of the r-th pose.
 • $\xi_i$: A slack variable for the i-th image.
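To make the max-margin objective concrete, the sketch below evaluates it for fixed weights: the smallest feasible slack for each image is the largest margin violation against poses of a different activity. The function name, the trade-off parameter name `beta`, and all numbers are assumptions for illustration; it only evaluates the objective and does not solve the optimization.

```python
from typing import List, Sequence, Tuple

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def max_margin_objective(W: List[List[float]], X: List[List[float]],
                         c: List[int], y_pose: List[int],
                         beta: float) -> Tuple[float, List[float]]:
    """W[r]: weights of pose r; X[i]: potential values of image i;
    c[i]: pose index assigned to image i; y_pose[r]: activity of pose r."""
    reg = 0.5 * sum(dot(w, w) for w in W)
    slacks = []
    for x_i, c_i in zip(X, c):
        own = dot(W[c_i], x_i)
        # Margin violations only against poses belonging to a different activity.
        violations = [1.0 - (own - dot(W[r], x_i))
                      for r in range(len(W)) if y_pose[r] != y_pose[c_i]]
        slacks.append(max([0.0] + violations))
    return reg + beta * sum(slacks), slacks

# Hypothetical toy problem: two poses, each from a different activity.
W = [[1.0, 0.0], [0.0, 1.0]]
X = [[2.0, 0.0], [0.5, 0.2]]
objective, slacks = max_margin_objective(W, X, c=[0, 1], y_pose=[0, 1], beta=1.0)
```

In this toy case the first image satisfies its margin and the second does not, so only the second contributes slack.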
Learning Results
[Figures: learning results for cricket defensive shot, cricket bowling, and croquet shot]

Learning Results
[Figures: learning results for tennis forehand, tennis serve, and volleyball smash]
Model Inference
Input: image I and the learned models.
 • Head detection, torso detection, tennis racket detection
 • Compositional Inference [Chen et al, 2007]
 • Result: $(A_1, H_1, O_1^*, \{P_{1,n}^*\}_n)$, the layout of the object and body parts.
Model Inference
Input: image I and the learned models.
Output: $(A_1, H_1, O_1^*, \{P_{1,n}^*\}_n), \ldots, (A_K, H_K, O_K^*, \{P_{K,n}^*\}_n)$
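The output above is one tuple per learned pose model. A minimal sketch of turning those K candidates into a single prediction, assuming each candidate carries a model score and that the highest score wins; that selection rule, the `Candidate` type, and the scores are assumptions for illustration:

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    activity: str    # A_k
    pose_model: int  # index k of the pose model H_k
    score: float     # score of the best layout found under model k

def pick_best(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest score."""
    return max(candidates, key=lambda cand: cand.score)

# Hypothetical scores for K = 3 pose models on one test image.
candidates = [
    Candidate("tennis_forehand", 0, 1.2),
    Candidate("tennis_serve", 1, 2.7),
    Candidate("volleyball_smash", 2, 0.4),
]
best = pick_best(candidates)
```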
Dataset and Experiment Setup
 Sport data set: 6 classes
  180 training (supervised with object and part locations) & 120 testing images

 Classes [Gupta et al, 2009]: cricket defensive shot, cricket bowling, croquet shot, tennis forehand, tennis serve, volleyball smash.

 Tasks:
  • Object detection;
  • Pose estimation;
  • Activity classification.
Object Detection Results
[Figure: precision (y-axis, 0 to 1) vs. recall (x-axis, 0 to 1) curves for cricket bat, cricket ball, croquet mallet, tennis racket, and volleyball. Methods compared: sliding window [Andriluka et al, 2009], pedestrian context [Dalal & Triggs, 2006], and our method. An accompanying illustration marks the valid region.]
Object Detection Results
[Figure: precision vs. recall curves comparing our method, pedestrian as context, and the scanning window detector. Highlighted cases: cricket ball (small object) and volleyball (background clutter).]
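The curves on these slides plot precision against recall. A minimal sketch of how such a curve is typically computed from scored detections, assuming each detection has already been marked as a true or false positive; the matching criterion is not given in the slides, and the scores below are hypothetical:

```python
from typing import List, Tuple

def precision_recall_curve(scores: List[float], is_true_positive: List[bool],
                           num_ground_truth: int) -> List[Tuple[float, float]]:
    """Return (precision, recall) after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    curve = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / num_ground_truth))
    return curve

# Hypothetical detections: three scored boxes, two ground-truth objects.
curve = precision_recall_curve([0.9, 0.8, 0.3], [True, False, True], num_ground_truth=2)
```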
Human Pose Estimation Results
 Method                   Torso   Upper Leg    Lower Leg    Upper Arm    Lower Arm    Head
 Ramanan, 2006            .52     .22  .22     .21  .28     .24  .28     .17  .14     .42
 Andriluka et al, 2009    .50     .31  .30     .31  .27     .18  .19     .11  .11     .45
 Our full model           .66     .43  .39     .44  .34     .44  .40     .27  .29     .58

[Figures: tennis serve model, our estimation result, and Andriluka et al, 2009 result; volleyball smash model, our estimation result, and Andriluka et al, 2009 result]
Human Pose Estimation Results
 Method                  Torso   Upper Leg    Lower Leg    Upper Arm    Lower Arm    Head
 Ramanan, 2006            .52    .22  .22     .21  .28     .24  .28     .17  .14     .42
 Andriluka et al, 2009    .50    .31  .30     .31  .27     .18  .19     .11  .11     .45
 Our full model           .66    .43  .39     .44  .34     .44  .40     .27  .29     .58
 One pose per class       .63    .40  .36     .41  .31     .38  .35     .21  .23     .52

 Figure panels: four estimation results
                                                                                  47
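The comparison in the pose estimation table can be summarized with a short Python sketch. It only re-reads the numbers listed on the slide. The unweighted mean over the ten columns is a summary added here, not a figure reported by the authors, and the "a"/"b" suffixes are hypothetical labels, since the slide does not name the two sub-columns under each limb.

```python
from statistics import mean

# Column order follows the slide: Torso, Upper Leg x2, Lower Leg x2,
# Upper Arm x2, Lower Arm x2, Head. The "a"/"b" suffixes are hypothetical
# labels; the slide does not name the two sub-columns.
COLUMNS = ["Torso", "Upper Leg a", "Upper Leg b", "Lower Leg a", "Lower Leg b",
           "Upper Arm a", "Upper Arm b", "Lower Arm a", "Lower Arm b", "Head"]

# Values copied from the table.
SCORES = {
    "Ramanan, 2006":         [.52, .22, .22, .21, .28, .24, .28, .17, .14, .42],
    "Andriluka et al, 2009": [.50, .31, .30, .31, .27, .18, .19, .11, .11, .45],
    "Our full model":        [.66, .43, .39, .44, .34, .44, .40, .27, .29, .58],
    "One pose per class":    [.63, .40, .36, .41, .31, .38, .35, .21, .23, .52],
}

# Unweighted mean over the ten columns (a summary added here, not on the slide).
mean_score = {method: mean(row) for method, row in SCORES.items()}

# Per-column margin of the full model over the better of the two prior methods.
baselines = ["Ramanan, 2006", "Andriluka et al, 2009"]
margin = {
    col: SCORES["Our full model"][i] - max(SCORES[b][i] for b in baselines)
    for i, col in enumerate(COLUMNS)
}

for method, value in mean_score.items():
    print(f"{method:24s} mean = {value:.3f}")
print("smallest margin over prior methods:", round(min(margin.values()), 2))
```

On these numbers the full model is ahead of both prior methods in every column, and the "One pose per class" variant sits between the full model and the prior methods on the unweighted mean.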
Activity Classification Results
 Bar chart (y-axis: classification accuracy, 0.5 to 0.9):
  Our model                 83.3%
  Gupta et al, 2009         78.9%
  Bag-of-words SIFT+SVM     52.5%

 Annotations: "No scene information"; "Scene is critical!!"
 Example images: Cricket shot; Tennis forehand
                                                                            49
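As a rough cross-check of the chart, the following Python sketch computes the gaps in percentage points between the three reported accuracies. It also converts each accuracy into an implied number of correctly classified images, assuming the 120 testing images from the setup slide are the denominator. That denominator is an assumption made here; the slide does not state it.

```python
# Accuracies read off the bar chart on the slide (percent).
accuracy = {
    "Our model": 83.3,
    "Gupta et al, 2009": 78.9,
    "Bag-of-words SIFT+SVM": 52.5,
}

# Test-set size from the setup slide; assumed (not stated) to be the denominator.
N_TEST = 120

ours = accuracy["Our model"]

# Gap in percentage points between "Our model" and each other method.
gaps = {
    name: round(ours - acc, 1)
    for name, acc in accuracy.items()
    if name != "Our model"
}

# Implied number of correctly classified test images under the assumption above.
implied_correct = {name: acc / 100 * N_TEST for name, acc in accuracy.items()}

for name, gap in gaps.items():
    print(f"Our model vs {name}: +{gap} points")
for name, count in implied_correct.items():
    print(f"{name}: about {count:.1f} of {N_TEST} test images")
```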
Conclusion

Human-Object Interaction: Grouplet representation vs. Mutual context model

Next Steps
• Pose estimation & Object detection on PPMI images.
• Modeling multiple objects and humans.
                                                                    50
Acknowledgment
• Stanford Vision Lab reviewers:
  – Barry Chai (1985-2010)
  – Juan Carlos Niebles
  – Hao Su
• Silvio Savarese, U. Michigan
• Anonymous reviewers



                                   51

Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

  • 1. Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities Bangpeng Yao and Li Fei-Fei Computer Science Department, Stanford University {bangpeng,feifeili}@cs.stanford.edu 1
  • 2. Human-Object Interaction Robots interact Automatic sports Medical care with objects commentary “Kobe is dunking the ball.” 2
  • 3. Human-Object Interaction Holistic image based classification (Previous talk: Grouplet) Playing Playing bassoon saxophone Detailed understanding and reasoning Vs. Playing saxophone Grouplet is a generic feature for structured objects, or interactions of groups of objects. HOI activity: Tennis Forehand Berg & Malik, 2005 Grauman & Darrell, 2005 Gehler & Nowozin, 2009 OURS Caltech101 48% 59% 77% 62% 3
  • 4. Human-Object Interaction Holistic image based classification Detailed understanding and reasoning • Human pose estimation Head Torso 4
  • 5. Human-Object Interaction Holistic image based classification Detailed understanding and reasoning • Human pose estimation • Object detection Tennis racket 5
  • 6. Human-Object Interaction Holistic image based classification Detailed understanding and reasoning • Human pose estimation • Object detection Head Tennis Torso racket HOI activity: Tennis Forehand 6
  • 7. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 7
  • 8. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 8
  • 9. Human pose estimation & Object detection Human pose Difficult part estimation is appearance challenging. Self-occlusion Image region looks like a body part • Felzenszwalb & Huttenlocher, 2005 • Ren et al, 2005 • Ramanan, 2006 • Ferrari et al, 2008 • Yang & Mori, 2008 • Andriluka et al, 2009 9 • Eichner & Ferrari, 2009
  • 10. Human pose estimation & Object detection Human pose estimation is challenging. • Felzenszwalb & Huttenlocher, 2005 • Ren et al, 2005 • Ramanan, 2006 • Ferrari et al, 2008 • Yang & Mori, 2008 • Andriluka et al, 2009 10 • Eichner & Ferrari, 2009
  • 11. Human pose estimation & Object detection Facilitate Given the object is detected. 11
  • 12. Human pose estimation & Object detection Object detection is Small, low- challenging resolution, partially occluded Image region similar to detection target • Viola & Jones, 2001 • Lampert et al, 2008 • Divvala et al, 2009 • Vedaldi et al, 2009 12
  • 13. Human pose estimation & Object detection Object detection is challenging • Viola & Jones, 2001 • Lampert et al, 2008 • Divvala et al, 2009 • Vedaldi et al, 2009 13
  • 14. Human pose estimation & Object detection Facilitate Given the pose is estimated. 14
  • 15. Human pose estimation & Object detection Mutual Context 15
  • 16. Context in Computer Vision Previous work – Use context cues to facilitate object detection: Helpful, but only moderately outperform better ~3-4% with without context context • Hoiem et al, 2006 • Murphy et al, 2003 • Viola & Jones, 2001 • Rabinovich et al, 2007 • Shotton et al, 2006 • Lampert et al, 2008 • • • Oliva & Torralba, 2007 Heitz & Koller, 2008 • Harzallah et al, 2009 Li, Socher & Fei-Fei, 2009  • Desai et al, 2009 • Marszalek et al, 2009 •  Divvala et al, 2009 • Bao & Savarese, 2010 16
  • 17. Context in Computer Vision Previous work – Use context Our approach – Two challenging cues to facilitate object detection: tasks serve as mutual context of each other: With mutual context: Helpful, but only moderately outperform better ~3-4% Without context: with without context context • Hoiem et al, 2006 • Murphy et al, 2003 • Rabinovich et al, 2007 • Shotton et al, 2006 • Oliva & Torralba, 2007 • Harzallah et al, 2009 • Heitz & Koller, 2008 • Li, Socher & Fei-Fei, 2009 • Desai et al, 2009 • Marszalek et al, 2009 • Divvala et al, 2009 • Bao & Savarese, 2010 17
  • 18. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 18
  • 19. Mutual Context Model Representation A:  Activity A Tennis Croquet Volleyball Human pose forehand shot smash H O: Object  O Tennis Croquet Volleyball Body parts racket mallet P1 P2  PN H: fO f1 f2  fN Intra-class variations • More than one H for each A; Image evidence • Unobserved during training. P: lP: location; θP: orientation; sP: scale. f: Shape context. [Belongie et al, 2002] 19
  • 20. Mutual Context Model Representation Markov Random Field •  e ( A, O ) ,  e ( A, H ) ,  e (O, H ) : Frequency A    we e of co-occurrence between A, O, and H. eE  e ( A, H )  e ( A, O ) Clique Clique H weight potential  e (O, H ) O P1 P2  PN fO f1 f2  fN 20
  • 21. Mutual Context Model Representation Markov Random Field •  e ( A, O ) ,  e ( A, H ) ,  e (O, H ) : Frequency A    we e of co-occurrence between A, O, and H. eE •  e (O, Pn ) , e ( H , Pn ) , e ( Pm , Pn ) : Spatial Clique Clique H weight potential relationship among object and body parts.    bin lO  lPn  bin O   Pn   sO sPn    O  e ( H , Pn ) location orientation size  e (O, Pn ) P1 P2  PN  e ( Pm , Pn ) fO f1 f2  fN 21
  • 22. Mutual Context Model Representation Markov Random Field •  e ( A, O ) ,  e ( A, H ) ,  e (O, H ) : Frequency A    we e of co-occurrence between A, O, and H. eE •  e (O, Pn ) , e ( H , Pn ) , e ( Pm , Pn ) : Spatial Clique Clique H weight potential relationship among object and body parts.    bin lO  lPn  bin O   Pn   sO sPn    O  e ( H , Pn ) location orientation size  e (O, Pn ) Obtained by structure learning • Learn structural connectivity among P1 P2  PN the body parts and the object.  e ( Pm , Pn ) fO f1 f2  fN 22
  • 23. Mutual Context Model Representation Markov Random Field •  e ( A, O ) ,  e ( A, H ) ,  e (O, H ) : Frequency A    we e of co-occurrence between A, O, and H. eE •  e (O, Pn ) , e ( H , Pn ) , e ( Pm , Pn ) : Spatial Clique Clique H weight potential relationship among object and body parts.    bin lO  lPn  bin O   Pn   sO sPn    O location orientation size • Learn structural connectivity among  e (O , f O ) P1 P2  PN the body parts and the object.  e ( Pn , f P ) n fO •  e (O, f O ) and  e ( Pn , f Pn ): Discriminative part detection scores. f1 f2  fN Shape context + AdaBoost [Andriluka et al, 2009] [Belongie et al, 2002] [Viola & Jones, 2001] 23
  • 24. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 24
  • 25. Model Learning Input:    we e A eE H  O cricket cricket P1 P2  PN shot bowling fO f1 f2  fN Goals: Hidden human poses 25
  • 26. Model Learning Input:    we e A eE H  O cricket cricket P1 P2  PN shot bowling fO f1 f2  fN Goals: Hidden human poses Structural connectivity 26
  • 27. Model Learning Input:    we e A eE H  O cricket cricket P1 P2  PN shot bowling fO f1 f2  fN Goals: Hidden human poses Structural connectivity Potential parameters Potential weights 27
  • 28. Model Learning Input:    we e A eE H  O cricket cricket P1 P2  PN shot bowling fO f1 f2  fN Goals: Hidden human poses Hidden variables Structural connectivity Structure learning Potential parameters Parameter estimation Potential weights 28
  • 29. Model Learning    we e A Approach: eE H croquet shot O P1 P2  PN fO f1 f2  fN Goals: Hidden human poses Structural connectivity Potential parameters Potential weights 29
  • 30. Model Learning    we e A Approach: eE   E    2   max  e we e  H Hill-climbing  E e  2 2  O   Joint density Gaussian priori of P1 P2  PN of the model the edge number fO f1 f2  fN   Goals: Hidden human poses         Structural connectivity     Potential parameters Potential weights   30
  • 31. Model Learning    we e A Approach: eE H • Maximum likelihood O  e ( A, O )  e ( A, H )  e (O, H ) P1 P2  PN  e ( H , Pn )  e (O, Pn )  e ( Pm , Pn ) fO • Standard AdaBoost f1 f2  fN  e (O, f O )  e ( Pn , f Pn ) Goals: Hidden human poses Structural connectivity Potential parameters Potential weights 31
  • 32. Model Learning    we e A Approach: eE H Max-margin learning 1 min  w r    i O 2 w , 2 2 P1 P2  PN r i s.t. i, r where y  r   y  ci  , fO w ci  xi  w r  xi  1  i f1 f2  fN i, i  0 Goals: Hidden human poses Notations Structural connectivity • xi: Potential values of the i-th image. • wr: Potential weights of the r-th pose. Potential parameters • y(r): Activity of the r-th pose. Potential weights • ξi: A slack variable for the i-th image. 32
  • 33. Learning Results Cricket defensive shot Cricket bowling Croquet shot 33
  • 34. Learning Results Tennis forehand Tennis serve Volleyball smash 34
  • 35. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 35
  • 36. Model Inference I The learned models   36
  • 37. Model Inference I The learned models Head detection   Torso detection Compositional Inference  [Chen et al, 2007]   A1 , H1 , O1* , P*n  1, n  Tennis racket detection Layout of the object and body parts. 37
  • 38. Model Inference I The learned models   Output     A1 , H1 , O1* , P*n  1, n    AK , H K , OK ,PK ,n  * * n  38
  • 39. Outline • Background and Intuition • Mutual Context of Object and Human Pose  Model Representation  Model Learning  Model Inference • Experiments • Conclusion 39
  • 40. Dataset and Experiment Setup Sport data set: 6 classes 180 training (supervised with object and part locations) & 120 testing images Tasks: • Object detection; • Pose estimation; • Activity classification. Cricket Cricket Croquet defensive shot bowling shot Tennis Tennis Volleyball forehand serve smash [Gupta et al, 2009] 40
  • 41. Dataset and Experiment Setup Sport data set: 6 classes 180 training (supervised with object and part locations) & 120 testing images Tasks: • Object detection; • Pose estimation; • Activity classification. Cricket Cricket Croquet defensive shot bowling shot Tennis Tennis Volleyball forehand serve smash [Gupta et al, 2009] 41
  • 42. Object Detection Results Cricket bat Cricket ball 1 Valid region 0.8  Precision 0.6  0.4 Sliding Pedestrian Our 0.2 window context Method 0 [Andriluka [Dalal & 0 0.2 0.4 0.6 0.8 1 Recall et al, 2009] Triggs, 2006] Croquet mallet Tennis racket Volleyball 1 0.8 Precision 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Recall 42
  • 43. Object Detection Results. [Figure: precision-recall curves for our method, pedestrian as context, and scanning window detector.] Cricket ball: small object. Volleyball: background clutter.
  • 45. Human Pose Estimation Results
      Method                  Torso  Upper Leg   Lower Leg   Upper Arm   Lower Arm   Head
      Ramanan, 2006           .52    .22  .22    .21  .28    .24  .28    .17  .14    .42
      Andriluka et al, 2009   .50    .31  .30    .31  .27    .18  .19    .11  .11    .45
      Our full model          .66    .43  .39    .44  .34    .44  .40    .27  .29    .58
  • 46. Human Pose Estimation Results (same table as on the previous slide). [Figure: two examples, each showing the learned model, our estimation result, and the result of Andriluka et al, 2009: tennis serve model; volleyball smash model.]
  • 47. Human Pose Estimation Results
      Method                  Torso  Upper Leg   Lower Leg   Upper Arm   Lower Arm   Head
      Ramanan, 2006           .52    .22  .22    .21  .28    .24  .28    .17  .14    .42
      Andriluka et al, 2009   .50    .31  .30    .31  .27    .18  .19    .11  .11    .45
      Our full model          .66    .43  .39    .44  .34    .44  .40    .27  .29    .58
      One pose per class      .63    .40  .36    .41  .31    .38  .35    .21  .23    .52
      [Figure: four example estimation results.]
  • 49. Activity Classification Results. Classification accuracy: our model 83.3%; Gupta et al, 2009 78.9%; bag-of-words (SIFT+SVM) 52.5%. Annotations on the slide: "No scene information"; "Scene is critical!!". [Figure: bar chart of classification accuracy, with example images labeled cricket shot and tennis forehand.]
  • 50. Conclusion. Human-Object Interaction: grouplet representation vs. mutual context model. Next steps: pose estimation and object detection on PPMI images; modeling multiple objects and humans.
  • 51. Acknowledgment • Stanford Vision Lab reviewers: – Barry Chai (1985-2010) – Juan Carlos Niebles – Hao Su • Silvio Savarese, U. Michigan • Anonymous reviewers