Structured Regression for
            Efficient Object Detection

                        Christoph Lampert
                   www.christoph-lampert.org

            Max Planck Institute for Biological Cybernetics, Tübingen


                          December 3rd, 2009




• [C.L., Matthew B. Blaschko, Thomas Hofmann. CVPR 2008]
• [Matthew B. Blaschko, C.L. ECCV 2008]
• [C.L., Matthew B. Blaschko, Thomas Hofmann. PAMI 2009]
Category-Level Object Localization
Category-Level Object Localization




             What objects are present? person, car
Category-Level Object Localization




                    Where are the objects?
Object Localization ⇒ Scene Interpretation




    A man inside of a car       A man outside of a car
    ⇒ He’s driving.             ⇒ He’s passing by.
Algorithmic Approach: Sliding Window




       f(y₁) = 0.2          f(y₂) = 0.8         f(y₃) = 1.5

   Use a (pre-trained) classifier function f :
     • Place candidate window on the image.
     • Iterate:
            Evaluate f and store result.
            Shift candidate window by k pixels.
     • Return position where f was largest.
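The loop above can be sketched as follows. This is a minimal single-scale sketch; the classifier f, the window size, and the stride are placeholder assumptions, not the talk's actual detector:

```python
import numpy as np

def sliding_window(image, f, window=(64, 64), stride=8):
    """Single-scale sliding-window search: place a window, score it with f,
    shift by `stride` pixels, and return the highest-scoring position."""
    h, w = image.shape[:2]
    win_h, win_w = window
    best_score, best_box = -np.inf, None
    for top in range(0, h - win_h + 1, stride):
        for left in range(0, w - win_w + 1, stride):
            score = f(image[top:top + win_h, left:left + win_w])
            if score > best_score:
                best_score = score
                best_box = (left, top, left + win_w, top + win_h)
    return best_box, best_score
```

Handling other scales and aspect ratios means rerunning this loop with different `window` values, which is where the cost explodes.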
Algorithmic Approach: Sliding Window




       f(y₁) = 0.2        f(y₂) = 0.8       f(y₃) = 1.5

   Drawbacks:
     • single scale, single aspect ratio
       → repeat with different window sizes/shapes
     • search on grid
       → speed–accuracy tradeoff
     • computationally expensive
New view: Generalized Sliding Window




   Assumptions:
     • Objects are rectangular image regions of arbitrary size.
     • The score of f is largest at the correct object position.

   Mathematical Formulation:

                       yopt = argmax f (y)
                                  y∈Y


   with Y = {all rectangular regions in image}
New view: Generalized Sliding Window




   Mathematical Formulation:

                        yopt = argmax f (y)
                                  y∈Y


   with Y = {all rectangular regions in image}

   • How to choose/construct/learn the function f ?
   • How to do the optimization efficiently and robustly?
     (exhaustive search is too slow: Y contains O(w²h²) elements).
New view: Generalized Sliding Window


   Use the problem’s geometric structure:
                                             • Calculate scores for
                                               sets of boxes jointly.

                                             • If no element can
                                               contain the maximum,
                                               discard the box set.

                                             • Otherwise, split the
                                               box set and iterate.

                                            → Branch-and-bound
                                              optimization
   • finds global maximum yopt
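The branch-and-bound loop above can be sketched with a best-first priority queue. This is a schematic sketch: the bound `f_upper`, the split rule, and the single-box test are problem-specific callbacks, assumed given, with the usual requirement that `f_upper(S)` upper-bounds f over S and is exact on singletons:

```python
import heapq
import itertools

def efficient_subwindow_search(initial_set, f_upper, split, is_single):
    """Best-first branch-and-bound: always expand the box set with the
    highest upper bound; a popped singleton is the global maximum."""
    counter = itertools.count()  # tie-breaker so heapq never compares sets
    heap = [(-f_upper(initial_set), next(counter), initial_set)]
    while heap:
        neg_bound, _, box_set = heapq.heappop(heap)
        if is_single(box_set):
            # The bound is exact on a single box, so no remaining set
            # (all with lower bounds) can beat it.
            return box_set, -neg_bound
        for part in split(box_set):
            heapq.heappush(heap, (-f_upper(part), next(counter), part))
```

Correctness only needs the two bound properties; tightness of `f_upper` is what determines how much of the search space gets pruned.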
Representing Sets of Boxes

     • Boxes: [l, t, r, b] ∈ R⁴.  Box sets: [L, T, R, B] ∈ (R²)⁴, where each coordinate becomes an interval, e.g. L = [l_lo, l_hi].




   Splitting:
     • Identify largest interval. Split at center: R → R1 ∪R2 .
     • New box sets: [L, T, R1 , B] and [L, T, R2 , B].
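The splitting rule can be sketched directly, assuming integer pixel coordinates and inclusive intervals (both conventions are my own choices here):

```python
def split_box_set(box_set):
    """Split a box set [L, T, R, B] (each entry an inclusive (lo, hi)
    interval) along its widest interval, at the interval's center."""
    widths = [hi - lo for lo, hi in box_set]
    k = widths.index(max(widths))      # index of the widest interval
    lo, hi = box_set[k]
    mid = (lo + hi) // 2               # split point at the center
    first, second = list(box_set), list(box_set)
    first[k] = (lo, mid)
    second[k] = (mid + 1, hi)
    return first, second
```

The two halves are disjoint and cover the original set, so the branch-and-bound search never loses a candidate box.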
Calculating Scores for Box Sets

   Example: linear support vector machine, f(y) := Σ_{p_i ∈ y} w_i,
   the sum of the per-feature weights w_i of all feature points p_i inside box y.

   For a box set Y, let y∪ be its largest box (union) and y∩ its smallest
   (intersection); then

          f^upper(Y) = Σ_{p_i ∈ y∩} min(0, w_i) + Σ_{p_i ∈ y∪} max(0, w_i)

   Can be computed in O(1) using integral images.
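A minimal sketch of the integral-image trick behind that O(1) claim (a standard summed-area table; the zero-padding and inclusive-coordinate conventions are my own):

```python
import numpy as np

def integral_image(weights):
    """Summed-area table with a zero first row/column, so every box sum
    reduces to four lookups."""
    ii = np.zeros((weights.shape[0] + 1, weights.shape[1] + 1))
    ii[1:, 1:] = weights.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, l, t, r, b):
    """Sum of weights inside the box [l, t, r, b] (inclusive), in O(1)."""
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]
```

For the bound above one would keep two such tables, one over max(0, wᵢ) and one over min(0, wᵢ), and query them at y∪ and y∩ respectively.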
Calculating Scores for Box Sets
   Histogram Intersection Similarity: f(y) := Σ_{j=1}^{J} min(h_j, h_j^y),
   where h is the model histogram and h^y the feature histogram of box y.

   Upper bound, using the histogram of the union box y∪:

                 f^upper(Y) = Σ_{j=1}^{J} min(h_j, h_j^{y∪})

   As fast as for a single box: O(J) with integral histograms.
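A minimal sketch of the similarity, and of why the bound holds: since y∪ contains every box in the set, its histogram dominates each h^y entrywise, so intersecting with h^{y∪} can only increase the score (variable names here are my own):

```python
import numpy as np

def hist_intersection(h_model, h_box):
    """Histogram intersection: sum of bin-wise minima between the model
    histogram and a box's feature histogram."""
    return np.minimum(h_model, h_box).sum()
```

Because bin counts only grow when the box grows, `hist_intersection(h, h_union)` upper-bounds the score of every box contained in the union.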
Evaluation: Speed (on PASCAL VOC 2006)




   Sliding Window Runtime:
       • always: O(w²h²)
    Branch-and-Bound (ESS) Runtime:
       • worst-case: O(w²h²)
       • empirical: not more than O(wh)
Extensions:


    Action classification: (y, t)_opt = argmax_{(y,t) ∈ Y×T} f_x(y, t)




   • J. Yuan: Discriminative 3D Subvolume Search for Efficient Action Detection, CVPR 2009.
Extensions:

    Localized image retrieval: (x, y)_opt = argmax_{y ∈ Y, x ∈ D} f_x(y)




   • C.L.: Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, ICCV 2009
Extensions:


   Hybrid – Branch-and-Bound with Implicit Shape Model




   • A. Lehmann, B. Leibe, L. van Gool: Feature-Centric Efficient Subwindow Search, ICCV 2009
Generalized Sliding Window




                       yopt = argmax f (y)
                                y∈Y


   with Y = {all rectangular regions in image}

   • How to choose/construct/learn f ?
   • How to do the optimization efficiently and robustly?
Traditional Approach: Binary Classifier

    Training images:
       • x₁⁺, . . . , xₙ⁺ show the object
       • x₁⁻, . . . , xₘ⁻ show something else

    Train a classifier, e.g.
      • support vector machine,
      • boosted cascade,
      • artificial neural network,. . .

    Decision function f : {images} → R
      • f > 0 means “image shows the object.”
      • f < 0 means “image does not show
        the object.”
Traditional Approach: Binary Classifier


   Drawbacks:


      • Train distribution ≠ test distribution

      • No control over partial
        detections.

      • No guarantee to even find
        training examples again.
Object Localization as Structured Output Regression

   Ideal setup:
     • function
                    g : {all images} → {all boxes}
       to predict object boxes from images
     • train and test in the same way, end-to-end



   [Illustration: g_car(image) = bounding box of the car in the image]
Object Localization as Structured Output Regression

   Ideal setup:
     • function
                        g : {all images} → {all boxes}
       to predict object boxes from images
     • train and test in the same way, end-to-end

   Regression problem:
     • training examples (x1 , y1 ), . . . , (xn , yn ) ∈ X × Y
             xi are images, yi are bounding boxes
     • Learn a mapping
                                      g : X →Y
        that generalizes from the given examples:
              g(x_i) ≈ y_i , for i = 1, . . . , n.
Structured Support Vector Machine

     SVM-like framework by Tsochantaridis et al.:
        • Positive definite kernel k : (X × Y) × (X × Y) → R,
          with ϕ : X × Y → H the (implicit) feature map induced by k.
        • Loss function ∆ : Y × Y → R.
        • Solve the convex optimization problem

                 min_{w,ξ}  ½‖w‖² + C Σ_{i=1}^{n} ξ_i

          subject to margin constraints for i = 1, . . . , n:

          ∀y ∈ Y \ {y_i} : ∆(y, y_i) + ⟨w, ϕ(x_i, y)⟩ − ⟨w, ϕ(x_i, y_i)⟩ ≤ ξ_i

        • unique solution: w* ∈ H

• I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent
  Output Variables, Journal of Machine Learning Research (JMLR), 2005.
Structured Support Vector Machine



      • w* defines the compatibility function

                         F(x, y) = ⟨w*, ϕ(x, y)⟩

     • best prediction for x is the most compatible y:

                        g(x) := argmax F (x, y).
                                    y∈Y

     • evaluating g : X → Y is like generalized Sliding Window:
            for fixed x, evaluate quality function for every box y ∈ Y.
            for example, use previous branch-and-bound procedure!
Joint Image/Box-Kernel: Example


   Joint kernel: how to compare one (image,box)-pair (x, y) with
   another (image,box)-pair (x , y )?

   [Illustration: k_joint((x, y), (x′, y′)) compares the image contents of the
   two boxes: it is large when both boxes show the same object class, and
   small when an object box is paired with a background box. A pure image
   kernel k_image(x, x′), by contrast, could still be large in the second
   case, since it ignores the boxes.]
Loss Function: Example

   Loss function: how to compare two boxes y and y′?




           ∆(y, y′) := 1 − area overlap between y and y′
                     = 1 − area(y ∩ y′) / area(y ∪ y′)
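The loss follows directly from the formula, sketched here for boxes [l, t, r, b] with r > l and b > t; the maximal-loss convention for degenerate boxes is my own choice:

```python
def area(box):
    """Area of a box [l, t, r, b]; zero for empty/degenerate boxes."""
    l, t, r, b = box
    return max(0, r - l) * max(0, b - t)

def box_loss(y, y_prime):
    """Delta(y, y') = 1 - area(y ∩ y') / area(y ∪ y')."""
    inter = (max(y[0], y_prime[0]), max(y[1], y_prime[1]),
             min(y[2], y_prime[2]), min(y[3], y_prime[3]))
    inter_area = area(inter)
    union_area = area(y) + area(y_prime) - inter_area
    if union_area == 0:
        return 1.0  # two degenerate boxes: maximal loss by convention
    return 1.0 - inter_area / union_area
```

The loss is 0 for a perfect match and 1 for disjoint boxes, so it penalizes partial detections in proportion to how little they overlap the ground truth.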
Structured Support Vector Machine

      • S-SVM Optimization:   min_{w,ξ}  ½‖w‖² + C Σ_{i=1}^{n} ξ_i

        subject to, for i = 1, . . . , n:

    ∀y ∈ Y \ {y_i} : ∆(y, y_i) + ⟨w, ϕ(x_i, y)⟩ − ⟨w, ϕ(x_i, y_i)⟩ ≤ ξ_i

      • Solve via constraint generation; iterate:
             Solve the minimization with the current working set of constraints.
             Identify the most violated constraint: argmax_{y∈Y} ∆(y, y_i) + ⟨w, ϕ(x_i, y)⟩
             (again an argmax over boxes, so branch-and-bound applies).
             Add violated constraints to the working set and repeat.
      • Polynomial-time convergence to any precision ε.

     • Similar to bootstrap training, but with a margin.
Evaluation: PASCAL VOC 2006




           Example detections for VOC 2006 bicycle, bus and cat.




          Precision–recall curves for VOC 2006 bicycle, bus and cat.

    • Structured regression improves detection accuracy.
    • New best scores (at that time) in 6 of 10 classes.
Why does it work?




        Learned weights from binary (center) and structured training (right).



     • Both methods assign positive weights to object region.
     • Structured training also assigns negative weights to
       features surrounding the bounding box position.
     • Posterior distribution over box coordinates becomes more
       peaked.
More Recent Results (PASCAL VOC 2009)




   [Per-class results, one slide each, for the 20 VOC 2009 classes:
   aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
   diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa,
   train, tvmonitor.]
Extensions:

      Image segmentation with connectedness constraint:




               CRF segmentation (left) vs. connected CRF segmentation (right)

 • S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
Summary


  Object Localization is a step towards image interpretation.

  Conceptual approach instead of algorithmic:
    • Branch-and-bound evaluation:
           don’t slide a window, but solve an argmax problem,
           ⇒ higher efficiency

    • Structured regression training:
           solve the prediction problem, not a classification proxy.
           ⇒ higher localization accuracy

    • Modular and kernelized:
           easily adapted to other problems/representations, e.g.
           image segmentations
Evaluating the top large language models.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Structured regression for efficient object detection

  • 1. Structured Regression for Efficient Object Detection Christoph Lampert www.christoph-lampert.org Max Planck Institute for Biological Cybernetics, Tübingen December 3rd, 2009 • [C.L., Matthew B. Blaschko, Thomas Hofmann. CVPR 2008] • [Matthew B. Blaschko, C.L. ECCV 2008] • [C.L., Matthew B. Blaschko, Thomas Hofmann. PAMI 2009]
  • 3. Category-Level Object Localization What objects are present? person, car
  • 4. Category-Level Object Localization Where are the objects?
  • 5. Object Localization ⇒ Scene Interpretation A man inside of a car A man outside of a car ⇒ He’s driving. ⇒ He’s passing by.
  • 6. Algorithmic Approach: Sliding Window f(y₁) = 0.2, f(y₂) = 0.8, f(y₃) = 1.5 Use a (pre-trained) classifier function f: • Place the candidate window on the image. • Iterate: evaluate f and store the result; shift the candidate window by k pixels. • Return the position where f was largest.
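The loop described on this slide can be sketched in a few lines; the classifier `score`, the window size, and the step of k = 8 pixels are hypothetical stand-ins, not the detector used in the talk:

```python
def sliding_window(width, height, win_w, win_h, score, step=8):
    """Return the top-left corner where the window score is largest."""
    best_score, best_pos = float("-inf"), None
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            s = score(left, top, win_w, win_h)
            if s > best_score:
                best_score, best_pos = s, (left, top)
    return best_pos, best_score

# Toy score: largest when the window center is near (40, 24).
pos, val = sliding_window(
    100, 60, 20, 20,
    lambda l, t, w, h: -abs(l + w // 2 - 40) - abs(t + h // 2 - 24),
)
```

Because the search runs on a grid, the returned corner is only the best on-grid position, which is exactly the speed-accuracy tradeoff of the step size k.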
  • 7. Algorithmic Approach: Sliding Window f(y₁) = 0.2, f(y₂) = 0.8, f(y₃) = 1.5 Drawbacks: • single scale, single aspect ratio → repeat with different window sizes/shapes • search on grid → speed–accuracy tradeoff • computationally expensive
  • 8. New view: Generalized Sliding Window Assumptions: • Objects are rectangular image regions of arbitrary size. • The score of f is largest at the correct object position. Mathematical Formulation: yopt = argmax_{y ∈ Y} f(y) with Y = {all rectangular regions in image}
  • 9. New view: Generalized Sliding Window Mathematical Formulation: yopt = argmax_{y ∈ Y} f(y) with Y = {all rectangular regions in image} • How to choose/construct/learn the function f? • How to do the optimization efficiently and robustly? (exhaustive search is too slow: O(w²h²) candidate regions)
  • 12. New view: Generalized Sliding Window Use the problem’s geometric structure: • Calculate scores for sets of boxes jointly. • If no element can contain the maximum, discard the box set. • Otherwise, split the box set and iterate. → Branch-and-bound optimization • finds global maximum yopt
  • 19. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R⁴. Box sets: [L, T, R, B] ∈ (R²)⁴ Splitting: • Identify the largest interval. • Split at its center: R → R₁ ∪ R₂. • New box sets: [L, T, R₁, B] and [L, T, R₂, B].
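Combining this interval representation and center-split rule with a best-first search gives a compact branch-and-bound sketch. The separable toy score and its per-interval bound below are made up for illustration and stand in for the real image-based quality function:

```python
import heapq

# Minimal sketch of branch-and-bound over box sets (ESS-style).
# Box sets are 4-tuples [L, T, R, B]; each entry is an (lo, hi)
# integer interval of candidate coordinates.  `bound(sets)` must
# upper-bound f over every box in the set and equal f on singletons.

def ess(intervals, bound):
    """Return (score, box) maximizing f, guided by the upper bound."""
    heap = [(-bound(intervals), intervals)]
    while heap:
        neg_ub, sets = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in sets]
        k = max(range(4), key=lambda i: widths[i])  # widest interval
        if widths[k] == 0:                 # single box left: global max
            return -neg_ub, tuple(lo for lo, _ in sets)
        lo, hi = sets[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):     # split at the center
            child = list(sets)
            child[k] = part
            heapq.heappush(heap, (-bound(child), tuple(child)))

# Toy separable score: f(box) = sum of per-coordinate scores, so a
# valid, tight-on-singletons bound is the sum of per-interval maxima.
scores = [[0, 3, 1, 2], [1, 0, 4, 1], [2, 5, 0, 1], [1, 1, 1, 6]]
bound = lambda s: sum(max(scores[i][lo:hi + 1]) for i, (lo, hi) in enumerate(s))
best, box = ess(tuple((0, 3) for _ in range(4)), bound)
```

Because the first singleton popped from the priority queue has a bound no smaller than any remaining set's bound, it is guaranteed to be the global maximum, matching the "finds global maximum yopt" claim.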
  • 20. Calculating Scores for Box Sets Example: linear support vector machine, f(y) := Σ_{pᵢ ∈ y} wᵢ. Upper bound over a box set Y: f_upper(Y) = Σ_{pᵢ ∈ y∪} max(0, wᵢ) + Σ_{pᵢ ∈ y∩} min(0, wᵢ), where y∪ and y∩ are the largest and smallest boxes in the set. Can be computed in O(1) using integral images.
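One way to realize the O(1) evaluations is an integral image of the positive weights and one of the negative weights; the weight map `w` and the chosen union/intersection boxes below are hypothetical:

```python
import numpy as np

def integral(a):
    """2-D integral image: cumulative sums along both axes."""
    return a.cumsum(0).cumsum(1)

def box_sum(ii, l, t, r, b):
    """Sum of the original array over rows t..b, cols l..r (4 lookups)."""
    s = ii[b, r]
    if t > 0:
        s -= ii[t - 1, r]
    if l > 0:
        s -= ii[b, l - 1]
    if t > 0 and l > 0:
        s += ii[t - 1, l - 1]
    return s

w = np.array([[ 1., -2.,  3.],
              [-1.,  4., -1.],
              [ 2., -3.,  1.]])
ii_pos = integral(np.maximum(w, 0))   # positive part, for the union box
ii_neg = integral(np.minimum(w, 0))   # negative part, for the intersection box

# Bound: positive weights counted inside the union box (here: full image),
# negative weights only inside the intersection box (here: center pixel).
ub = box_sum(ii_pos, 0, 0, 2, 2) + box_sum(ii_neg, 1, 1, 1, 1)
```

Each `box_sum` call touches only four entries of the precomputed table, which is what makes the per-set bound constant-time regardless of box size.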
  • 21. Calculating Scores for Box Sets Histogram intersection similarity: f(y) := Σ_{j=1..J} min(hⱼ, hⱼ^y). Upper bound: f_upper(Y) = Σ_{j=1..J} min(hⱼ, hⱼ^{y∪}). As fast as for a single box: O(J) with integral histograms.
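A minimal sketch of the histogram-intersection bound; the histograms are made up and stand in for integral-histogram lookups (bin counts inside the union box can only exceed those of any box in the set, so the union score upper-bounds every member):

```python
def hist_intersection(h_query, h_box):
    """Histogram intersection similarity: sum of per-bin minima."""
    return sum(min(a, b) for a, b in zip(h_query, h_box))

h_query = [3, 1, 4]   # model histogram h (hypothetical)
h_union = [5, 2, 6]   # bin counts inside the union box y∪
h_inner = [2, 0, 3]   # bin counts inside one box y of the set

ub = hist_intersection(h_query, h_union)   # bound for the whole box set
f  = hist_intersection(h_query, h_inner)   # exact score of a single box
```

Both the bound and the exact score cost one pass over the J bins, matching the O(J) claim on the slide.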
  • 22. Evaluation: Speed (on PASCAL VOC 2006) Sliding Window runtime: • always: O(w²h²) Branch-and-Bound (ESS) runtime: • worst-case: O(w²h²) • empirical: not more than O(wh)
  • 23. Extensions: Action classification: (y, t)_opt = argmax_{(y,t) ∈ Y×T} f_x(y, t) • J. Yuan: Discriminative 3D Subvolume Search for Efficient Action Detection, CVPR 2009.
  • 24. Extensions: Localized image retrieval: (x, y)_opt = argmax_{y ∈ Y, x ∈ D} f_x(y) • C.L.: Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, ICCV 2009
  • 25. Extensions: Hybrid – Branch-and-Bound with Implicit Shape Model • A. Lehmann, B. Leibe, L. van Gool: Feature-Centric Efficient Subwindow Search, ICCV 2009
  • 27. Generalized Sliding Window yopt = argmax_{y ∈ Y} f(y) with Y = {all rectangular regions in image} • How to choose/construct/learn f? • How to do the optimization efficiently and robustly?
  • 28. Traditional Approach: Binary Classifier Training images: • x₁⁺, …, xₙ⁺ show the object • x₁⁻, …, xₘ⁻ show something else Train a classifier, e.g. • support vector machine, • boosted cascade, • artificial neural network, … Decision function f : {images} → R • f > 0 means “image shows the object.” • f < 0 means “image does not show the object.”
  • 29. Traditional Approach: Binary Classifier Drawbacks: • Train distribution ≠ test distribution • No control over partial detections. • No guarantee to even find the training examples again.
  • 31. Object Localization as Structured Output Regression Ideal setup: • function g : {all images} → {all boxes} to predict object boxes from images • train and test in the same way, end-to-end Regression problem: • training examples (x₁, y₁), …, (xₙ, yₙ) ∈ X × Y; xᵢ are images, yᵢ are bounding boxes • Learn a mapping g : X → Y that generalizes from the given examples: g(xᵢ) ≈ yᵢ for i = 1, …, n
  • 32. Structured Support Vector Machine SVM-like framework by Tsochantaridis et al.: • Positive definite kernel k : (X × Y) × (X × Y) → R; φ : X × Y → H: (implicit) feature map induced by k. • Δ : Y × Y → R: loss function • Solve the convex optimization problem min_{w,ξ} ½‖w‖² + C Σ_{i=1..n} ξᵢ subject to the margin constraints, for i = 1, …, n: ∀y ∈ Y∖{yᵢ}: Δ(y, yᵢ) + ⟨w, φ(xᵢ, y)⟩ − ⟨w, φ(xᵢ, yᵢ)⟩ ≤ ξᵢ • unique solution: w* ∈ H • I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 2005.
  • 33. Structured Support Vector Machine • w* defines the compatibility function F(x, y) = ⟨w*, φ(x, y)⟩ • the best prediction for x is the most compatible y: g(x) := argmax_{y ∈ Y} F(x, y). • evaluating g : X → Y is like generalized sliding window: for fixed x, evaluate the quality function for every box y ∈ Y; for example, use the previous branch-and-bound procedure!
  • 34. Joint Image/Box-Kernel: Example Joint kernel: how to compare one (image, box)-pair (x, y) with another (image, box)-pair (x′, y′)? [Figure: k_joint is large when the two boxes contain similar objects and small when they do not, while the plain image kernel k_image of the full images could still be large in either case.]
  • 35. Loss Function: Example Loss function: how to compare two boxes y and y′? Δ(y, y′) := 1 − (area overlap between y and y′) = 1 − area(y ∩ y′) / area(y ∪ y′)
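The overlap loss can be computed directly from box coordinates; `box_loss` below is a hypothetical helper using (l, t, r, b) corner coordinates, not code from the talk:

```python
def box_loss(y, yp):
    """Delta(y, y') = 1 - area(y ∩ y') / area(y ∪ y') for boxes (l, t, r, b)."""
    l = max(y[0], yp[0])
    t = max(y[1], yp[1])
    r = min(y[2], yp[2])
    b = min(y[3], yp[3])
    inter = max(0, r - l) * max(0, b - t)          # intersection area
    area = lambda q: (q[2] - q[0]) * (q[3] - q[1])
    union = area(y) + area(yp) - inter             # inclusion-exclusion
    return 1.0 - inter / union

loss = box_loss((0, 0, 10, 10), (5, 0, 15, 10))   # half-overlapping boxes
```

Identical boxes give loss 0 and disjoint boxes give loss 1, so the loss directly penalizes poor localization rather than misclassification.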
  • 37. Structured Support Vector Machine • S-SVM Optimization: min_{w,ξ} ½‖w‖² + C Σ_{i=1..n} ξᵢ subject to, for i = 1, …, n: ∀y ∈ Y∖{yᵢ}: Δ(y, yᵢ) + ⟨w, φ(xᵢ, y)⟩ − ⟨w, φ(xᵢ, yᵢ)⟩ ≤ ξᵢ • Solve via constraint generation. Iterate: solve the minimization with a working set of constraints; identify argmax_{y ∈ Y} Δ(y, yᵢ) + ⟨w, φ(xᵢ, y)⟩; add violated constraints to the working set. • Polynomial-time convergence to any precision ε • Similar to bootstrap training, but with a margin.
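As a rough illustration of the loss-augmented argmax step at the heart of this training loop, here is a perceptron-style simplification on a toy multiclass problem; this deliberately replaces the slide's QP with simple additive updates, and the data, joint feature map, and 0/1 loss are all made up:

```python
import numpy as np

def phi(x, y, n_classes=3):
    """Joint feature map: place x into the block belonging to label y."""
    f = np.zeros(n_classes * len(x))
    f[y * len(x):(y + 1) * len(x)] = x
    return f

def loss_augmented_argmax(w, x, y_true):
    """argmax_y  Delta(y, y_i) + <w, phi(x_i, y)>  with 0/1 loss Delta."""
    scores = [(0 if y == y_true else 1) + w @ phi(x, y) for y in range(3)]
    return int(np.argmax(scores))

X = [np.eye(3)[i] for i in range(3)]   # three orthogonal toy "images"
Y = [0, 1, 2]                          # their structured labels
w = np.zeros(9)
for _ in range(50):
    for x, y in zip(X, Y):
        y_hat = loss_augmented_argmax(w, x, y)
        if y_hat != y:                 # margin violated: move w toward
            w += 0.1 * (phi(x, y) - phi(x, y_hat))   # the true output

pred = [int(np.argmax([w @ phi(x, y) for y in range(3)])) for x in X]
```

The role of `loss_augmented_argmax` is the same as in the slide's constraint generation: it finds the currently most violated output, which in the localization setting would itself be solved with branch-and-bound over boxes.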
  • 38. Evaluation: PASCAL VOC 2006 Example detections for VOC 2006 bicycle, bus and cat. Precision–recall curves for VOC 2006 bicycle, bus and cat. • Structured regression improves detection accuracy. • New best scores (at that time) in 6 of 10 classes.
  • 39. Why does it work? Learned weights from binary (center) and structured training (right). • Both methods assign positive weights to object region. • Structured training also assigns negative weights to features surrounding the bounding box position. • Posterior distribution over box coordinates becomes more peaked.
  • 40. More Recent Results (PASCAL VOC 2009) aeroplane
  • 41. More Recent Results (PASCAL VOC 2009) bicycle
  • 42. More Recent Results (PASCAL VOC 2009) bird
  • 43. More Recent Results (PASCAL VOC 2009) boat
  • 44. More Recent Results (PASCAL VOC 2009) bottle
  • 45. More Recent Results (PASCAL VOC 2009) bus
  • 46. More Recent Results (PASCAL VOC 2009) car
  • 47. More Recent Results (PASCAL VOC 2009) cat
  • 48. More Recent Results (PASCAL VOC 2009) chair
  • 49. More Recent Results (PASCAL VOC 2009) cow
  • 50. More Recent Results (PASCAL VOC 2009) diningtable
  • 51. More Recent Results (PASCAL VOC 2009) dog
  • 52. More Recent Results (PASCAL VOC 2009) horse
  • 53. More Recent Results (PASCAL VOC 2009) motorbike
  • 54. More Recent Results (PASCAL VOC 2009) person
  • 55. More Recent Results (PASCAL VOC 2009) pottedplant
  • 56. More Recent Results (PASCAL VOC 2009) sheep
  • 57. More Recent Results (PASCAL VOC 2009) sofa
  • 58. More Recent Results (PASCAL VOC 2009) train
  • 59. More Recent Results (PASCAL VOC 2009) tvmonitor
  • 60. Extensions: Image segmentation with connectedness constraint [figures: plain CRF segmentation vs. connected CRF segmentation] • S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
  • 61. Summary Object Localization is a step towards image interpretation. Conceptual approach instead of algorithmic: • Branch-and-bound evaluation: don’t slide a window, but solve an argmax problem, ⇒ higher efficiency • Structured regression training: solve the prediction problem, not a classification proxy. ⇒ higher localization accuracy • Modular and kernelized: easily adapted to other problems/representations, e.g. image segmentations