Bag-of-Words models

                  Computer Vision
                     Lecture 8


Slides from: S. Lazebnik, A. Torralba, L. Fei-Fei, D. Lowe, G. Csurka
Bag-of-features models
Overview: Bag-of-features models
• Origins and motivation
• Image representation
• Discriminative methods
  – Nearest-neighbor classification
  – Support vector machines
• Generative methods
  – Naïve Bayes
  – Probabilistic Latent Semantic Analysis
• Extensions: incorporating spatial information
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic
  elements or textons
• For stochastic textures, it is the identity of the textons,
  not their spatial arrangement, that matters




Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid
2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
[Figure: each texture is represented as a histogram over a universal texton dictionary]
Origin 2: Bag-of-words models
• Orderless document representation: frequencies
  of words from a dictionary (Salton & McGill, 1983)
                    US Presidential Speeches Tag Cloud
               http://chir.ag/phernalia/preztags/
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Feature extraction
• Regular grid
   – Vogel & Schiele, 2003
   – Fei-Fei & Perona, 2005
• Interest point detector
   – Csurka et al. 2004
   – Fei-Fei & Perona, 2005
   – Sivic et al. 2005
• Other methods
   – Random sampling (Vidal-Naquet & Ullman, 2002)
   – Segmentation-based patches (Barnard et al. 2003)
1. Feature extraction
• Detect patches
   [Mikolajczyk and Schmid ’02]
   [Matas, Chum, Urban & Pajdla, ’02]
   [Sivic & Zisserman, ’03]
• Normalize patch
• Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic
2. Learning the visual vocabulary
[Figure: feature descriptors → clustering → visual vocabulary]

Slide credit: Josef Sivic
K-means clustering
• Want to minimize sum of squared
  Euclidean distances between points x_i and
  their nearest cluster centers m_k

  $$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} (x_i - m_k)^2$$

• Algorithm:
  – Randomly initialize K cluster centers
  – Iterate until convergence:
     • Assign each data point to the nearest center
     • Recompute each cluster center as the mean of
       all points assigned to it
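
The k-means iteration above can be sketched in plain Python. This is a minimal illustration, not the lecture's implementation: the function names, the toy 2D points, the fixed random seed, and the choice to initialize centers by sampling K data points are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iters=100, seed=0):
    """Return (centers, assignment) after running k-means."""
    rng = random.Random(seed)
    # Assumption: initialize centers by picking k distinct data points at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iters):
        # Assign each data point to the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Recompute each center as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

def objective(points, centers, assignment):
    """D(X, M): sum of squared distances from points to their assigned centers."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))

# Toy data (made up): two well-separated groups of 2D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = kmeans(points, k=2)
print(assignment, objective(points, centers, assignment))
```

For real descriptors (for example 128-dimensional SIFT vectors) an optimized library implementation would normally be used; the loop here only mirrors the two alternating steps on the slide.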
From clustering to vector quantization
• Clustering is a common method for learning a visual
  vocabulary or codebook
   – Unsupervised learning process
   – Each cluster center produced by k-means becomes a
     codevector
   – Codebook can be learned on separate training set
   – Provided the training set is sufficiently representative,
     the codebook will be “universal”

• The codebook is used for quantizing features
   – A vector quantizer takes a feature vector and maps it
     to the index of the nearest codevector in a codebook
   – Codebook = visual vocabulary
   – Codevector = visual word
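
As a rough sketch of the vector quantizer described above, the snippet below maps each feature to the index of its nearest codevector and then counts visual-word frequencies per image. The codebook and features are made-up 2D values, and the function names are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def quantize(feature, codebook):
    """Return the index of the nearest codevector (the visual word id)."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))

def bag_of_features(features, codebook):
    """Histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for f in features:
        counts[quantize(f, codebook)] += 1
    return counts

# Hypothetical codebook with three codevectors (e.g. k-means centers).
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
# Hypothetical descriptors extracted from one image.
features = [(1.0, 0.5), (9.0, 9.5), (0.5, 0.0), (1.0, 9.0)]

hist = bag_of_features(features, codebook)
print(hist)  # [2, 1, 1]
```

Dividing each count by the total number of features turns the histogram into frequencies, which makes images with different numbers of patches comparable.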
Example visual vocabulary




                            Fei-Fei et al. 2005
Image patch examples of visual words




                                 Sivic et al. 2005
Visual vocabularies: Issues
• How to choose vocabulary size?
  – Too small: visual words not representative of all
    patches
  – Too large: quantization artifacts, overfitting
• Generative or discriminative learning?
• Computational efficiency
  – Vocabulary trees
    (Nister & Stewenius, 2006)
3. Image representation
[Figure: histogram of codeword frequencies (horizontal axis: codewords, vertical axis: frequency)]
Image classification
• Given the bag-of-features representations of
  images from different classes, how do we
  learn a model for distinguishing them?
Discriminative and generative methods for bags of features
[Figure: two classes, Zebra and Non-zebra]
Discriminative methods
• Learn a decision rule (classifier) assigning
  bag-of-features representations of images to
  different classes
[Figure: decision boundary separating Zebra from Non-zebra]
Classification
• Assign input vector to one of two or more
  classes
• Any decision rule divides input space into
  decision regions separated by decision
  boundaries
Nearest Neighbor Classifier
• Assign label of nearest training data point
  to each test data point
[Figure, from Duda et al.: Voronoi partitioning of feature space for two-category 2D and 3D data]
Source: D. Lowe
K-Nearest Neighbors
• For a new point, find the k closest points from training
  data
• Labels of the k points “vote” to classify
• Works well provided there is lots of data and the distance
  function is good
[Figure: k-nearest-neighbor example with k = 5]
Source: D. Lowe
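
A small sketch of the k-nearest-neighbor vote on bag-of-features histograms follows. The histograms, labels, and function names are invented for illustration, and the L1 distance is used only as one possible distance function.

```python
from collections import Counter

def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def knn_classify(query, train_hists, train_labels, k=5, distance=l1_distance):
    """Label the query by majority vote among its k closest training points."""
    ranked = sorted(range(len(train_hists)),
                    key=lambda i: distance(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    # On a tie, Counter.most_common returns the label encountered first.
    return votes.most_common(1)[0][0]

# Made-up training histograms over a 3-word vocabulary.
train_hists = [[5, 1, 0], [4, 2, 0], [6, 0, 1],
               [0, 1, 5], [1, 0, 4], [0, 2, 6]]
train_labels = ["zebra", "zebra", "zebra",
                "non-zebra", "non-zebra", "non-zebra"]

print(knn_classify([5, 1, 1], train_hists, train_labels, k=3))  # zebra
print(knn_classify([0, 1, 4], train_hists, train_labels, k=1))  # non-zebra
```

Setting k=1 gives the plain nearest-neighbor classifier from the previous slide.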
Functions for comparing histograms
• L1 distance
  $$D(h_1, h_2) = \sum_{i=1}^{N} |h_1(i) - h_2(i)|$$
• χ2 distance
  $$D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}$$
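
The two distances can be written directly from their definitions. In this sketch the handling of bins where both histograms are zero (they are skipped to avoid 0/0) is an assumption that the slide does not address, and the example histograms are made up.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def chi2_distance(h1, h2):
    """Chi-squared distance: squared bin differences divided by bin sums."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # assumption: bins that are empty in both histograms add 0
            total += (a - b) ** 2 / (a + b)
    return total

# Two normalized 3-bin histograms (made up).
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]

print(l1_distance(h1, h2))    # 1.0
print(chi2_distance(h1, h2))  # approximately 0.6667
```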
Linear classifiers
• Find linear function (hyperplane) to separate
  positive and negative examples
  $$x_i \text{ positive}: \quad x_i \cdot w + b \geq 0$$
  $$x_i \text{ negative}: \quad x_i \cdot w + b < 0$$

Which hyperplane is best?

Slide: S. Lazebnik
Support vector machines
    • Find hyperplane that maximizes the margin
      between the positive and negative examples
      $$x_i \text{ positive } (y_i = 1): \quad x_i \cdot w + b \geq 1$$
      $$x_i \text{ negative } (y_i = -1): \quad x_i \cdot w + b \leq -1$$
    • For support vectors, $x_i \cdot w + b = \pm 1$
    • Distance between point and hyperplane: $|x_i \cdot w + b| \, / \, \|w\|$
    • Therefore, the margin is $2 / \|w\|$
[Figure: separating hyperplane with the support vectors and the margin marked]

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and
Knowledge Discovery, 1998
Finding the maximum margin hyperplane
    1. Maximize margin $2 / \|w\|$
    2. Correctly classify all training data:
       $$x_i \text{ positive } (y_i = 1): \quad x_i \cdot w + b \geq 1$$
       $$x_i \text{ negative } (y_i = -1): \quad x_i \cdot w + b \leq -1$$




C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and
Knowledge Discovery, 1998
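
To make the two conditions concrete, the sketch below checks the constraints and evaluates the margin 2/||w|| for a hand-picked hyperplane. It does not solve the optimization problem; the points, labels, w, and b are made-up values chosen so that the constraints hold with equality at two support vectors.

```python
import math

def decision_value(x, w, b):
    """x . w + b"""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def distance_to_hyperplane(x, w, b):
    """|x . w + b| / ||w||"""
    return abs(decision_value(x, w, b)) / math.sqrt(sum(wi * wi for wi in w))

def satisfies_constraints(points, labels, w, b):
    """Both constraints combine into y_i (x_i . w + b) >= 1 for every point."""
    return all(y * decision_value(x, w, b) >= 1 for x, y in zip(points, labels))

# Toy 2D data (made up) and a hand-picked hyperplane, not a trained one.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0

margin = 2.0 / math.sqrt(sum(wi * wi for wi in w))
print(satisfies_constraints(points, labels, w, b))  # True
print(margin)                                       # 2 * sqrt(2), about 2.828
```

Here (2, 2) and (0, 0) are the support vectors: each lies at distance margin/2 from the hyperplane.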
Nonlinear SVMs
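• Datasets that are linearly separable work out great:
  [Figure: 1D data along the x axis]
• But what if the dataset is just too hard?
  [Figure: 1D data along the x axis]
• We can map it to a higher-dimensional space:
  [Figure: the same data plotted with x on the horizontal axis and x^2 on the vertical axis]

Slide credit: Andrew Moore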
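
The lifting idea can be tried on a toy 1D dataset. In this sketch the data, the labels, and the separating line in the lifted space are all made up for illustration: no single threshold on x separates the classes, but after mapping x to (x, x^2) a line does.

```python
def lift(x):
    """Map a 1D point to the 2D point (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(xs, labels):
    """True if some threshold on x classifies every point correctly."""
    ordered = sorted(xs)
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)
    for t in candidates:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(xs, labels)):
                return True
    return False

# Made-up data: positives on the outside, negatives in the middle.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1 if abs(x) >= 2 else -1 for x in xs]

# Hand-picked line in the lifted space: x^2 - 2.5 = 0.
w, b = (0.0, 1.0), -2.5
lifted_predictions = [
    1 if lift(x)[0] * w[0] + lift(x)[1] * w[1] + b >= 0 else -1 for x in xs
]

print(separable_by_threshold(xs, labels))  # False
print(lifted_predictions == labels)        # True
```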
Nonlinear SVMs
• General idea: the original input space can
  always be mapped to some higher-dimensional
  feature space where the training
  set is separable:

                   Φ: x → φ(x)

Slide credit: Andrew Moore
Nonlinear SVMs
    • The kernel trick: instead of explicitly
      computing the lifting transformation φ(x),
      define a kernel function K such that
      $$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$$
    • (to be valid, the kernel function must satisfy
      Mercer’s condition)
    • This gives a nonlinear decision boundary in
      the original feature space:
      $$\sum_i \alpha_i y_i K(x_i, x) + b$$

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and
Knowledge Discovery, 1998
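
One way to see the kernel trick at work is with the degree-2 polynomial kernel K(x, y) = (x · y)^2 on 2D inputs, for which an explicit lifting φ(x) = (x1^2, √2·x1·x2, x2^2) is known. The kernel choice, the support vectors, and the coefficients below are illustrative assumptions, not values from the lecture or from a trained SVM.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit lifting for the degree-2 polynomial kernel on 2D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed without forming phi."""
    return dot(x, y) ** 2

def decision_function(x, support_vectors, alphas, ys, b, kernel):
    """sum_i alpha_i y_i K(x_i, x) + b"""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, ys)) + b

x, y = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, y), dot(phi(x), phi(y)))  # both equal 1 up to rounding

# Made-up support vectors and coefficients, only to show the evaluation.
value = decision_function((1.0, 1.0),
                          support_vectors=[(1.0, 0.0), (2.0, 0.0)],
                          alphas=[1.0, 0.5], ys=[1, -1], b=0.1,
                          kernel=poly_kernel)
print(value)  # approximately -0.9
```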
Kernels for bags of features
     • Histogram intersection kernel:
       $$I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$$
     • Generalized Gaussian kernel:
       $$K(h_1, h_2) = \exp\left(-\frac{1}{A} D(h_1, h_2)^2\right)$$
     • D can be Euclidean distance, χ2 distance

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification
of Texture and Object Categories: A Comprehensive Study, IJCV 2007
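
Both kernels are short to write down. The sketch below follows the formulas as given on the slide; the example histograms, the value A = 1, and the helper that builds a kernel (Gram) matrix are assumptions added for illustration.

```python
import math

def histogram_intersection(h1, h2):
    """I(h1, h2) = sum_i min(h1(i), h2(i))"""
    return sum(min(a, b) for a, b in zip(h1, h2))

def euclidean_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def generalized_gaussian_kernel(h1, h2, distance, A):
    """K(h1, h2) = exp(-(1/A) * D(h1, h2)^2), with D passed in as a function."""
    return math.exp(-(distance(h1, h2) ** 2) / A)

def gram_matrix(hists, kernel):
    """Matrix of kernel values for every pair of examples."""
    return [[kernel(a, b) for b in hists] for a in hists]

# Made-up normalized histograms.
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]

print(histogram_intersection(h1, h2))  # 0.5
print(generalized_gaussian_kernel(h1, h2, euclidean_distance, A=1.0))  # exp(-0.375)

gram = gram_matrix([h1, h2], histogram_intersection)
print(gram)  # [[1.0, 0.5], [0.5, 1.0]]
```

A precomputed matrix of this kind is the form in which many SVM packages accept custom kernels.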
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM
  formulation
• In practice, we have to obtain a multi-class SVM by
  combining multiple two-class SVMs
• One vs. others
   – Training: learn an SVM for each class vs. the others
   – Testing: apply each SVM to test example and assign to it the
     class of the SVM that returns the highest decision value
• One vs. one
   – Training: learn an SVM for each pair of classes
   – Testing: each learned SVM “votes” for a class to assign to the
     test example



                                                         Slide: S. Lazebnik
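
The two combination schemes can be sketched independently of how the two-class SVMs are trained. Below, simple distance-to-prototype scoring functions stand in for trained SVM decision functions; these stand-ins, the class names, and the prototypes are hypothetical and serve only to exercise the voting logic.

```python
from collections import Counter
from itertools import combinations

def one_vs_rest_predict(x, classifiers):
    """classifiers: class -> decision function. Pick the highest decision value."""
    return max(classifiers, key=lambda c: classifiers[c](x))

def one_vs_one_predict(x, pairwise):
    """pairwise: (class_a, class_b) -> decision function, positive means class_a."""
    votes = Counter()
    for (a, b), decide in pairwise.items():
        votes[a if decide(x) >= 0 else b] += 1
    # On a tie, most_common returns the class that received a vote first.
    return votes.most_common(1)[0][0]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Hypothetical stand-ins for trained SVMs: score by closeness to a prototype.
prototypes = {"zebra": (0.0, 0.0), "grass": (5.0, 0.0), "tree": (0.0, 5.0)}

def make_one_vs_rest(c):
    return lambda x: -squared_distance(x, prototypes[c])

def make_one_vs_one(a, b):
    return lambda x: squared_distance(x, prototypes[b]) - squared_distance(x, prototypes[a])

classifiers = {c: make_one_vs_rest(c) for c in prototypes}
pairwise = {(a, b): make_one_vs_one(a, b)
            for a, b in combinations(sorted(prototypes), 2)}

x = (4.0, 1.0)
print(one_vs_rest_predict(x, classifiers))  # grass
print(one_vs_one_predict(x, pairwise))      # grass
```

With C classes, one vs. others needs C classifiers while one vs. one needs C(C-1)/2, which is 3 for the three classes here.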
SVMs: Pros and cons
• Pros
  – Many publicly available SVM packages:
    http://www.kernel-machines.org/software
  – Kernel-based framework is very powerful, flexible
  – SVMs work very well in practice, even with very
    small training sample sizes

• Cons
  – No “direct” multi-class SVM, must combine
    two-class SVMs
  – Computation, memory
     • During training time, must compute matrix of kernel
       values for every pair of examples
     • Learning can take a very long time for large-scale
       problems

                                                    Slide: S. Lazebnik
Summary: Discriminative methods
• Nearest-neighbor and k-nearest-neighbor classifiers
   – L1 distance, χ2 distance, quadratic distance
• Support vector machines
   – Linear classifiers
   – Margin maximization
   – The kernel trick
   – Kernel functions: histogram intersection, generalized
     Gaussian, pyramid match
   – Multi-class
• Of course, there are many other classifiers out there
   – Neural networks, boosting, decision trees, …
Generative learning methods for bags of features

• Model the probability of a bag of features
  given a class
Generative methods
• We will cover two models, both inspired by
  text document analysis:
  – Naïve Bayes
  – Probabilistic Latent Semantic Analysis
The Naïve Bayes model
• Assume that each feature is conditionally
  independent given the class

  $$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$$

     f_i: ith feature in the image
     N: number of features in the image
     w_j: jth visual word in the vocabulary
     M: size of visual vocabulary
     n(w_j): number of features of type w_j in the image
                                                   Csurka et al. 2004
The Naïve Bayes model
• Assume that each feature is conditionally
  independent given the class

  $$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$$

  $$p(w_j \mid c) = \frac{\text{No. of features of type } w_j \text{ in training images of class } c}{\text{Total no. of features in training images of class } c}$$




                                                                         Csurka et al. 2004
The Naïve Bayes model
• Maximum A Posteriori decision:

  $$c^* = \arg\max_c \; p(c) \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)} = \arg\max_c \left[ \log p(c) + \sum_{j=1}^{M} n(w_j) \log p(w_j \mid c) \right]$$

   (you should compute the log of the likelihood
   instead of the likelihood itself in order to avoid
   underflow)
                                                   Csurka et al. 2004
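
A compact sketch of the Naïve Bayes classifier on visual-word histograms follows: parameters are estimated by frequency counting and the MAP decision is taken in the log domain. The training histograms and function names are made up, and the add-one (Laplace) smoothing controlled by `alpha` is an assumption added here so that log p(w_j | c) stays finite for words never seen in a class.

```python
import math

def train_naive_bayes(histograms, labels, vocab_size, alpha=1.0):
    """Estimate log p(c) and log p(w_j | c) by counting.

    alpha is a smoothing constant (an assumption, not on the slide);
    alpha = 0 gives the plain frequency-counting estimate.
    """
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_hists = [h for h, y in zip(histograms, labels) if y == c]
        log_prior[c] = math.log(len(class_hists) / len(labels))
        word_counts = [sum(h[j] for h in class_hists) for j in range(vocab_size)]
        total = sum(word_counts) + alpha * vocab_size
        log_likelihood[c] = [math.log((n + alpha) / total) for n in word_counts]
    return log_prior, log_likelihood

def classify(hist, log_prior, log_likelihood):
    """argmax_c [ log p(c) + sum_j n(w_j) log p(w_j | c) ]"""
    def score(c):
        return log_prior[c] + sum(n * lp for n, lp in zip(hist, log_likelihood[c]))
    return max(log_prior, key=score)

# Made-up training histograms n(w_j) over a 3-word vocabulary.
train_hists = [[8, 1, 1], [7, 2, 1], [1, 8, 1], [2, 6, 2]]
train_labels = ["zebra", "zebra", "grass", "grass"]

log_prior, log_likelihood = train_naive_bayes(train_hists, train_labels, vocab_size=3)
print(classify([6, 2, 2], log_prior, log_likelihood))  # zebra
print(classify([1, 7, 2], log_prior, log_likelihood))  # grass
```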
The Naïve Bayes model

• “Graphical model”:
  [Figure: class node c → visual word node w, with w inside a plate repeated N times]

Csurka et al. 2004
Probabilistic Latent Semantic Analysis
[Figure: Image = p1 × zebra + p2 × grass + p3 × tree, where zebra, grass and tree are “visual topics”]

        T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
Probabilistic Latent Semantic Analysis
• Unsupervised technique
• Two-level generative model: a document is a
  mixture of topics, and each topic has its own
  characteristic word distribution
[Figure: graphical model d → z → w (document → topic → word); the edge d → z carries P(z|d) and the edge z → w carries P(w|z)]

      T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
The pLSA model

  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

• p(w_i | d_j): probability of word i in document j (known)
• p(w_i | z_k): probability of word i given topic k (unknown)
• p(z_k | d_j): probability of topic k given document j (unknown)
The pLSA model

  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

• As a matrix decomposition: [p(w_i | d_j)] = [p(w_i | z_k)] × [p(z_k | d_j)]
  – p(w_i | d_j): observed codeword distributions (M×N, words × documents)
  – p(w_i | z_k): codeword distributions per topic (class) (M×K, words × topics)
  – p(z_k | d_j): class distributions per image (K×N, topics × documents)
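
The matrix view can be checked numerically on a tiny example. The matrices below are made up (M = 3 words, K = 2 topics, N = 2 documents); each column is a probability distribution, and the product reproduces p(w_i | d_j) with columns that again sum to 1.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def column_sums(matrix):
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]

# p(w_i | z_k): M x K, each column (topic) sums to 1. Values are made up.
p_w_given_z = [[0.7, 0.1],
               [0.2, 0.3],
               [0.1, 0.6]]
# p(z_k | d_j): K x N, each column (document) sums to 1. Values are made up.
p_z_given_d = [[0.9, 0.2],
               [0.1, 0.8]]

# p(w_i | d_j) = sum_k p(w_i | z_k) p(z_k | d_j): M x N.
p_w_given_d = matmul(p_w_given_z, p_z_given_d)
print(p_w_given_d)
print(column_sums(p_w_given_d))  # each close to 1.0
```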
Learning pLSA parameters
Maximize likelihood of data:

  $$L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}$$

       n(w_i, d_j) … observed counts of word i in document j
       M … number of codewords
       N … number of images

Slide credit: Josef Sivic
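
The slide text does not say how the likelihood is maximized. A common choice for pLSA is the EM algorithm, and the sketch below assumes it: the E-step computes p(z_k | w_i, d_j) ∝ p(w_i | z_k) p(z_k | d_j), and the M-step re-estimates both factors from the count-weighted posteriors. The count table, the random initialization, the seed, and all function names are illustrative assumptions.

```python
import math
import random

def normalize_columns(matrix):
    """Scale every column of a list-of-rows matrix to sum to 1 (in place)."""
    for col in range(len(matrix[0])):
        total = sum(row[col] for row in matrix)
        for row in matrix:
            row[col] /= total

def log_likelihood(counts, p_w_z, p_z_d):
    """sum_ij n(w_i, d_j) log p(w_i | d_j)"""
    K = len(p_z_d)
    total = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:
                p_w_d = sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K))
                total += n * math.log(p_w_d)
    return total

def plsa_em(counts, K, iters=30, seed=0):
    rng = random.Random(seed)
    M, N = len(counts), len(counts[0])
    # Random positive initialization (assumption), then column-normalize.
    p_w_z = [[rng.random() + 0.1 for _ in range(K)] for _ in range(M)]  # M x K
    p_z_d = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]  # K x N
    normalize_columns(p_w_z)
    normalize_columns(p_z_d)
    history = [log_likelihood(counts, p_w_z, p_z_d)]
    for _ in range(iters):
        new_w_z = [[0.0] * K for _ in range(M)]
        new_z_d = [[0.0] * N for _ in range(K)]
        for i in range(M):
            for j in range(N):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior over topics for this (word, document) pair.
                joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
                norm = sum(joint)
                for k in range(K):
                    weight = counts[i][j] * joint[k] / norm
                    new_w_z[i][k] += weight
                    new_z_d[k][j] += weight
        # M-step: renormalize the accumulated expected counts.
        normalize_columns(new_w_z)
        normalize_columns(new_z_d)
        p_w_z, p_z_d = new_w_z, new_z_d
        history.append(log_likelihood(counts, p_w_z, p_z_d))
    return p_w_z, p_z_d, history

# Made-up count table n(w_i, d_j): 4 codewords x 4 images, two rough "topics".
counts = [[5, 4, 0, 1],
          [4, 5, 1, 0],
          [0, 1, 5, 4],
          [1, 0, 4, 5]]
p_w_z, p_z_d, history = plsa_em(counts, K=2)
print(history[0], history[-1])  # the log-likelihood should not decrease
```

EM only finds a local maximum, so different seeds can give different topic assignments; in practice several restarts are often compared.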
Inference
• Finding the most likely topic (class) for an image:

  $$z^* = \arg\max_z \; p(z \mid d)$$

• Finding the most likely topic (class) for a visual
  word in a given image:

  $$z^* = \arg\max_z \; p(z \mid w, d)$$
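
Both inference rules reduce to an argmax once the pLSA factors are known. In this sketch the word-level posterior uses p(z | w, d) ∝ p(w | z) p(z | d), which follows from Bayes' rule under the model's assumption that a word depends on the document only through the topic. The matrices are made-up values, not learned ones.

```python
def most_likely_topic_for_image(p_z_d, j):
    """argmax_z p(z | d_j)"""
    return max(range(len(p_z_d)), key=lambda k: p_z_d[k][j])

def most_likely_topic_for_word(p_w_z, p_z_d, i, j):
    """argmax_z p(z | w_i, d_j), using p(z | w, d) proportional to p(w | z) p(z | d)."""
    return max(range(len(p_z_d)), key=lambda k: p_w_z[i][k] * p_z_d[k][j])

# Made-up parameters: 3 words, 2 topics, 2 images.
p_w_z = [[0.7, 0.1],   # p(w_i | z_k), rows are words, columns are topics
         [0.2, 0.3],
         [0.1, 0.6]]
p_z_d = [[0.9, 0.2],   # p(z_k | d_j), rows are topics, columns are images
         [0.1, 0.8]]

print(most_likely_topic_for_image(p_z_d, 0))            # 0
print(most_likely_topic_for_image(p_z_d, 1))            # 1
print(most_likely_topic_for_word(p_w_z, p_z_d, 0, 1))   # 0
print(most_likely_topic_for_word(p_w_z, p_z_d, 2, 1))   # 1
```

The third line shows that a word can be assigned to a different topic than the image it appears in, which is what makes word-level inference useful for localizing objects.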
Topic discovery in images




J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location
in Images, ICCV 2005
Application of pLSA: Action recognition
                                  Space-time interest points




Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action
Categories Using Spatial-Temporal Words, IJCV 2008.
pLSA model

  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

• p(w_i | d_j): probability of word i in video j (known)
• p(w_i | z_k): probability of word i given topic k (unknown)
• p(z_k | d_j): probability of topic k given video j (unknown)

– w_i = spatial-temporal word
– d_j = video
– n(w_i, d_j) = co-occurrence table
  (# of occurrences of word w_i in video d_j)
– z = topic, corresponding to an action
Action recognition example
Multiple Actions
Summary: Generative models
• Naïve Bayes
  – Unigram models in document analysis
  – Assumes conditional independence of words given
    class
  – Parameter estimation: frequency counting
• Probabilistic Latent Semantic Analysis
  – Unsupervised technique
  – Each document is a mixture of topics (image is a
    mixture of classes)
  – Can be thought of as matrix decomposition

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Último (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

16 17 bag_words

  • 1. Bag-of-Words models Computer Vision Lecture 8 Slides from: S. Lazebnik, A. Torralba, L. Fei-Fei, D. Lowe, C. Csurka
  • 3. Overview: Bag-of-features models • Origins and motivation • Image representation • Discriminative methods – Nearest-neighbor classification – Support vector machines • Generative methods – Naïve Bayes – Probabilistic Latent Semantic Analysis • Extensions: incorporating spatial information
  • 4. Origin 1: Texture recognition • Texture is characterized by the repetition of basic elements or textons • For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
  • 5. Origin 1: Texture recognition histogram Universal texton dictionary Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
  • 6. Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
  • 7. Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
  • 10. Bags of features for image classification 1. Extract features
  • 11. Bags of features for image classification 1. Extract features 2. Learn “visual vocabulary”
  • 12. Bags of features for image classification 1. Extract features 2. Learn “visual vocabulary” 3. Quantize features using visual vocabulary
  • 13. Bags of features for image classification 1. Extract features 2. Learn “visual vocabulary” 3. Quantize features using visual vocabulary 4. Represent images by frequencies of “visual words”
  • 14. 1. Feature extraction • Regular grid – Vogel & Schiele, 2003 – Fei-Fei & Perona, 2005
  • 15. 1. Feature extraction • Regular grid – Vogel & Schiele, 2003 – Fei-Fei & Perona, 2005 • Interest point detector – Csurka et al. 2004 – Fei-Fei & Perona, 2005 – Sivic et al. 2005
  • 16. 1. Feature extraction • Regular grid – Vogel & Schiele, 2003 – Fei-Fei & Perona, 2005 • Interest point detector – Csurka et al. 2004 – Fei-Fei & Perona, 2005 – Sivic et al. 2005 • Other methods – Random sampling (Vidal-Naquet & Ullman, 2002) – Segmentation-based patches (Barnard et al. 2003)
  • 17. 1. Feature extraction • Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03] • Normalize patch • Compute SIFT descriptor [Lowe ’99] Slide credit: Josef Sivic
  • 19. 2. Learning the visual vocabulary …
  • 20. 2. Learning the visual vocabulary … Clustering Slide credit: Josef Sivic
  • 21. 2. Learning the visual vocabulary Visual vocabulary … Clustering Slide credit: Josef Sivic
  • 22. K-means clustering • Want to minimize sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$: $D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \| x_i - m_k \|^2$ • Algorithm: – Randomly initialize K cluster centers – Iterate until convergence: assign each data point to the nearest center, then recompute each cluster center as the mean of all points assigned to it
  • 23. From clustering to vector quantization • Clustering is a common method for learning a visual vocabulary or codebook – Unsupervised learning process – Each cluster center produced by k-means becomes a codevector – Codebook can be learned on separate training set – Provided the training set is sufficiently representative, the codebook will be “universal” • The codebook is used for quantizing features – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook – Codebook = visual vocabulary – Codevector = visual word
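The clustering and quantization steps on slides 22 and 23 can be sketched with NumPy as follows. This is a minimal illustration, not the lecture's own code: the function names `kmeans`, `quantize` and `bag_of_features` are hypothetical, centers are initialized from randomly chosen data points (one common choice among several), and an empty cluster simply keeps its previous center.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers, labels

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codevector."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bag_of_features(features, codebook):
    """Normalized histogram of visual-word frequencies for one image."""
    words = quantize(features, codebook)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    return counts / counts.sum()
```

In this sketch the codebook returned by `kmeans` on a training set of descriptors is reused unchanged by `bag_of_features` for every new image, which matches the "codebook can be learned on separate training set" point above.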
  • 24. Example visual vocabulary Fei-Fei et al. 2005
  • 25. Image patch examples of visual words Sivic et al. 2005
  • 26. Visual vocabularies: Issues • How to choose vocabulary size? – Too small: visual words not representative of all patches – Too large: quantization artifacts, overfitting • Generative or discriminative learning? • Computational efficiency – Vocabulary trees (Nister & Stewenius, 2006)
  • 28. Image classification • Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
  • 29. Discriminative and generative methods for bags of features Zebra Non-zebra
  • 30. Discriminative methods • Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes (figure: a decision boundary separating zebra from non-zebra)
  • 31. Classification • Assign input vector to one of two or more classes • Any decision rule divides input space into decision regions separated by decision boundaries
  • 32. Nearest Neighbor Classifier • Assign label of nearest training data point to each test data point • Figure (from Duda et al.): Voronoi partitioning of feature space for two-category 2D and 3D data Source: D. Lowe
  • 33. K-Nearest Neighbors • For a new point, find the k closest points from training data • Labels of the k points “vote” to classify • Works well provided there is lots of data and the distance function is good • Figure: example with k = 5 Source: D. Lowe
  • 34. Functions for comparing histograms • L1 distance: $D(h_1, h_2) = \sum_{i=1}^{N} | h_1(i) - h_2(i) |$ • χ2 distance: $D(h_1, h_2) = \sum_{i=1}^{N} \frac{( h_1(i) - h_2(i) )^2}{h_1(i) + h_2(i)}$
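As a rough illustration of how the two histogram distances plug into the k-nearest-neighbor rule from the previous slide, here is a small pure-Python sketch. The function names are hypothetical, and skipping bins where both histograms are empty (to avoid 0/0 in the χ2 distance) is an assumption the slide does not spell out.

```python
from collections import Counter

def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def chi2_distance(h1, h2):
    """Chi-squared distance as written on the slide (no 1/2 factor)."""
    # Assumption: bins that are empty in both histograms contribute 0.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def knn_classify(query, train_hists, train_labels, k=5, dist=chi2_distance):
    """Label the query by majority vote among its k nearest training histograms."""
    ranked = sorted(range(len(train_hists)),
                    key=lambda i: dist(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Passing `dist=l1_distance` switches the metric without touching the voting logic, which reflects the slide's point that the result depends heavily on how good the distance function is.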
  • 35. Linear classifiers • Find linear function (hyperplane) to separate positive and negative examples: $x_i$ positive: $x_i \cdot w + b \ge 0$; $x_i$ negative: $x_i \cdot w + b < 0$ • Which hyperplane is best? Slide: S. Lazebnik
  • 36. Support vector machines • Find hyperplane that maximizes the margin between the positive and negative examples C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
  • 37. Support vector machines • Find hyperplane that maximizes the margin between the positive and negative examples: $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \ge 1$; $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \le -1$ • For support vectors, $x_i \cdot w + b = \pm 1$ • Distance between point and hyperplane: $\frac{| x_i \cdot w + b |}{\| w \|}$ • Therefore, the margin is $2 / \| w \|$ C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
  • 38. Finding the maximum margin hyperplane 1. Maximize margin $2 / \| w \|$ 2. Correctly classify all training data: $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \ge 1$; $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \le -1$ C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
  • 39. Nonlinear SVMs • Datasets that are linearly separable work out great • But what if the dataset is just too hard? • We can map it to a higher-dimensional space, e.g. $x \to (x, x^2)$ Slide credit: Andrew Moore
  • 40. Nonlinear SVMs • General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: x \to \varphi(x)$ Slide credit: Andrew Moore
  • 41. Nonlinear SVMs • The kernel trick: instead of explicitly computing the lifting transformation $\varphi(x)$, define a kernel function K such that $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ • (to be valid, the kernel function must satisfy Mercer’s condition) • This gives a nonlinear decision boundary in the original feature space: $\sum_i \alpha_i y_i K(x_i, x) + b$ C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
  • 42. Kernels for bags of features • Histogram intersection kernel: $I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$ • Generalized Gaussian kernel: $K(h_1, h_2) = \exp\left( -\frac{1}{A} D(h_1, h_2)^2 \right)$ • D can be Euclidean distance, χ2 distance J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
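The two kernels on slide 42 are short enough to write out directly. The sketch below is illustrative only: the function names and the `gram_matrix` helper are hypothetical, and the scale parameter `A` is left to the caller because the slide does not say how to set it.

```python
import math

def histogram_intersection(h1, h2):
    """I(h1, h2) = sum_i min(h1(i), h2(i))."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def euclidean_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def chi2_distance(h1, h2):
    # Assumption: bins that are empty in both histograms contribute 0.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def generalized_gaussian_kernel(h1, h2, dist, A):
    """K(h1, h2) = exp(-(1/A) * D(h1, h2)^2) for a chosen distance D."""
    return math.exp(-dist(h1, h2) ** 2 / A)

def gram_matrix(hists, kernel):
    """Kernel value for every pair of histograms (row-major list of lists)."""
    return [[kernel(a, b) for b in hists] for a in hists]
```

A matrix built this way, for example `gram_matrix(hists, histogram_intersection)`, is the kind of precomputed input that many SVM packages accept in place of raw feature vectors.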
  • 43. What about multi-class SVMs? • Unfortunately, there is no “definitive” multi-class SVM formulation • In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs • One vs. others – Training: learn an SVM for each class vs. the others – Testing: apply each SVM to test example and assign to it the class of the SVM that returns the highest decision value • One vs. one – Training: learn an SVM for each pair of classes – Testing: each learned SVM “votes” for a class to assign to the test example Slide: S. Lazebnik
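The two test-time rules above can be sketched independently of any particular SVM package. In this hypothetical example each trained two-class SVM is represented by a plain Python callable returning its decision value; the dictionary layouts and function names are assumptions made for illustration.

```python
from collections import Counter

def one_vs_others_predict(x, classifiers):
    """classifiers: {class_label: f}, where f(x) is that class's decision value.

    The predicted class is the one whose SVM returns the highest value.
    """
    return max(classifiers, key=lambda c: classifiers[c](x))

def one_vs_one_predict(x, pair_classifiers):
    """pair_classifiers: {(class_a, class_b): f}.

    Assumption: f(x) >= 0 is a vote for class_a, otherwise for class_b.
    The class with the most votes wins.
    """
    votes = Counter(a if f(x) >= 0 else b
                    for (a, b), f in pair_classifiers.items())
    return votes.most_common(1)[0][0]
```

With C classes, the first scheme needs C decision functions and the second needs C(C-1)/2, which is one practical reason to prefer one vs. others when the number of classes is large.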
  • 44. SVMs: Pros and cons • Pros – Many publicly available SVM packages: http://www.kernel-machines.org/software – Kernel-based framework is very powerful, flexible – SVMs work very well in practice, even with very small training sample sizes • Cons – No “direct” multi-class SVM, must combine two-class SVMs – Computation, memory: during training time, must compute matrix of kernel values for every pair of examples; learning can take a very long time for large-scale problems Slide: S. Lazebnik
  • 45. Summary: Discriminative methods • Nearest-neighbor and k-nearest-neighbor classifiers – L1 distance, χ2 distance, quadratic distance • Support vector machines – Linear classifiers – Margin maximization – The kernel trick – Kernel functions: histogram intersection, generalized Gaussian, pyramid match – Multi-class • Of course, there are many other classifiers out there – Neural networks, boosting, decision trees, …
  • 46. Generative learning methods for bags of features • Model the probability of a bag of features given a class
  • 47. Generative methods • We will cover two models, both inspired by text document analysis: – Naïve Bayes – Probabilistic Latent Semantic Analysis
  • 48. The Naïve Bayes model • Assume that each feature is conditionally independent given the class: $p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c)$ • $f_i$: ith feature in the image • N: number of features in the image Csurka et al. 2004
  • 49. The Naïve Bayes model • Assume that each feature is conditionally independent given the class: $p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$ • $f_i$: ith feature in the image • N: number of features in the image • $w_j$: jth visual word in the vocabulary • M: size of visual vocabulary • $n(w_j)$: number of features of type $w_j$ in the image Csurka et al. 2004
  • 50. The Naïve Bayes model • Assume that each feature is conditionally independent given the class: $p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$ • $p(w_j \mid c) = \frac{\text{No. of features of type } w_j \text{ in training images of class } c}{\text{Total no. of features in training images of class } c}$ Csurka et al. 2004
  • 51. The Naïve Bayes model • Maximum A Posteriori decision: $c^* = \arg\max_c \; p(c) \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)} = \arg\max_c \; \log p(c) + \sum_{j=1}^{M} n(w_j) \log p(w_j \mid c)$ (you should compute the log of the likelihood instead of the likelihood itself in order to avoid underflow) Csurka et al. 2004
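The frequency-counting estimate and the log-domain MAP decision above can be sketched in a few lines of Python. The function names are hypothetical, the class prior p(c) is estimated here as the fraction of training images per class, and the additive smoothing term `alpha` is an assumption added so that a visual word never seen in a class does not produce log(0); setting `alpha=0` gives the plain ratio from the slide.

```python
import math

def train_naive_bayes(histograms, labels, alpha=1.0):
    """histograms: per-image visual-word counts; returns (log_prior, log_lik)."""
    classes = sorted(set(labels))
    M = len(histograms[0])  # vocabulary size
    log_prior, log_lik = {}, {}
    for c in classes:
        rows = [h for h, y in zip(histograms, labels) if y == c]
        # Assumption: p(c) is the fraction of training images with label c.
        log_prior[c] = math.log(len(rows) / len(histograms))
        word_counts = [sum(h[j] for h in rows) for j in range(M)]
        total = sum(word_counts)
        # p(w_j | c) by frequency counting, with additive smoothing (assumed).
        log_lik[c] = [math.log((n + alpha) / (total + alpha * M))
                      for n in word_counts]
    return log_prior, log_lik

def classify(hist, log_prior, log_lik):
    """MAP decision: argmax_c log p(c) + sum_j n(w_j) log p(w_j | c)."""
    def score(c):
        return log_prior[c] + sum(n * lw for n, lw in zip(hist, log_lik[c]))
    return max(log_prior, key=score)
```

Working with sums of logs rather than products of probabilities is what keeps the score representable for images with thousands of features, as the slide's underflow remark suggests.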
  • 52. The Naïve Bayes model • “Graphical model”: c → w, with w inside a plate repeated N times Csurka et al. 2004
  • 53. Probabilistic Latent Semantic Analysis • Image = p1 · zebra + p2 · grass + p3 · tree, where zebra, grass and tree are “visual topics” T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
  • 54. Probabilistic Latent Semantic Analysis • Unsupervised technique • Two-level generative model: a document is a mixture of topics, and each topic has its own characteristic word distribution • d (document) → z (topic) → w (word), with P(z|d) on the first arrow and P(w|z) on the second T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
  • 55. Probabilistic Latent Semantic Analysis • Unsupervised technique • Two-level generative model: a document is a mixture of topics, and each topic has its own characteristic word distribution • d → z → w: $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$ T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
  • 56. The pLSA model $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$ • $p(w_i \mid d_j)$: probability of word i in document j (known) • $p(w_i \mid z_k)$: probability of word i given topic k (unknown) • $p(z_k \mid d_j)$: probability of topic k given document j (unknown)
  • 57. The pLSA model $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$ • As a matrix product: $[p(w_i \mid d_j)] = [p(w_i \mid z_k)] \, [p(z_k \mid d_j)]$ • $[p(w_i \mid d_j)]$, words × documents (M×N): observed codeword distributions • $[p(w_i \mid z_k)]$, words × topics (M×K): codeword distributions per topic (class) • $[p(z_k \mid d_j)]$, topics × documents (K×N): class distributions per image
  • 58. Learning pLSA parameters • Maximize likelihood of data: $L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}$ • $n(w_i, d_j)$: observed counts of word i in document j • M: number of codewords • N: number of images Slide credit: Josef Sivic
  • 59. Inference • Finding the most likely topic (class) for an image: $z^* = \arg\max_z \, p(z \mid d)$
  • 60. Inference • Finding the most likely topic (class) for an image: $z^* = \arg\max_z \, p(z \mid d)$ • Finding the most likely topic (class) for a visual word in a given image: $z^* = \arg\max_z \, p(z \mid w, d)$
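To make the pLSA slides concrete, here is a minimal NumPy sketch that fits p(w|z) and p(z|d) by expectation-maximization, the usual way to maximize the likelihood $\prod_i \prod_j p(w_i \mid d_j)^{n(w_i, d_j)}$, followed by the two inference rules. The function names, the random initialization, the fixed iteration count and the small `eps` guard against division by zero are all assumptions of this sketch rather than details given in the slides.

```python
import numpy as np

def plsa(n, K, n_iter=100, seed=0, eps=1e-12):
    """Fit pLSA to an (M words x N documents) count matrix n.

    Returns p_w_z with shape (M, K) and p_z_d with shape (K, N).
    """
    rng = np.random.default_rng(seed)
    M, N = n.shape
    p_w_z = rng.random((M, K))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((K, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior[i, k, j] = p(z_k | w_i, d_j).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both factors from expected counts.
        weighted = n[:, None, :] * posterior
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps
    return p_w_z, p_z_d

def log_likelihood(n, p_w_z, p_z_d, eps=1e-12):
    """sum_ij n(w_i, d_j) log p(w_i | d_j), with p(w|d) = p_w_z @ p_z_d."""
    return float((n * np.log(p_w_z @ p_z_d + eps)).sum())

def most_likely_topic(p_z_d):
    """z* = argmax_z p(z | d), one topic index per document (column)."""
    return p_z_d.argmax(axis=0)

def most_likely_topic_for_word(p_w_z, p_z_d, i, j):
    """z* = argmax_z p(z | w_i, d_j), proportional to p(w_i | z) p(z | d_j)."""
    return int((p_w_z[i, :] * p_z_d[:, j]).argmax())
```

The product `p_w_z @ p_z_d` is exactly the (M×K)(K×N) factorization from slide 57, which is why pLSA can be read as a matrix decomposition. The result depends on the random initialization, so in practice one would likely compare several seeds by their log-likelihood.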
  • 61. Topic discovery in images J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location in Images, ICCV 2005
  • 62. Application of pLSA: Action recognition Space-time interest points Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
  • 63. Application of pLSA: Action recognition Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
  • 64. pLSA model $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$ • $p(w_i \mid d_j)$: probability of word i in video j (known) • $p(w_i \mid z_k)$: probability of word i given topic k (unknown) • $p(z_k \mid d_j)$: probability of topic k given video j (unknown) – $w_i$ = spatial-temporal word – $d_j$ = video – $n(w_i, d_j)$ = co-occurrence table (# of occurrences of word $w_i$ in video $d_j$) – z = topic, corresponding to an action
  • 68. Summary: Generative models • Naïve Bayes – Unigram models in document analysis – Assumes conditional independence of words given class – Parameter estimation: frequency counting • Probabilistic Latent Semantic Analysis – Unsupervised technique – Each document is a mixture of topics (image is a mixture of classes) – Can be thought of as matrix decomposition

Editor's Notes

  1. For that we will use a method called probabilistic latent semantic analysis (pLSA). pLSA can be thought of as a matrix decomposition. Here is our term-document matrix (documents, words), and we want to find topic vectors common to all documents and mixture coefficients P(z|d) specific to each document. Note that these are all probabilities, which sum to one, so a column here is expressed as a convex combination of the topic vectors. What we would like to see is that the topics correspond to objects, so that an image is expressed as a mixture of different objects and backgrounds.
  2. We use a maximum likelihood approach to fit the parameters of the model. We write down the likelihood of the whole collection and maximize it using EM. The indices run over all M visual words and all N documents. P(w|d) factorizes as on the previous slide; the entries in those matrices are our parameters. n(w,d) are the observed counts in our data.