VIDEO SHOT BOUNDARY DETECTION
B.Tech Project Report
BY
Anveshkumar Kolluri (1210711204)
DEPARTMENT OF INFORMATION TECHNOLOGY
GITAM INSTITUTE OF TECHNOLOGY
GITAM UNIVERSITY
VISAKHAPATNAM
530045, A.P. (INDIA)
April, 2015
CERTIFICATE
I hereby certify that the work being presented in the B.Tech
Major Project Report entitled “VIDEO SHOT BOUNDARY DETECTION”, in
partial fulfilment of the requirements for the award of the Bachelor of Technology in
INFORMATION TECHNOLOGY and submitted to the Department of
Information Technology of GITAM Institute of Technology, GITAM University,
Visakhapatnam, A.P., is an authentic record of my own work carried out during the
period from September 2014 to March 2015 under the supervision of Sri
D. Kishore Kumar, Assistant Professor, IT Department.
The matter presented in this thesis has not been submitted by me for the
award of any other degree elsewhere.
Signature of Candidate
Anveshkumar Kolluri
Roll No: 1210711204
This is to certify that the above statement made by the candidate is
correct to the best of my knowledge.
Signature of Supervisor
Date: Sri D. Kishore Kumar,
Assistant Professor,
Project Supervisor
ACKNOWLEDGEMENT
We acknowledge the efforts of our guide Sri D. Kishore Kumar for his
very able guidance and immense help through the thick and thin of this project. His
inspiration and perceptive criticism were extremely helpful to the successful
completion of this project. We were continuously assessed and guided on every
module under the supervision of Dr. GVS. Rajkumar and Mr. A. Naresh. Their
support has been immeasurable.
We acknowledge the help and constant support of the Head of the
Department of Information Technology, Dr. P V Lakshmi, for the opportunities
she has given us, not only during the project but throughout the course of B.Tech.
I am greatly indebted to Dr. K. Lakshmi Prasad, Principal, GITAM Institute of
Technology, for providing the facilities to carry out this work.
We also duly acknowledge all the faculty members of the Information
Technology department for guiding us in the making of this project, for solving
our problems whenever they surfaced, and for catalyzing our thinking when it
seemed to stagnate.
Anvesh Kumar Kolluri (1210711204)
ABSTRACT
Indexing digital videos to support browsing and retrieval by users is a
great challenge today. To design a system that can accurately and automatically
process large amounts of heterogeneous video, segmentation into shots and scenes
forms the basic operation, and for this operation the shot boundaries must first be
detected. The enormous growth in the number of videos and in database sizes, as
well as their wide deployment in various applications, has created the need for good
retrieval applications. This project aims to introduce an efficient and useful video
processing and retrieval technique. This report presents a detailed analysis of
HOG (Histogram of Oriented Gradients) as a means to process videos and retrieve
relevant information. The method combines the benefits of edge detection with a
pixel-intensity-based technique, transforming the pixel representation without much
loss of its properties while searching for the relevant information. Moreover, a new
approach involving the HOG method has been successfully applied to the shot
boundary detection problem.
CONTENTS
CHAPTER NAME                                            PAGE NO

1   INTRODUCTION                                              1
    1.1   MOTIVATION                                          1
    1.2   FUNDAMENTALS OF SBD                                 3
          1.2.1   THRESHOLDS                                  3
          1.2.2   OBJECT OR CAMERA MOTION                     3
          1.2.3   FLASHLIGHTS                                 4
          1.2.4   COMPLEXITY OF THE DETECTOR                  4
    1.3   PROBLEM FORMULATION                                 4
2   LITERATURE SURVEY                                         6
    2.1   GRADIENT FIELD DESCRIPTOR FOR IMAGE RETRIEVAL      10
    2.2   FEATURE EXTRACTORS                                 10
          2.2.1   EDGE HISTOGRAM DESCRIPTOR                  11
          2.2.2   SCALE INVARIANT FEATURE TRANSFORM          12
          2.2.3   HISTOGRAM OF ORIENTED GRADIENTS            13
3   SYSTEM ANALYSIS                                          15
    3.1   PROBLEM STATEMENT                                  15
    3.2   EXISTING SYSTEMS                                   15
    3.3   PROBLEM MOTIVATION                                 16
    3.4   PROPOSED SOLUTION                                  16
    3.5   SOFTWARE REQUIREMENT ANALYSIS                      17
    3.6   SYSTEM REQUIREMENTS                                18
4   MODULES                                                  20
    4.1   GUI MODULE                                         20
    4.2   QUERY MODULE                                       20
    4.3   FEATURE EXTRACTION MODULE                          22
    4.4   IMAGE RETRIEVAL MODULE                             23
5   SELECTED SOFTWARE                                        24
    5.1   MATLAB AND ITS TOOLBOX                             24
    5.2   FEATURES                                           30
          5.2.1   COLOR                                      30
          5.2.2   TEXTURE                                    32
          5.2.3   SHAPE                                      36
6   SYSTEM DESIGN                                            38
    6.1   INTRODUCTION                                       38
    6.2   DATA FLOW DIAGRAM                                  38
7   IMPLEMENTATION                                           42
    7.1   SAMPLE CODE                                        42
    7.2   SAMPLE_GUI.M & SIMULATION RESULTS                  45
8   SYSTEM TESTING                                           49
    8.1   TESTING                                            49
    8.2   TYPES OF TESTING                                   49
    8.3   TEST CASES                                         52
9   CONCLUSIONS                                              53
10  REFERENCES                                               54
    APPENDIX-I                                               55
    APPENDIX-II                                              87
List of Abbreviations

SBD    Shot Boundary Detection
SIFT   Scale Invariant Feature Transform
EHD    Edge Histogram Descriptor
HOG    Histogram of Oriented Gradients
LT     Local Threshold
GT     Global Threshold
List of Figures

Figure 1   Sequence of frames                                    1
Figure 2   Types of histogram descriptors                       11
Figure 3   Definition of images and sub-image blocks            12
Figure 4   HOG blocks                                           14
Figure 5   An image and its histogram                           31
Figure 6   Texture properties                                   32
Figure 7   Classical co-occurrence matrix                       34
Figure 8   Boundary-based & region-based                        34
Figure 9   Data flow diagram of video shot boundary detection   36
1. INTRODUCTION
1.1 Motivation
In this digital era, the recent developments in video compression
technology, widespread use of digital cameras, high capacity digital systems, along
with significant increase in computer performance have increased the usage and
availability of digital video, creating a need for tools that effectively categorize,
search and retrieve the relevant video material. Also, the increasing availability and
use of on-line videos has led to a demand for efficient and automated video analysis
techniques.
Generally, management of such large collections of videos requires
knowledge of the content of those videos. Therefore, digital video data is processed
with the objective of extracting the information about the content conveyed in the
video. Now, the definition of content is highly application dependent but there are a
number of commonalities in the application of content analysis. Among others, shot
boundary detection (SBD), also called temporal video segmentation is one of the
important and the most basic aspects of content based video retrieval. Hence, much
research has been focused on segmenting video by detecting the boundaries
between camera shots. We make use of HOG in order to extract the necessary
features across the shot boundary. A shot may be defined as a sequence of frames
captured by a single camera in a single continuous action in time and space.
FIGURE 1: SEQUENCE OF FRAMES
For example, a video sequence showing two people having a conversation may be
composed of several close-up shots of their faces which are interleaved and make
up a scene. Shots define the low-level, syntactical building blocks of a video
sequence.
A large number of different types of boundaries can exist between shots. These can
be broadly classified into abrupt and gradual changes. An abrupt transition is a
hard cut that occurs between two adjacent frames. Among gradual transitions we
have the fade, a gradual change in brightness either starting or ending with a black
frame. Another type of gradual transition is the dissolve, which is similar to a fade
except that it occurs between two shots: the images of the first shot get dimmer and
those of the second shot get brighter until the second replaces the first. Other types
of shot transitions include wipes (gradual) and computer-generated effects such as
morphing.
A scene is a logical grouping of shots into a semantic unit. A single scene focuses
on a certain object or objects of interest, but the shots constituting a scene can be
from different angles. In the example above the sequence of shots showing the
conversation would comprise one logical scene with the focus being the two people
and their conversation.
Scene boundary detection requires a high-level semantic understanding of the video
sequence, and such an understanding must take cues from the associated audio
track and the encoded data stream itself. Although scene segmentation is far more
desirable than simple shot boundary detection, since people generally visualize a
video as a sequence of scenes rather than shots, shot boundary detection still plays a
vital role in any video segmentation system, as it provides the basic syntactic units
for higher-level processes to build upon.
As part of an ongoing video indexing and browsing project, our recent research has
focused on the application of different methods of video segmentation to a large
and diverse digital video collection. The aim is to examine how different
segmentation methods perform on video. With this information, it is hoped to
develop a system capable of accurately segmenting a wide range of videos.
1.2 Fundamental Problems of SBD
Although the problem of shot boundary detection has been studied for more
than a decade and many methods of detecting shot boundaries have been published,
several challenges still exist; these are summarized in the following sections.
1.2.1 Thresholds
Setting a threshold is one of the most challenging tasks in the
correct detection of a shot boundary. To decide whether a shot boundary has
occurred, it is necessary to set a threshold, or thresholds, for measuring the
similarity or dissimilarity between the start and end frames of a candidate boundary.
An abrupt transition produces a high discontinuity between adjacent frames,
whereas a gradual transition occurs over a number of frames. So the start and end
frames of an abrupt transition are adjacent, but this is not the case for gradual
transitions, which makes choosing a threshold more challenging for them. Cosine
dissimilarity values between the start and end frames that lie above the threshold
are logged as real shot boundaries, while values below it are ignored. To accurately
segment broadcast video, it is necessary to balance the following two apparently
conflicting needs:
1. The need to prevent detection of false shot boundaries, i.e. detecting boundaries
where none exist, by setting a sufficiently high threshold level so as to insulate the
detector from noise.
2. The need to detect subtle shot transitions such as dissolves, by making the
detector sensitive enough to recognise gradual change.
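The thresholding decision described above can be sketched as follows. This is an illustrative Python sketch, not the report's MATLAB implementation; the dissimilarity series and the threshold value are hypothetical.

```python
def detect_boundaries(dissimilarity, threshold):
    """Flag a shot boundary wherever the inter-frame dissimilarity
    exceeds the global threshold; values below it are ignored."""
    boundaries = []
    for i, d in enumerate(dissimilarity):
        if d > threshold:          # high discontinuity -> candidate cut
            boundaries.append(i)   # boundary between frame i and frame i+1
    return boundaries

# Toy dissimilarity series between consecutive frames:
# small values within a shot, a spike at each hard cut.
series = [0.05, 0.07, 0.06, 0.92, 0.04, 0.05, 0.88, 0.06]
print(detect_boundaries(series, threshold=0.5))  # -> [3, 6]
```

A single global threshold is the simplest choice; the trade-off discussed above (a high threshold misses dissolves, a low one admits noise) is exactly the choice of this one parameter.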
1.2.2 Object or Camera Motion
The visual content of a video can change significantly with extreme object
or camera motion and with screenplay effects (e.g. someone turning on the light in a
dark room), in ways very similar to typical shot changes. Slow motion sometimes
causes content changes similar to gradual transitions, whereas extremely fast
camera or object movements cause content changes similar to hard cuts.
Therefore, it is difficult to
differentiate shot changes from the object or camera motion.
1.2.3 Flashlights
Color is a primary element of video content, and most video content
representations employ color as a feature. Continuity signals based on color features
exhibit significant changes under abrupt illumination changes, such as flashlights.
Such a significant change may be identified as a content change (i.e. a shot
boundary) by most shot boundary detection tools. Several algorithms propose
using illumination-invariant features, but these algorithms always face a
trade-off between using an illumination-invariant feature and losing the most
significant feature for characterizing the variation of the visual content. Therefore,
flashlight detection is one of the major challenges for SBD algorithms.
1.2.4 Complexity of the Detector
Shot boundary detection is considered a pre-processing step in most
video content analysis applications: higher-level algorithms that perform more
complex content analysis build on the shot boundary detection results. Since the
content analysis applications take most of the available computational power and
time, it is necessary to keep the computational complexity of the shot boundary
detector low. This calls for algorithms that are sufficiently precise but also
computationally inexpensive.
As the shot boundary detection problem evolved, proposed algorithms
started to use more than one feature for content representation in order to increase
detection performance. On the other hand, such a strategy places a computational
burden on the detector, since each feature requires a separate processing step.
1.3 Problem Formulation
Shot boundary detection is one of the basic yet very important video
processing tasks, because the success of higher-level processing relies heavily on
this step. Videos normally contain enormous amounts of data. For fast and efficient
processing and retrieval of these videos they need to be indexed properly, for which
shot boundary detection forms the basic process. This project aims at evaluating
the existing techniques for detecting shots in videos and working towards their
enhancement, so that the problems discussed above can be removed and the
techniques can give better and more accurate results.
These conditions are met by the following two steps:
• Feature Extraction: The first step in the process is extracting image features to a
distinguishable extent.
• Matching: The second step involves matching these features to yield a result that
is visually similar.
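These two steps can be sketched as a minimal pipeline. This is an illustrative Python sketch rather than the report's MATLAB implementation; the coarse grey-level histogram feature, the L1 matching distance and the tiny synthetic frames are all simplifying assumptions.

```python
def extract_feature(frame, bins=4, max_val=256):
    """Step 1: feature extraction - a coarse grey-level histogram."""
    hist = [0] * bins
    for row in frame:
        for px in row:
            hist[px * bins // max_val] += 1
    return hist

def match(f1, f2):
    """Step 2: matching - L1 distance between feature vectors
    (0 means identical histograms)."""
    return sum(abs(a - b) for a, b in zip(f1, f2))

# Two nearly identical dark frames, then a bright frame (a hard cut).
frames = [
    [[10, 20], [30, 40]],
    [[12, 22], [28, 42]],
    [[200, 210], [220, 230]],
]
feats = [extract_feature(f) for f in frames]
dists = [match(feats[i], feats[i + 1]) for i in range(len(feats) - 1)]
print(dists)  # -> [0, 8]: low within the shot, high at the cut
```

The distances then feed the thresholding decision of Section 1.2.1: a large distance between consecutive frames signals a boundary.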
2. LITERATURE SURVEY
This chapter surveys the work done to date on shot boundary
detection. John S. Boreczky et al. [1996] presented a comparison of video shot
boundary detection techniques, analysing various methods and their variations
including histograms, discrete cosine transform, motion vectors, and block
matching. Patrick Bouthemy et al. [1999] proposed a unified approach to shot
change detection and camera motion characterization, which partitions a video
document into shots by using image motion information, generally more intrinsic
to the video structure itself. A. Miene et al. [2001] presented advanced and adaptive
shot boundary detection techniques based on two stages: feature extraction and shot
boundary detection. First, three different features for the measurement of shot
boundaries within the video are extracted. Second, the shot boundaries are detected
based on the previously extracted features.
H. Y. Mark Liao et al. [2002] proposed a novel dissolve detection
algorithm which avoids the mis-detection of motion by using a binomial
distribution model to systematically determine the threshold needed for
discriminating a real dissolve from global or local motions. Jesús Bescós [2004]
proposed real-time shot change detection over online MPEG-2 video, describing
a software module for video temporal segmentation which is able to detect both
abrupt transitions and all types of gradual transitions in real time.
Guillermo Cisneros et al. [2005] proposed a unified model for techniques on
video-shot transition detection; the approach is centred on mapping the space of
inter-frame distances onto a new decision space better suited to achieving
sequence-independent thresholding. Liuhong Liang et al. [2005] presented
enhanced shot boundary detection using video text information, in which a number
of edge-based techniques are proposed for detecting abrupt shot boundaries while
avoiding the influence of flashlights common in
many video types, such as sports, news, entertainment and interview videos.
Daniel DeMenthon et al. [2006] proposed shot boundary detection based on
image correlation features in video: cut detection is based on the so-called 2max
ratio criterion in a sequential image buffer, while dissolve detection is based on the
skipping image difference and linearity error in a sequential image buffer. Kota
Iwamoto et al. [2007] proposed detection of wipes and digital video effects based
on a pattern-independent model of image boundary line characteristics; the model
relies on the characteristics of image boundary lines dividing the two image
regions in the transitional frames.
Jinhui Yuan et al. [2008] proposed a shot boundary detection
method for news video based on object segmentation and tracking. It combines
three main techniques: the partitioned histogram comparison method, and video
object segmentation and tracking based on wavelet analysis. The partitioned
histogram comparison is used as a first filter to effectively reduce the number of
video frames that need object segmentation and tracking. Yufeng Li et al. [2008]
proposed a novel shot detection algorithm based on information theory. First,
colour and texture features are extracted by wavelet transform; then the
dissimilarity between two successive frames is defined, colligating the
mutual information of the colour feature and the co-occurrence mutual information
of the texture feature. The threshold is adjusted adaptively based on the entropy of
the continuous frames and does not depend on the type of video or the kind of shot.
Vasileios T. Chasanis et al. [2009] presented scene detection in videos using shot
clustering and sequence alignment. First the key-frames are extracted using a
spectral clustering method, employing the fast global k-means algorithm in the
clustering phase and also providing an estimate of the number of key-frames.
Then shots are clustered into groups using only visual similarity as a feature, and
they are labelled according to the group they are assigned to. Jinchang Ren et al.
[2009] proposed shot boundary detection in MPEG videos using local
and global indicators, operating directly in the compressed domain. Several local
indicators are extracted from MPEG macroblocks, and AdaBoost is employed for
feature selection and fusion. The selected features are then used to classify
candidate cuts into five sub-spaces via pre-filtering and rule-based decision making;
the global indicators of frame similarity between the boundary frames of cut
candidates are then examined using phase correlation of DC images.
Priyadarshinee Adhikari et al. [2009] proposed a paper on video shot
boundary detection, presenting video retrieval using shot boundary detection.
Lihong Xu et al. [2010] proposed a novel shot detection algorithm based on
clustering: colour feature extraction is done first, then the dissimilarity of video
frames is defined, and the video frames are divided into several sub-clusters by
performing K-means clustering. Wenzhu Xu et al. [2010] proposed a novel shot
detection algorithm based on graph theory, in which the video frames are divided
into several groups by a graph-theoretical algorithm. Arturo Donate et al. [2010]
presented shot boundary detection in videos using robust three-dimensional
tracking; the proposal is to extract salient features from a video sequence and track
them over time in order to estimate shot boundaries within the video.
Min-Ho Park et al. [2010] proposed efficient shot boundary
detection using blockwise motion-based features, in which a measure of
discontinuity in camera and object/background motion is proposed for SBD based
on the combination of two motion features: the modified displaced frame difference
(DFD) and the blockwise motion similarity. Goran J. Zajić et al. [2011] proposed
video shot boundary detection based on multifractal analysis: low-level features
(colour and texture) are extracted from each frame in the video sequence, then
concatenated into feature vectors (FVs) and stored in a feature matrix; matrix rows
correspond to the FVs of frames from the video sequence, while columns are time
series of particular FV components. Partha Pratim Mohanta et
al. [2012] proposed a model-based shot boundary detection
technique using frame transition parameters, which is based on a formulated frame
estimation scheme using the previous and next frames. Pablo Toharia et al. [2012]
proposed shot boundary detection using Zernike moments in multi-GPU multi-CPU
architectures, along with the different possible hybrid combinations based on
Zernike moments. Sandip T. et al. [2012] proposed key-frame based video
summarization using automatic thresholding and edge matching rate: first the
histogram difference of every frame is calculated, and then the edges of the
candidate key frames are extracted by the Prewitt operator. Ravi Mishra et al.
[2013] proposed video shot boundary detection using the dual-tree complex
wavelet transform, an approach that processes encoded video sequences prior to
complete decoding. The algorithm first extracts structure features from each video
frame using the dual-tree complex wavelet transform, and then the spatial-domain
structural similarity is computed between adjacent frames. Zhe Ming Lu et al.
[2013] presented fast video shot boundary detection based on SVD and pattern
matching. It is based on segment selection and singular value decomposition
(SVD): initially, the positions of the shot boundaries and the lengths of gradual
transitions are predicted using adaptive thresholds, and most non-boundary frames
are discarded at the same time. Sowmya R. et al. [2013] proposed analysis and
verification of video summarization using shot boundary detection; the analysis is
based on block-based histogram difference and block-based Euclidean distance
difference for varying block sizes. Ravi Mishra et al. [2014] proposed a
comparative study of the block matching algorithm and the dual-tree complex
wavelet transform for shot detection in videos, comparing the two detection
methods in terms of parameters like false rate, hit rate and miss rate, tested on a
set of different video sequences.
2.1 GRADIENT FIELD DESCRIPTOR FOR VIDEO SHOT BOUNDARY
DETECTION
This system accepts images of any format as input and maps them
to defined dimensions in order to make them appropriate for processing. This
requires a matching process robust to depictive inaccuracy and photometric
variation. The approach is to transform database images into Canny edge maps and
capture local structure in the map using a novel descriptor. An appropriate scale
and hysteresis threshold for the Canny operator are set by searching the parameter
space for a binary edge map in which a small, fixed percentage of pixels are
classified as edges. These simple heuristics retain the main boundaries and
discourage responses at the scale of finer texture. Gradient Field HOG is an
adaptation of HOG that mitigates the lack of relative spatial information by
capturing structure from surrounding regions. We are inspired by work on
image completion capable of propagating image structure into voids, and use a
similar Poisson filling approach to improve the richness of information in the
gradient field prior to sampling with the HOG descriptor. This simple technique
yields significant improvements in performance when matching sketches to photos,
compared to three leading descriptors: the Self-Similarity Descriptor (SSIM), SIFT
and HOG. Furthermore, the descriptor can be applied to localise sketched objects
within the retrieved images, a functionality demonstrated in an image montage
application. The success of the descriptor depends on correct selection of scale
during edge extraction, and the use of image salience measures may benefit this
process. The system could be enhanced by exploring coloured sketches or
incorporating more flexible models.
2.2 FEATURE EXTRACTORS
Feature extractors are usually divided into three types:
1. EHD (Edge Histogram Descriptor)
2. SIFT (Scale Invariant Feature Transform)
3. HOG (Histogram of Oriented Gradients)
2.2.1 EHD (EDGE HISTOGRAM DESCRIPTOR)
The EHD method shows how to build global and semi-global edge histogram bins
from the local histogram bins. Of the various possible clusters of sub-images,
13 patterns are used for the semi-global histograms. These 13 semi-global regions
and the whole image space are adopted to define the semi-global and the global
histograms respectively. This extra histogram information can be obtained
directly from the local histogram bins without a further feature extraction process.
Experimental results show that the semi-global and global histograms generated
from the local histogram bins help to improve the retrieval performance.
FIGURE 2: TYPES OF HISTOGRAM DESCRIPTORS
Local Edge Histogram:
The normative part of the edge histogram descriptor consists of 80 local
edge histogram bins. The semantics of those histogram bins are described in the
following sub-sections. To localize the edge distribution to a certain area of the
image, the image space is divided into 4x4 sub-images as shown in Figure 3. Then,
for each sub-image, an edge histogram is generated to represent the edge
distribution in the sub-image. To define the different edge types, each sub-image is
further divided into small square blocks called image-blocks.
FIGURE 3: DEFINITION OF IMAGES AND SUB-IMAGE BLOCKS
Advantages of EHD:
1) Shows how to construct global and semi-global edge histogram bins from the
local histogram bins.
2) Uses 13 patterns for the semi-global histograms.
3) Extra histogram information can be obtained directly from the local
histogram bins without a further feature extraction process.
4) The semi-global and global histograms generated from the local
histogram bins help to improve the retrieval performance.
Disadvantages of EHD:
1) Does not provide invariance to rotation, scaling and translation.
2) Developing a sufficiently robust descriptor is difficult.
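The local edge histogram construction described above can be sketched as follows. This is an illustrative Python sketch following the MPEG-7 EHD scheme of five edge-type filters applied to 2x2 image-blocks; the filter coefficients are the standard ones, but the tiny 8x8 test image and the threshold value are hypothetical simplifications, and the report's actual implementation is in MATLAB.

```python
import math

# The five MPEG-7 edge filters applied to the 2x2 values (a, b / c, d)
# of an image-block: vertical, horizontal, 45-degree, 135-degree,
# and non-directional edges.
FILTERS = {
    "vertical":        lambda a, b, c, d: abs(a - b + c - d),
    "horizontal":      lambda a, b, c, d: abs(a + b - c - d),
    "diag45":          lambda a, b, c, d: abs(math.sqrt(2) * (a - d)),
    "diag135":         lambda a, b, c, d: abs(math.sqrt(2) * (b - c)),
    "non_directional": lambda a, b, c, d: abs(2 * a - 2 * b - 2 * c + 2 * d),
}

def local_edge_histogram(img, threshold=10):
    """Return 16 sub-image histograms of 5 bins each (80 bins total).
    img is a square 2D list whose side is divisible by 8, so each of
    the 4x4 sub-images splits evenly into 2x2-pixel image-blocks."""
    n = len(img)
    sub = n // 4                       # side of one sub-image
    hists = [[0] * 5 for _ in range(16)]
    for si in range(4):
        for sj in range(4):
            # scan the 2x2 image-blocks inside sub-image (si, sj)
            for bi in range(si * sub, (si + 1) * sub, 2):
                for bj in range(sj * sub, (sj + 1) * sub, 2):
                    a, b = img[bi][bj], img[bi][bj + 1]
                    c, d = img[bi + 1][bj], img[bi + 1][bj + 1]
                    scores = [f(a, b, c, d) for f in FILTERS.values()]
                    best = max(range(5), key=lambda k: scores[k])
                    if scores[best] >= threshold:  # else: monotone block
                        hists[si * 4 + sj][best] += 1
    return hists

# 8x8 image of alternating dark/bright columns: every image-block
# contains a vertical edge, so every sub-image votes in bin 0.
img = [[0, 100] * 4 for _ in range(8)]
h = local_edge_histogram(img)
```

The semi-global and global histograms mentioned above are then obtained by summing these 80 local bins over the 13 region patterns and over the whole image, with no further filtering needed.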
2.2.2 SIFT
SIFT feature detection is an approach used for object
recognition. The invariant features extracted from images can be used to perform
reliable matching between different views of an object or scene. The features have
been shown to be invariant to image rotation and scale, and robust across a
substantial range of affine distortion, addition of noise, and change in illumination.
For each image query, gallery photograph, and each sketch/photo
correspondence in our dictionary, we compute a SIFT feature representation. SIFT-
based object matching is a popular method for finding correspondences between
images. Introduced by Lowe, SIFT object matching consists of both a scale-
invariant interest point detector and a feature-based similarity measure. Our
method is not concerned with the interest point detection aspect of the SIFT
framework, but instead utilizes only the gradient-based feature descriptors
(known as SIFT features).
Advantages of SIFT:
1) Performs reliable matching between different views of an object or scene.
2) Invariant to image rotation and scale, and robust across a substantial range
of affine distortion, addition of noise, and change in illumination.
3) The image gradient magnitudes and orientations are sampled around
the key point location.
Disadvantages of SIFT:
1) Edges are poorly defined and usually hard to detect, although
large numbers of key points can still be extracted from typical images.
2) Feature matching is difficult when the objects (e.g. faces) are small.
2.2.3 HISTOGRAM OF ORIENTED GRADIENTS
The essential idea behind the Histogram of Oriented Gradients
descriptor is that local object appearance and shape within an image can be
described by the distribution of intensity gradients or edge directions. These
descriptors are computed by dividing the image into small connected regions,
called cells, and for each cell compiling a histogram of gradient directions or edge
orientations for the pixels within the cell. The combination of these histograms
then represents the descriptor. For improved accuracy, the local histograms can be
contrast-normalized by calculating a measure of the intensity across a larger region
of the image, called a block, and then using this value to normalize all cells within
the block. This normalization results in better invariance to changes in illumination
or shadowing.
FIGURE 4: HOG BLOCKS
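The cell-and-block scheme just described can be sketched as follows. This is an illustrative Python sketch, not the report's MATLAB code: the 4x4 cell size, 9 unsigned orientation bins and two-cell block are hypothetical simplifications of the usual HOG parameters, and the ramp test image is synthetic.

```python
import math

def cell_histograms(img, cell=4, bins=9):
    """Per-cell histograms of unsigned gradient orientation (0-180 deg),
    with each pixel voting by its gradient magnitude."""
    h, w = len(img), len(img[0])
    hists = {}
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = img[i][j + 1] - img[i][j - 1]      # horizontal gradient
            gy = img[i + 1][j] - img[i - 1][j]      # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(ang * bins / 180.0), bins - 1)
            key = (i // cell, j // cell)            # which cell the pixel is in
            hists.setdefault(key, [0.0] * bins)[b] += mag
    return hists

def normalize_block(cells, eps=1e-6):
    """Contrast-normalize a block: L2-normalize the concatenated
    histograms of its cells, giving illumination invariance."""
    v = [x for c in cells for x in c]
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]

# 8x8 ramp image: brightness increases left to right, so all gradients
# are horizontal and the votes concentrate in the 0-degree bin.
img = [[10 * j for j in range(8)] for _ in range(8)]
hists = cell_histograms(img)
block = normalize_block([hists[(0, 0)], hists[(0, 1)]])
```

Scaling the image brightness scales every magnitude by the same factor, which the block's L2 normalization cancels; this is the invariance to illumination changes claimed above.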
Advantages:
Examined over larger databases, HOG-based retrieval performs much
better than EHD-based retrieval in most cases. The Edge Histogram Descriptor
performs poorly on information-poor sketches, while better results can be achieved
for more detailed queries; this problem can be overcome by the HOG method.
HOG captures edge or gradient structure that is very characteristic of local shape.
Disadvantages:
HOG-based detectors do not yet incorporate motion information;
doing so requires block matching or optical flow fields. Also, although the
current fixed-template-style detector has proven difficult to beat for fully visible
pedestrians, humans are highly articulated, and a parts-based model with a greater
degree of local spatial invariance would likely be beneficial.
After reviewing existing edge- and gradient-based descriptors, it has been
shown experimentally that grids of histogram of oriented gradient (HOG)
descriptors significantly outperform existing feature sets for human detection.
Studying the influence of each stage of the computation on performance leads to
the conclusion that fine-scale gradients, fine orientation binning, relatively coarse
spatial binning, and high-quality local contrast normalisation in overlapping
descriptor blocks are all important for good results.
3. SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
The problem involves entering an image as a query and selecting
the video to be analysed, which in turn provides the database space by being
broken down into shots and frames. A software application is designed to employ
SBD techniques to extract visual properties and match them. This is done to
retrieve images in the database that are visually similar to the query image. The
main problem for this application is that the size and shape of the input image
need to be matched to the size and shape of the database images.
3.2 EXISTING SYSTEMS
Several systems currently exist and are being constantly developed. Examples are:
1) QBIC, or Query By Image Content, was developed by IBM, Almaden Research
Centre, to allow users to graphically pose and refine queries based on multiple
visual properties such as colour, texture and shape. It supports queries based on
input images, user-constructed sketches, and selected colour and texture patterns.
2) VIR Image Engine by Virage Inc., like QBIC, enables image retrieval based on
primitive attributes such as colour, texture and structure. It examines the pixels in
the image and performs an analysis process, deriving image characterisation
features.
3) VisualSEEK and WebSEEK were developed by the Department of Electrical
Engineering, Columbia University. Both systems support colour and spatial
location matching as well as texture matching.
4) NeTra was developed by the Department of Electrical and Computer
Engineering, University of California. It supports colour, shape, spatial layout and
texture matching, as well as image segmentation.
5) MARS, or Multimedia Analysis and Retrieval System, was developed by the
Beckman Institute for Advanced Science and Technology, University of Illinois. It
supports colour, spatial layout, texture and shape matching.
3.3 Problem Motivation
Video databases and collections can be enormous in size, containing
hundreds, thousands or even millions of frames and shots corresponding to each
video. The conventional method of image retrieval in a video is to search for key
features that match the descriptive keywords assigned to the image by a human
categoriser.
While computationally expensive, the results are far more accurate than
conventional image indexing. Hence, there exists a trade-off between accuracy and
computational cost. This trade-off diminishes as more efficient algorithms are
utilised and increased computational power becomes inexpensive.
3.4 Proposed Solution
The solution initially proposed was to extract the primitive features of a
query image and compare them to those of the database images. A range of image
features was under consideration. Using matching and comparison algorithms, the
features of one image are compared and matched to the corresponding features of
another image, which in our case is generally a frame of the video under
consideration. This comparison is performed using shape distance metrics. These
metrics are applied one after another, so as to retrieve database images that are
similar to the query. The similarity between features was to be calculated using
algorithms employed by well-known SBD systems.
3.5 SOFTWARE REQUIREMENT ANALYSIS
A Software Requirements Specification (SRS) is a complete description of
the behaviour of the system to be developed. It includes a set of use cases that
describe all the interactions the users will have with the software. Use cases are also
known as functional requirements. In addition to use cases, the SRS also contains
non-functional (or supplementary) requirements. Non-functional requirements are
requirements which impose constraints on the design or implementation (such as
performance engineering requirements, quality standards, or design constraints).
Functional Requirements:
In software engineering, a functional requirement defines a
function of a software system or its component. A function is described as a set of
inputs, the behaviour, and outputs. Functional requirements may be calculations,
technical details, data manipulation and processing and other specific functionality
that define what a system is supposed to accomplish. Behavioural requirements
describing all the cases where the system uses the functional requirements are
captured in use cases. Functional requirements are supported by non-functional
requirements (such as performance requirements, security, or reliability). How a
system implements functional requirements is detailed in the system design. In
some cases a requirements analyst generates use cases after gathering and
validating a set of functional requirements. Each use case illustrates behavioural
scenarios through one or more functional requirements. Often, though, an analyst
will begin by eliciting a set of use cases, from which the analyst can derive the
functional requirements that must be implemented to allow a user to perform each
use case.
Non-Functional Requirements: In systems engineering and requirements
engineering, a non-functional requirement is a requirement that specifies criteria
that can be used to judge the operation of a system, rather than specific behaviours.
This should be contrasted with functional requirements, which define specific
behaviour or functions. In general, functional requirements define what a system is
supposed to do, whereas non-functional requirements define how a system is
supposed to be. Non-functional requirements are often called the qualities of a
system. Other terms for non-functional requirements are "constraints", "quality
attributes", "quality goals", "quality of service requirements" and
"non-behavioural requirements". Qualities, that is, non-functional requirements,
can be divided into two main categories:
1. Execution qualities, such as security and usability, which are observable at run
time.
2. Evolution qualities, such as testability, maintainability, extensibility and
scalability, which are embodied in the static structure of the software system.
3.6 System Requirements
Introduction:
To be used efficiently, all computer software needs certain
hardware components or other software resources to be present on a computer.
These pre-requisites are known as (computer) system requirements and are often
used as a guideline as opposed to an absolute rule. Most software defines two sets
of system requirements: minimum and recommended. With increasing demand for
higher processing power and resources in newer versions of software, system
requirements tend to increase over time. Industry analysts suggest that this trend
plays a bigger part in driving upgrades to existing computer systems than
technological advancements.
Hardware Requirements:
The most common set of requirements defined by any operating
system or software application is for physical computer resources, also known as
hardware. A hardware requirements list is often accompanied by a hardware
compatibility list (HCL), especially in the case of operating systems. An HCL
lists tested, compatible, and sometimes incompatible hardware devices for a
particular operating system or application. The following sub-sections discuss the
various aspects of hardware requirements.
Hardware Requirements for Present Project
1. Input devices: keyboard and mouse
2. RAM: 512 MB
3. Processor: P4 or above
4. Storage: less than 100 GB of HDD space
Software Requirements:
Software Requirements deal with defining software resource requirements
and pre-requisites that need to be installed on a computer to provide optimal
functioning of an application. These requirements or pre-requisites are generally
not included in the software installation package and need to be installed separately
before the software is installed.
Supported Operating Systems:
1. Windows XP, Windows 7, Windows 8, Windows 8.1
2. Linux (all versions)
3. OS X Mountain Lion and above
Required software: MATLAB R2013a
4 MODULES
The project has four main modules, which are used for image retrieval from
videos. The modules are as follows:
4.1 GUI Module
4.2 Query Module
I. Querying Input Image
II. Querying Input Video
4.3 Feature Extraction
4.4 Image Retrieval
These modules, and their sub-modules, are described below.
4.1 GUI MODULE
This is the main module from the user's perspective. The input image
and the corresponding output images are displayed here, and all the interaction
performed by the user is handled graphically in this module.
4.2 QUERY MODULE
When the user inputs a query image into the application, its size and
shape must first be changed to match those of the database images, which are
predefined. The query module can be subdivided into two sub-modules:
I. Querying Input Image
II. Querying Input Video
The images and videos are processed as follows.
I. Querying Input Image
The input image should satisfy certain criteria in order to make the
matching and comparison valid. They are as follows:
Image Enhancement
Image enhancement is the conversion of the original imagery to a spectral
quality that is easier to understand, for feature extraction or image
interpretation. It is useful to examine the image histogram before performing any
enhancement. The x-axis of the histogram is the range of available digital
numbers, i.e. 0 to 255 in the case of grey levels. The y-axis is the number of
pixels in the image having a given digital number.
Contrast stretching is used to increase the tonal distinction between various
features in a scene. The most common types of contrast-stretching enhancement
are: a linear contrast stretch, a linear contrast stretch with saturation, and a
histogram-equalised stretch.
Filtering is commonly used to restore an image by suppressing noise, to enhance
the image for better interpretation, and to extract features such as edges. The
most common types of filters used are: mean, median, low-pass, high-pass, and
edge-detection filters.
Image Transformation
Image transformations usually involve the combined processing of data from
multiple spectral bands. Arithmetic operations (subtraction, addition,
multiplication, division) are performed to combine and transform the original
bands into "new" images which better display or highlight certain features in the
scene. Two of the most common transforms applied to image data are image
ratioing, which differences or divides combinations of two or more bands with the
aim of enhancing target features, and principal components analysis (PCA). The
objective of PCA is to reduce the dimensionality (i.e. the number of bands) of the
data and compress as much of the information in the original bands as possible
into fewer bands.
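The band arithmetic described above is simple pixel-wise algebra. A minimal Python sketch (illustrative only; the bands here are flat lists of invented sample values, and the eps guard against division by zero is an assumption of the sketch, not something the report specifies):

```python
def band_difference(band_a, band_b):
    """Pixel-wise difference of two co-registered bands, highlighting
    pixels where the bands disagree."""
    return [a - b for a, b in zip(band_a, band_b)]

def band_ratio(band_a, band_b, eps=1e-6):
    """Pixel-wise ratio of two co-registered bands; eps avoids
    division by zero in dark regions."""
    return [a / (b + eps) for a, b in zip(band_a, band_b)]

near_ir = [80, 120, 200, 40]   # invented sample values for one band
red     = [40, 100, 50, 40]    # and for a second band
print(band_difference(near_ir, red))  # -> [40, 20, 150, 0]
ratios = band_ratio(near_ir, red)     # large where near_ir dominates red
```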
Image Classification
Information extraction is the last step toward the final output of the image
analysis. After pre-processing, the data is subjected to quantitative analysis to
assign individual pixels to specific classes. Pixels of known identity are used to
classify the remainder of the image, which consists of pixels of unknown identity.
After classification is complete, its accuracy must be evaluated by comparing the
categories on the classified images with areas of known identity on the ground.
The final result of the analysis provides the user with full information
concerning the source data, the method of analysis, the outcome and its
reliability.
Applications of Image Processing:
Interest in digital image processing methods stems from two principal application
areas:
(1) Improvement of pictorial information for human interpretation, and
(2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for extracting
information from an image in a form suitable for computer processing. Examples
include automatic character recognition, industrial machine vision for product
assembly and inspection, military reconnaissance, automatic processing of
fingerprints, etc.
Basics of video processing:
Video is the technology of electronically capturing, recording, processing,
storing, transmitting, and reconstructing a sequence of still images representing
scenes in motion. Essentially the time component is considered in the case of
videos. It may be further described in the following manner:
1. Video is a sequence of still images representing scenes in motion.
2. Video is motion imagery produced electronically.
3. Video is captured, recorded, processed, stored, transmitted and
reconstructed electronically.
4.3 FEATURE EXTRACTION
For feature extraction we have used an efficient technique called the
Histogram of Oriented Gradients (HOG), which retrieves the query image's colour,
shape and texture features and compares them to those of the database images.
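HOG in its standard form captures local shape through histograms of gradient orientations; the full algorithm also divides the image into cells and normalises over blocks. The following much-simplified Python sketch (illustrative only, not the project's MATLAB implementation) computes a single magnitude-weighted orientation histogram over a small grey-level grid:

```python
import math

def orientation_histogram(image, bins=9):
    """Simplified HOG-style descriptor: histogram of gradient orientations
    (0-180 degrees) weighted by gradient magnitude, over the whole image."""
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return hist

# A vertical edge: left half dark, right half bright
img = [[0, 0, 255, 255] for _ in range(4)]
hog = orientation_histogram(img)   # all weight falls into the first bin
```

For this vertical edge every interior gradient points horizontally, so only the first orientation bin receives weight; a real HOG implementation would repeat this per cell and concatenate the normalised histograms into a feature vector.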
4.4 IMAGE RETRIEVAL
During this module of the project, the features are compared and sorted.
We have used the Euclidean distance, an efficient measure for comparing feature
vectors. The feature vectors are sorted, indexing mechanisms are used to retrieve
the images from the database, and the results are then displayed on the GUI.
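The compare-and-sort step above can be sketched directly: compute the Euclidean distance from the query's feature vector to each database vector, then sort by distance. This is an illustrative Python sketch (the frame names and three-component feature vectors are invented; the project's database and MATLAB code are not shown in the report).

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_database(query, database):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: euclidean(query, database[name]))

db = {
    "frame_012": [0.9, 0.1, 0.4],
    "frame_340": [0.2, 0.8, 0.7],
    "frame_077": [0.85, 0.15, 0.35],
}
print(rank_database([0.9, 0.1, 0.4], db))
# -> ['frame_012', 'frame_077', 'frame_340']
```

The first entries of the ranking are the frames to display on the GUI.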
5. SELECTED SOFTWARE
The software used to perform the various operations on images and their
properties is MATLAB. An overview of MATLAB and its operations on images and
their features is given below.
5.1 MATLAB
The name ‘Matlab’ comes from two words: matrix and laboratory.
According to The MathWorks (producer of Matlab), Matlab is a technical
computing language used mostly for high-performance numeric calculations and
visualization. It integrates computing, programming, signal processing and
graphics in an easy-to-use environment, in which problems and solutions can be
expressed in mathematical notation. The basic data element is an array, which
allows for computing difficult mathematical formulas, found mostly in linear
algebra. But Matlab is not only about math problems. It is widely used for data
analysis, modeling, simulation and statistics. Matlab's high-level programming
language finds application in other fields of science such as biology, chemistry,
economics, medicine and many more. The following paragraph, which is based on the
MathWorks guide 'Getting Started with Matlab', introduces the main features of
Matlab. The most important feature of Matlab is its easy extensibility. The
environment allows users to create new applications and become contributing
authors. It has evolved over many years and has become a tool for research,
development and analysis. Matlab also features sets of specific libraries, called
toolboxes, which collect ready-to-use functions for solving particular classes of
problems. The Matlab system consists of five main parts. First, the Desktop Tools
and Development Environment are a set of tools helpful while working with
functions and files. Examples of this part are the command window, the workspace,
the editor and a very extensive help mechanism. The second part is the Matlab
Mathematical Function Library. This is a
wide collection of elementary functions like sum, multiplication, sine, cosine,
tangent, etc. Besides simple operations, more complex arithmetic can be calculated,
including matrix inverses, Fourier transformations and approximation functions.
The third part is the Matlab language, a high-level array language with
functions, data structures and object-oriented programming features. It allows
programming of small applications as well as large and complex programs. The
fourth part of the Matlab system is its graphics. Matlab has extensive tools for
displaying graphs and functions, covering two- and three-dimensional
visualization, image processing, graphical user interface building and even
animation. The fifth and last part is Matlab's External Interfaces. This library
allows writing C and Fortran programs that can interact with Matlab.
Data representation:
Data representation is the feature that distinguishes Matlab from other
environments. Everything is represented with matrices. MathWorks defines a matrix
as a rectangular array of numbers. Matlab recognizes binary and text files. A
couple of file extensions are commonly used; for example, *.m stands for M-file.
There are two kinds: script and function M-files. A script file contains a
sequence of mathematical expressions and commands. A function file starts with
the word function and includes functions created by the user. A different example
of an extension is *.mat. Files with the *.mat extension are binary and contain
work saved with the command File/Save or Save As. Since Matlab stores all data in
matrices, the program offers many ways to create them.
The easiest one is just to type values. There are three general rules:
• the elements of a row should be separated with spaces or commas;
• to mark the end of each row a semicolon ‘;’ should be used;
• square brackets must surround whole list of elements.
After entering the values, the matrix is automatically stored in the workspace
(MathWorks, 2002, chapter 3.3). To extract a specific row or column, round
brackets are required. In a 3x3 matrix, the second row is addressed as (2,:) and
the third column as (:,3). In order to recall one precise element, the brackets
need to contain two values; for example, (2,3) stands for the third element in
the second row. Variables are declared as in every other programming language,
and arithmetic operators work in the same way: a certain value is assigned to a
variable. When the result variable is not defined, Matlab creates one, named ans,
placed in the workspace.
The variable ans stores the result of the last operation. One command worth
mentioning is the plot command, which is responsible for drawing two-dimensional
graphs. Although this command belongs to the group responsible for graphics, it
is one of the basic Matlab instructions, not part of the Image Processing
Toolbox; it is not suitable for processing images and therefore will not be
described. The paragraphs above consider matrices as two-dimensional structures.
For a better understanding of how Matlab stores images, three-dimensional
matrices have to be explained. In three-dimensional matrices there are three
values in the brackets. The first value stands for the row number, the second for
the column, and the third for the extra dimension. Similarly, a fourth number
would address a fourth dimension, and so on. The best way to understand this is
to look at Figure 3, which presents the method of pointing to each element in a
three-dimensional matrix (Ozimek, lectures from Digital Image Processing, 2010a).
As mentioned before, Matlab stores images in arrays, which naturally suit the
representation of images. Most pictures are kept in two-dimensional matrices, in
which each element corresponds to one pixel in the image. For example, an image
600 pixels high and 800 pixels wide would be stored in Matlab as a matrix with
600 rows and 800 columns. More complicated images are stored in three-dimensional
matrices.
Truecolor pictures require the third dimension to keep their information about
the intensities of the RGB colors, which vary between 0 and 1 (MathWorks, 2009,
2.12). The most convenient way of pointing to locations in the image is the pixel
coordinate system. To refer to one specific pixel, Matlab requires the row and
column numbers of the sought point. Values of the coordinates range between one
and the length of the row or column. Images can also be expressed in spatial
coordinates. In that case positions of pixels are described as x and y. By
default, spatial coordinates correspond with pixel coordinates, but the order is
reversed: for example, pixel (2,3) would be translated to x=3 and y=2
(Koprowski & Wróbel, 2008, 20-21).
Endless possibilities:
As mentioned earlier, Matlab offers a very wide selection of toolboxes.
Most of them are created by MathWorks, but some are made by advanced users. There
is a long list of possibilities that this program gives, starting from
automation, through electrical engineering, mechanics, robotics, measurement,
modeling and simulation, medicine and music, to all kinds of calculations. The
next couple of paragraphs shortly present some of the toolboxes available in
Matlab. The descriptions are based on the theory about toolboxes from
Mrozek & Mrozek (2001, 387-395) and on mathworks.com. A very important group of
toolboxes handles digital signal processing. The Communication Toolbox provides
mechanisms for the modeling, simulation, design and analysis of functions for the
physical layer of communication systems. This toolbox includes algorithms that
help with channel coding, modulation, demodulation and multiplexing of digital
signals. The Communication Toolbox also contains a graphical user interface and
plot functions for a better understanding of signal processing. Similarly, the
Signal Processing Toolbox deals with signals; its possibilities include speech
and audio processing, wireless and wired communications and analog filter
design. Another group comprises the math and optimization toolboxes. The two most
common are the Optimization and Symbolic Math toolboxes. The first handles
large-scale optimization problems; it contains functions for solving nonlinear
equations and methods for solving quadratic and linear problems. The more widely
used library is the second: the Symbolic Math Toolbox contains hundreds of
functions ready to use for differentiation, integration, simplification,
transforms and the solving of equations. It helps with all algebra and calculus
calculations. A small group of Matlab toolboxes handles statistics and data
analysis. The Statistics Toolbox's features are data management and organization,
statistical plotting, probability computation and visualization. It also allows
the design of experiments connected with statistical data. The Financial Toolbox
is an extension of the previously mentioned library. As the name states, this
addition to Matlab handles finances; it is widely used to estimate economic risk,
analyze interest rates and create financial charts. It can also work with the
evaluation and interpretation of stock exchange actions. The Neural Networks
Toolbox can be considered another data-analysis library. It has a set of
functions that create, visualize and simulate neural networks, and is helpful
when data change nonlinearly. Moreover, it provides a graphical user interface
equipped with training sessions and examples for a better understanding of the
way a neural network works. Some toolboxes do not belong to any specific group
but are worth mentioning. For example, the Fuzzy Logic Toolbox offers a wide
range of functions responsible for fuzzy calculations. It allows the user to look
through the results of
fuzzy computations. Matlab also provides a very useful connection to databases
through the Database Toolbox.
It allows analyzing and processing the information stored in tables. It supports
SQL (Structured Query Language) commands to read and write data and to create
simple queries to search through the information. This toolbox interacts with
Oracle and other database systems. Most importantly, the Database Toolbox allows
beginners not familiar with SQL to access and query databases. Last but not least
comes a very important set of libraries: the image processing toolboxes. The
Mapping Toolbox is one of them; it is responsible for analyzing geographic data
and creating maps. It can import raster and vector graphics, and both two- and
three-dimensional maps can be displayed and customized. It also helps with
navigation problems and digital terrain analysis. The Image Acquisition Toolbox
is a very valuable collection of functions for receiving image and video signals
directly into the Matlab environment. This toolbox recognizes video cameras from
multiple hardware vendors, and a specially designed interface leads the user
through the possible transformations of the acquired images and videos. The Image
Processing Toolbox is a wide set of functions and algorithms that deal with
graphics. It supports almost any type of image file and gives the user extensive
options for the pre- and post-processing of pictures. There are functions
responsible for image enhancement, deblurring, filtering, noise reduction,
spatial transformations, histogram creation, changing the threshold, hue and
saturation, adjustment of color balance and contrast, detection of objects, and
analysis of shapes.
5.2 Features
5.2.1 Colour
Definition
One of the most important features that make possible the recognition of
images by humans is colour. Colour is a property that depends on the reflection of
light to the eye and the processing of that information in the brain. We use colour
everyday to tell the difference between objects, places, and the time of day. Usually
colours are defined in three dimensional colour spaces. These could either be RGB
(Red, Green, and Blue),
HSV (Hue, Saturation, and Value) or HSB (Hue, Saturation, and Brightness). The
last two are dependent on the human perception of hue, saturation, and brightness.
Most image formats such as JPEG, BMP, GIF, use the RGB colour space to store
information. The RGB colour space is defined as a unit cube with red, green, and
blue axes. Thus, a vector with three co-ordinates represents the colour in this space.
When all three coordinates are set to zero the colour perceived is black. When all
three coordinates are set to 1 the colour perceived is white. The other colour spaces
operate in a similar fashion but with a different perception.
Methods of Representation
The main method of representing colour information of images in any
system is through colour histograms. A colour histogram is a type of bar graph,
where each bar represents a particular colour of the colour space being used. In
MatLab, for example, you can get a colour histogram of an image in the RGB or
HSV colour space. The bars in a colour histogram are referred to as bins and they
represent the x-axis. The number of bins depends on the number of colours there
are in an image. The y-axis denotes the number of pixels in each bin; in other
words, how many pixels in an image are of a particular colour.
An example of a colour histogram in the HSV colour space can be seen
with the following image:
FIG 5: AN IMAGE AND ITS HISTOGRAM
To view a histogram numerically one has to look at the colour map or the numeric
representation of each bin. As one can see from the colour map each row represents
the colour of a bin. The row is composed of the three coordinates of the colour
space. The first coordinate represents hue, the second saturation, and the third,
value, thereby giving HSV. The percentages of each of these coordinates are what
make up the colour of a bin. Also one can see the corresponding pixel numbers for
each bin, which are denoted by the blue lines in the histogram.
Quantization in terms of colour histograms refers to the process of reducing the
number of bins by taking colours that are very similar to each other and putting
them in the same bin. By default the maximum number of bins one can obtain using
the histogram function in MatLab is 256. For the purpose of saving time when
trying to compare colour histograms, one can quantize the number of bins.
Obviously, quantization reduces the information regarding the images, but as
mentioned, this is the trade-off when one wants to reduce processing time. There
are two types of colour histograms: Global colour histograms (GCHs) and Local colour
histograms (LCHs). A GCH represents one whole image with a single colour
histogram. An LCH divides an image into fixed blocks and takes the colour
histogram of each of those blocks. LCHs contain more information about an image
but are computationally expensive when comparing images. “The GCH is the
traditional method for colour based image retrieval. However, it does not include
information concerning the colour distribution of the regions” of an image. Thus
when comparing GCHs one might not always get a proper result in terms of
similarity of images.
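The binning and quantization described above can be sketched for a single grey-level channel; a real colour histogram would quantize each channel of the RGB or HSV space in the same way. This is an illustrative Python sketch with invented pixel values, not the MatLab histogram function mentioned in the text.

```python
def colour_histogram(pixels, bins=8, max_level=255):
    """Global histogram: count how many pixels fall into each quantized bin.
    Fewer bins means coarser quantization, i.e. similar levels share a bin."""
    hist = [0] * bins
    width = (max_level + 1) / bins   # range of levels covered by one bin
    for p in pixels:
        hist[min(int(p / width), bins - 1)] += 1
    return hist

def histogram_distance(h1, h2):
    """L1 distance between two histograms; 0 means identical distributions."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

image = [0, 10, 31, 32, 200, 255]            # invented grey levels
print(colour_histogram(image, bins=8))       # -> [3, 1, 0, 0, 0, 0, 1, 1]
```

With 8 bins each bin covers 32 grey levels, so the levels 0, 10 and 31 are quantized into the same bin; comparing two such histograms with histogram_distance is the GCH comparison described in the text.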
5.2.2 Texture
Definition
Texture is that innate property of all surfaces that describes visual patterns,
each having properties of homogeneity. It contains important information about
the structural arrangement of a surface, such as clouds, leaves, bricks or
fabric. It also describes the relationship of the surface to the surrounding
environment. In short, it is a feature that describes the distinctive physical
composition of a surface.
Texture properties include:
1) Coarseness
2) Contrast
3) Directionality
4) Line-likeness
5) Regularity
FIGURE 6: TEXTURE PROPERTIES
Texture is one of the most important defining features of an image. It is
characterised by the spatial distribution of gray levels in a neighbourhood. In order
to capture the spatial dependence of gray-level values, which contribute to the
perception of texture, a two-dimensional dependence texture analysis matrix is
taken into consideration. This two-dimensional matrix is obtained by decoding the
image file (JPEG, BMP, etc.).
Methods of Representation
There are three principal approaches used to describe texture: statistical,
structural and spectral.
• Statistical techniques characterise textures using the statistical properties
of the grey levels of the points/pixels comprising a surface image. Typically,
these properties are computed using the grey-level co-occurrence matrix of the
surface, or the wavelet transformation of the surface.
• Structural techniques characterise textures as being composed of simple
primitive structures called "texels" (or texture elements, Figure 6). These are
arranged regularly on a surface according to some surface arrangement rules.
• Spectral techniques are based on properties of the Fourier spectrum and
describe the global periodicity of the grey levels of a surface by identifying
high-energy peaks in the spectrum.
For optimum classification purposes, what concerns us are the statistical
techniques of characterisation, because it is these techniques that result in
computed texture properties.
The most popular statistical representations of texture are:
• Co-occurrence Matrix
• Tamura Texture
• Wavelet Transform
Co-occurrence Matrix
Originally proposed by R.M. Haralick, the co-occurrence matrix
representation of texture features explores the grey level spatial dependence of
texture. A mathematical definition of the co-occurrence matrix is as follows:
- Given a position operator P(i,j),
- let A be an n x n matrix
- whose element A[i][j] is the number of times that points with grey level (intensity)
g[i] occur, in the position specified by P, relative to points with grey level g[j].
- Let C be the n x n matrix that is produced by dividing A with the total number
of point pairs that satisfy P. C[i][j] is a measure of the joint probability that a pair
of points satisfying P will have values g[i], g[j].
- C is called a co-occurrence matrix defined by P.
FIGURE 7: CLASSICAL CO-OCCURRENCE MATRIX
At first the co-occurrence matrix is constructed, based on the orientation and
distance between image pixels as shown in Figure 7. Then meaningful statistics are
extracted from the matrix as the texture representation.
Haralick proposed the following texture features:
1. Angular Second Moment
2. Contrast
3. Correlation
4. Variance
5. Inverse Second Differential Moment
6. Sum Average
7. Sum Variance
8. Sum Entropy
9. Entropy
10. Difference Variance
11. Difference Entropy
12. Measure of Correlation 1
13. Measure of Correlation 2
14. Local Mean
Hence, each Haralick texture feature is computed from a co-occurrence matrix.
These co-occurrence matrices represent the spatial distribution and the
dependence of the grey levels within a local area. Each (i,j)-th entry in a
matrix represents the probability of going from one pixel with a grey level of i
to another with a grey level of j under a predefined distance and angle. From
these matrices, sets of statistical measures are computed, called feature
vectors.
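The definition of the position operator P and the normalized matrix C given above maps directly to code. The following Python sketch (illustrative only; the 3x3 two-level image is invented, and the project itself works in MATLAB, where graycomatrix plays this role) builds C for a horizontal offset and computes one Haralick feature, contrast:

```python
def cooccurrence(image, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix C for offset (dx, dy):
    C[i][j] is the probability that a pixel with level i has a neighbour
    with level j at that offset (this is the matrix C defined in the text)."""
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]   # the matrix A
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def contrast(C):
    """Haralick contrast: large when neighbouring grey levels differ strongly."""
    return sum(((i - j) ** 2) * C[i][j]
               for i in range(len(C)) for j in range(len(C)))

img = [[0, 0, 1],
       [0, 1, 1],
       [1, 1, 0]]
C = cooccurrence(img, levels=2)   # entries of C sum to 1
print(contrast(C))                # -> 0.5
```

The other Haralick features (entropy, correlation, and so on) are further sums over the same matrix C.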
Tamura Texture
Drawing on psychological studies of human visual perception, Tamura
explored texture representation using computational approximations to three main
texture features: coarseness, contrast, and directionality. Each of these texture
features is approximately computed using algorithms:
• Coarseness is the measure of the granularity of an image, or the average size
of regions that have the same intensity.
• Contrast is the measure of the vividness of the texture pattern; it is
affected by the use of varying black and white intensities. Therefore, the bigger
the blocks that make up the image, the higher the contrast.
• Directionality is the measure of the directions of the grey values within the
image.
Wavelet Transform
Textures can be modeled as quasi-periodic patterns with spatial/frequency
representation. The wavelet transform transforms the image into a multi-scale
representation with both spatial and frequency characteristics. This allows for
effective multi-scale image analysis with lower computational cost. According to
this transformation, a function, which can represent an image, a curve, signal etc.,
can be described in terms of a coarse level description in addition to others with
details that range from broad to narrow scales. Unlike the usage of sine functions to
represent signals in Fourier transforms, in wavelet transform, we use functions
known as wavelets. Wavelets are finite in time, yet the average value of a wavelet
is zero. In a sense, a wavelet is a waveform that is bounded in both frequency and
duration. While the Fourier transform converts a signal into a continuous series of
sine waves, each of which is of constant frequency and amplitude and of infinite
duration, most real-world signals (such as music or images) have a finite duration
and abrupt changes in frequency. This accounts for the efficiency of wavelet
transforms. This is because wavelet transforms convert a signal into a series of
wavelets, which can be stored more efficiently due to finite time, and can be
constructed with rough edges, thereby better approximating real-world signals.
Examples of wavelets are Coiflet, Morlet, Mexican Hat, Haar and Daubechies. Of
these, Haar is the simplest and most widely used, while the Daubechies wavelets
have fractal structures and are vital for current wavelet applications.
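As a concrete illustration of the Haar wavelet just mentioned, here is one level of the 1-D Haar transform (a sketch only; for images the transform is applied separably along rows and columns over several levels):

```python
import numpy as np

def haar_1d(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail): pairwise averages and differences,
    scaled by 1/sqrt(2) so that signal energy is preserved.
    """
    x = np.asarray(signal, dtype=np.float64)
    assert x.size % 2 == 0, "signal length must be even"
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail
```

Smooth regions produce near-zero detail coefficients, which is why wavelet representations of real-world signals can be stored so efficiently.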
5.2.3 Shape
Definition
Shape may be defined as the characteristic surface configuration of
an object; an outline or contour. It permits an object to be distinguished from its
surroundings by its outline [Figure 8]. Shape representations can be generally
divided into two categories:
•
Boundary-based, and
•
Region-based.
FIGURE 8: BOUNDARY-BASED & REGION-BASED
Boundary-based shape representation only uses the outer boundary of the shape.
This is done by describing the considered region using its external characteristics;
i.e., the pixels along the object boundary. Region-based shape representation uses
the entire shape region by describing the considered region using its internal
characteristics; i.e., the pixels contained in that region.
Methods of Representation
For representing shape features mathematically, we have:
Boundary-based:
1) Polygonal Models, boundary partitioning
2) Fourier Descriptors
3) Splines, higher order constructs
4) Curvature Models
Region-based:
1) Superquadrics
2) Fourier Descriptors
3) Implicit Polynomials
4) Blum's skeletons
The most successful representations for shape categories are Fourier Descriptor and
Moment Invariants:
1) The main idea of Fourier Descriptor is to use the Fourier transformed boundary
as the shape feature.
2) The main idea of Moment invariants is to use region-based moments, which are
invariant to transformations as the shape feature.
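To make the Fourier Descriptor idea concrete, the ordered boundary points can be treated as complex numbers and Fourier-transformed: dropping F(0) removes translation and dividing the magnitudes by |F(1)| removes scale. A minimal NumPy sketch (the function name and the choice of keeping eight coefficients are illustrative assumptions):

```python
import numpy as np

def fourier_descriptor(boundary_xy, keep=8):
    """Translation- and scale-invariant descriptor of a closed boundary.

    boundary_xy: (N, 2) array of boundary points in traversal order.
    keep: number of low-frequency magnitudes to retain.
    """
    pts = np.asarray(boundary_xy, dtype=np.float64)
    z = pts[:, 0] + 1j * pts[:, 1]      # boundary as a complex signal
    F = np.fft.fft(z)
    mags = np.abs(F[1:keep + 1])        # drop F(0): translation invariant
    return mags / mags[0]               # divide by |F(1)|: scale invariant
```

Two boundaries that differ only by translation and scaling yield the same descriptor, which is exactly the invariance the shape feature requires.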
6. SYSTEM DESIGN
6.1 Introduction
System design is the process of defining the components, modules, interfaces and
data for a system to satisfy specified requirements. This is usually done in two
ways:
I. UML DIAGRAMS II. DATAFLOW DIAGRAMS
6.2 Data Flow Diagram
The Data Flow Diagram (DFD) is a graphical representation of the flow of data
through an information system. It enables you to represent the processes in your
information system from the viewpoint of data.
FIGURE 9: DATA FLOW DIAGRAM OF VIDEO SHOT BOUNDARY DETECTION
7. IMPLEMENTATION
CREATING A GUI:-
A Graphical User Interface (GUI) is an essential part of the application: it lets the
user navigate easily and responds to every request the user makes. Each element
arranged in the GUI has an axes object in a coordinate system, and all these
elements are given action-specific code in a corresponding MATLAB file (.m file).
For creating the graphical user interface we use MATLAB's GUI toolbox for
creation and element positioning.
We start with the 'guide' command in the command window, which opens the toolbox
for MATLAB GUI creation. Here we specify the name of the GUI we are going to
create, and MATLAB opens a GUI-building module. First, the title of the project,
'VIDEO SHOT BOUNDARY DETECTION', is placed at the centre as shown in the figure
below. This is done using the static text tool in the toolbox. By double-clicking
the generated text field we can edit its properties, such as the colour and the
text. The same was done for the input-image text field in the GUI.
Next, each element is assigned its corresponding axes so that it can be driven
from code; the axes display the images as events happen dynamically. The axes
function creates an axes graphics object in the current figure using default
property values. This is done with the axes tool in the toolbox, and each axes is
named so that events can be handled on it.
We also need interactive controls such as buttons, so that the client or user can
work with the application easily. We have created two buttons, one for video
selection and one for image selection, by placing them from the toolbox. An event
must occur when the user clicks a button; this is done with the help of a callback
function, which is generated in a .m file where the code is handled.
Creating the BasicGui.m File
This file contains the actual logic. The algorithm we have used here is 'Euclidean
distance', which measures the distance between the query image and the frames of
the video in the comparison space.
Euclidean distance:
To measure the similarity or dissimilarity of two images, the Euclidean distance
metric is used: the difference between each feature of the query and database
image is squared, which magnifies the divergence between the query and the
database image.
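The metric itself can be sketched as follows (in the project the MATLAB function pdist2 computes it; this Python version is illustrative only):

```python
import numpy as np

def euclidean_distance(f1, f2):
    """Euclidean distance between two flattened feature vectors:
    d = sqrt(sum_i (f1_i - f2_i)^2). Squaring each per-feature
    difference is what magnifies the divergence described above."""
    f1 = np.asarray(f1, dtype=np.float64).ravel()
    f2 = np.asarray(f2, dtype=np.float64).ravel()
    return float(np.sqrt(np.sum((f1 - f2) ** 2)))
```

The frame whose descriptor minimises this distance to the query descriptor is reported as the best match.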
Logic of Feature extraction and comparison
The features are extracted and compared with the dataset according to the
following steps:
Step 1: Read the image input from the GUI.
Step 2: After reading the input image into a variable, display it on the axes
using the axes function.
Step 3: Set up the VLFeat library in MATLAB by adding it to the current directory
path.
Step 4: Initialise the cell size = 8 with a 50 percent overlapping factor.
Step 5: Since the input image and each database frame from the video should be of
the same size, resize both to 256 x 256.
Step 6: Calculate the gradients of the input image with the help of the Histogram
of Oriented Gradients. This returns a matrix containing all the gradient and
magnitude values, which are the feature descriptor values.
Step 7: Reshape the matrix formed in Step 6 into a single-dimensional array for
comparison.
Step 8: Similarly, for all the frames in the database, from 1 to the total number
of files, repeat Steps 4 to 7.
Step 9: Store the single-dimensional gradient vectors of all images in a cell
array, so that each element of the cell array holds the one-dimensional gradient
values of one frame.
Step 10: Using the Euclidean method stated above, calculate the distance between
each database image and the input image, keeping all the distance values in a
single one-dimensional array.
Step 11: After finding the distance between each frame in the video and the
current query image, sort the indexes in the directory and display the images on
the axes.
Step 12: From the Euclidean values obtained for all the frames in the database,
the best-matching images are chosen and displayed on the axes. This completes the
execution of the project.
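The steps above can be sketched end to end in Python. This is an illustrative version only (the report's implementation uses MATLAB with VLFeat's vl_hog); a simplified global gradient-orientation histogram stands in for the full HOG descriptor, and frames are assumed already resized to a common size:

```python
import numpy as np

def orientation_histogram(gray, bins=9):
    """Stand-in feature: a global histogram of gradient orientations,
    weighted by gradient magnitude (a much-simplified HOG)."""
    g = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(g)                    # Step 6: image gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # undirected, in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist         # Step 7: flat 1-D vector

def best_match(query, frames):
    """Steps 8-12: compare the query against every frame and return the
    index of the closest frame by Euclidean distance, plus all distances."""
    qf = orientation_histogram(query)
    dists = [float(np.linalg.norm(qf - orientation_histogram(f)))
             for f in frames]
    return int(np.argmin(dists)), dists
```

A query identical to one of the frames gives a distance of zero for that frame, so it is selected, which mirrors how the report's GUI picks and displays the best frames.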
7.1 SAMPLE CODE:
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename pathname]=uigetfile({'*.bmp';'*.mp4';'*.avi';},'File Selector');
str=strcat(pathname,filename);
obj = mmreader(str);
vid = read(obj);
frames = obj.NumberOfFrames;
for x = 1 : frames
    imwrite(vid(:,:,:,x),strcat('frame-',num2str(x),'.jpeg'));
end
%cd('C:\Users\anvesh\Desktop\Project');
a=imread('frame-1.jpeg');
axes(handles.axes4);
imshow(a);
b=imread('frame-2.jpeg');
axes(handles.axes5);
imshow(b);
c=imread('frame-3.jpeg');
axes(handles.axes6);
imshow(c);
d=imread('frame-4.jpeg');
axes(handles.axes7);
imshow(d);
e=imread('frame-5.jpeg');
axes(handles.axes8);
imshow(e);
f=imread('frame-6.jpeg');
axes(handles.axes9);
imshow(f);
g=imread('frame-7.jpeg');
axes(handles.axes10);
imshow(g);
h=imread('frame-8.jpeg');
axes(handles.axes11);
imshow(h);
i=imread('frame-9.jpeg');
axes(handles.axes12);
imshow(i);
% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% function for image selection
run('C:\Users\anvesh\Desktop\1Project\vlfeat-0.9.19\toolbox\vl_setup');
[filename pathname]=uigetfile({'*.jpeg';'*.png';'*.avi';},'File Selector');
str=strcat(pathname,filename);
img1=imread(str);
imagefiles=dir('*.jpeg');
img1resize=imresize(img1,[256 256]);
cellsize=8;
hog1=vl_hog(single(img1resize),cellsize,'verbose');
n1=numel(hog1);
rehog1=reshape(hog1,[1,n1]);
nfiles=length(imagefiles);
for i=1:1:nfiles
currentfile=imagefiles(i).name;
currentimage=imread(currentfile);
image{i}=imresize(currentimage,[256 256]);
hogca{i}=vl_hog(single(image{i}),cellsize,'verbose');
n=numel(hogca{i});
rehogca{i}=reshape(hogca{i},[1,n]);
dist(i)=pdist2(rehog1,rehogca{i},'euclidean');
end
[sorted,ix]=sort(dist);
firstIndex=ix(1:10);
str1=imagefiles(firstIndex(1)).name;
image1=imread(str1);
image2=imread(str);
imgresize1=imresize(image1,[256 256]);
imgresize2=imresize(image2,[256 256]);
x='found image';
if isequal(imgresize1,imgresize2)
str1=imagefiles(firstIndex(1)).name
disp(x);
else
disp('not found');
end
% Running the vl-feat toolbox for handling videos, initially
% by breaking them down into shots and frames
8. SYSTEM TESTING
8.1 TESTING:
Testing is a process of executing a program with the intent of finding
an error.
Testing presents an interesting anomaly for software engineering: it is a
set of activities that can be planned in advance and conducted
systematically.
Software testing is often referred to as verification & validation.
8.2 TYPES OF TESTING:
The various types of testing are
1)White Box Testing
2)Black Box Testing
3)Alpha Testing
4)Beta Testing
5)Win Runner
6)Load Runner
7)System Testing
8)Unit Testing
9)End to End Testing
The type of testing we have used to measure the accuracy and efficiency of the
retrieval is black-box testing, which checks the output produced for a given
input.
WHITE-BOX TESTING
White-box testing, sometimes called glass-box testing, is a test-case design
method that uses the control structure of the procedural design to derive test
cases. Using white-box testing methods, the software engineer can derive test
cases that
(1) guarantee that all independent paths within a module have been exercised at
least once,
(2) exercise all logical decisions on their true and false sides, (3) execute all
loops at their boundaries and within their operational bounds, and (4) exercise
internal data structures to ensure their validity.
It is not possible to exhaustively test every program path, because the number of
paths is simply too large; moreover, white-box tests can be designed only after a
component-level design (or source code) exists, since the logical details of the
program must be available. A reasonable question might therefore be posed at this
juncture: "Why spend time and energy worrying about (and testing) logical
minutiae when we might better expend effort ensuring that program requirements
have been met?" Stated another way, why don't we spend all of our energy on
black-box tests? The answer lies in the nature of software defects:
• Logic errors and incorrect assumptions are inversely proportional to the
probability that a program path will be executed. Errors tend to creep into our
work when we design and implement functions, conditions, or control flow that are
out of the mainstream. Everyday processing tends to be well understood (and well
scrutinized), while "special case" processing tends to fall into the cracks.
• We often believe that a logical path is not likely to be executed when, in fact, it
may be executed on a regular basis. The logical flow of a program is sometimes
counterintuitive, meaning that our unconscious assumptions about flow of
control and data may lead us to make design errors that are uncovered only once
path testing commences.
• Typographical errors are random. When a program is translated into
programming language source code, it is likely that some typing errors will
occur. Many will be uncovered by syntax and type checking mechanisms, but
others may go undetected until testing begins. It is as likely that a typo will exist
on an obscure logical path as on a mainstream path.
Black-Box Testing:
1) It is also called behavioural testing. It focuses on the functional
requirements of the software.
2) It is a complementary approach that is likely to uncover a different class of
errors than white-box methods.
3) Black-box testing enables a software engineer to derive sets of input
conditions that will fully exercise all functional requirements for a program,
and it can be applied at virtually every level of software testing.
Accuracy and precision are defined in terms of systematic and random errors.
The more common definition associates accuracy with systematic errors and
precision with random errors. In terms of true/false positives (TP, FP) and
true/false negatives (TN, FN), they are measured by the formulas:
Precision = TP / (TP + FP) * 100
Accuracy = (TP + TN) / (TP + TN + FP + FN) * 100
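As a small self-check of these measures (the helper names are my own, not from the report):

```python
def precision(tp, fp):
    """Precision as a percentage: TP / (TP + FP) * 100."""
    return 100.0 * tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    """Accuracy as a percentage: (TP + TN) / (TP + TN + FP + FN) * 100."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

With the sample cases of Section 8.3 (three correct matches and one correct rejection), precision(1, 0) gives 100.0 for each found match and accuracy(3, 1, 0, 0) gives 100.0 overall.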
ALPHA TESTING:-
The alpha test is conducted at the developer's site by a customer. The software
is used in a natural setting with the developer "looking over the shoulder" of the
user and recording errors and usage problems. Alpha tests are conducted in a
controlled environment.
BETA TESTING:-
The beta test is conducted at one or more customer sites by the end-user of the
software. Unlike alpha testing, the developer is generally not present. Therefore,
the beta test is a "live" application of the software in an environment that cannot
be controlled by the developer. The customer records all problems (real or
imagined) that
are encountered during beta testing and reports these to the developer at regular
intervals. As a result of problems reported during beta tests, software engineers
make modifications and then prepare for release of the software product to the
entire customer base.
SYSTEM TESTING:-
System testing is actually a series of different tests whose primary purpose is
to fully exercise the computer-based system. Although each test has a different
purpose, all work to verify that system elements have been properly integrated
and perform allocated functions. In the sections that follow, we discuss the
types of system tests that are worthwhile for software-based systems.
UNIT TESTING:-
Unit testing focuses verification effort on the smallest unit of software design—
the software component or module. Using the component-level design
description as a guide, important control paths are tested to uncover errors
within the boundary of the module. The relative complexity of tests and
uncovered errors is limited by the constrained scope established for unit testing.
The unit test is white-box oriented, and the step can be conducted in parallel for
multiple components.
8.3 Test Cases / Sample Cases
1) Input image - epic.jpeg
Output from the module - frame-1.jpeg, match found
Precision = 1/(1+0) * 100 = 100%
2) Input image - epic1.jpeg
Output from the module - frame-57.jpeg, match found
Precision = 1/(1+0) * 100 = 100%
3) Input image - 802.png
Output from the module - match not found
Precision = 0/(0+1) * 100 = 0%
4) Input image - 1710.png
Output from the module - frame-131.jpeg, match found
Precision = 1/(1+0) * 100 = 100%
Accuracy = (3+1)/(3+0+0+1) * 100 = 4/4 * 100 = 100%
9. CONCLUSION
In this project, an efficient method for VIDEO SHOT BOUNDARY DETECTION,
working with any video format, has been implemented based on Histograms of
Oriented Gradients. The results showed that this scheme is more efficient than
the existing feature extractors it was compared with. The project has clearly
demonstrated that the necessary information can be retrieved efficiently from
video content in different formats; the solution can therefore be treated as a
new candidate for a retrieval system.
As part of future work, the approach can be extended with mutation techniques to
explore and design more effective retrieval, and the performance and efficiency
of SBD can be studied further for all kinds of image and video formats and under
large-dataset conditions.
10. REFERENCES
Nikita Sao and Ravi Mishra, "A Survey Based on Video Shot Boundary Detection
Techniques", International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 3, Issue 4, April 2014.
Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng and Stephen Maybank, "A Survey on
Visual Content-Based Video Indexing and Retrieval", IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 41, No. 6,
November 2011.
Rahul Kumar Garg and Gaurav Saxena, "Shot Boundary Detection Using Shifting of
Image Frame", International Journal of Scientific Engineering and Technology
(ISSN 2277-1581), Vol. 3, Issue 6, pp. 785-788, June 2014.
Mohini Deokar and Ruhi Kabra, "Video Shot Detection Techniques: Brief Overview",
International Journal of Engineering Research and General Science, Vol. 2,
Issue 6, October-November 2014, ISSN 2091-2730.
http://en.wikipedia.org/wiki/Shot_transition_detection
APPENDIX-I
In order to retrieve the Histogram of Oriented Gradients features we need
a feature-extractor to provide the functionality. This is done with the help of
the C language, where we use the MEX interface to define the functionality in C
code and call those functions from MATLAB.
HOG.c
/** @file hog.c
** @brief Histogram of Oriented Gradients (HOG) -
Definition
**/
/*
Copyright (C) 2014 Anvesh.
All rights reserved.
This file is part of the VLFeat library and is made
available under
the terms of the BSD license (see the COPYING file).
*/
#include "hog.h"
#include "mathop.h"
#include <string.h>
/**
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@page hog Histogram of Oriented Gradients (HOG)
features
@author Anvesh
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@ref hog.h implements the Histogram of Oriented
Gradients (HOG) features
in the variants of Dalal Triggs
@cite{dalal05histograms} and of UOCTTI
@cite{felzenszwalb09object}. Applications include
object detection
and deformable object detection.
- @ref hog-overview
- @ref hog-tech
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@section hog-overview Overview
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
HOG is a standard image feature used, among others,
in object detection
and deformable object detection. It decomposes the
image into square cells
of a given size (typically eight pixels), computes a
histogram of oriented
gradients in each cell (similar to @ref sift), and
then renormalizes
the cells by looking into adjacent blocks.
VLFeat implements two HOG variants: the original one
of Dalal-Triggs
@cite{dalal05histograms} and the one proposed in
Felzenszwalb et al.
@cite{felzenszwalb09object}.
In order to use HOG, start by creating a new HOG
object, set the desired
parameters, pass a (color or grayscale) image, and
read off the results.
@code
VlHog * hog = vl_hog_new(VlHogVariantDalalTriggs,
numOrientations, VL_FALSE) ;
vl_hog_put_image(hog, image, height, width,
numChannels, cellSize) ;
hogWidth = vl_hog_get_width(hog) ;
hogHeight = vl_hog_get_height(hog) ;
hogDimension = vl_hog_get_dimension(hog) ;
hogArray = vl_malloc(hogWidth*hogHeight*hogDimension*sizeof(float)) ;
vl_hog_extract(hog, hogArray) ;
vl_hog_delete(hog) ;
@endcode
HOG is a feature array of the dimension returned
by ::vl_hog_get_width,
::vl_hog_get_height, with each feature (histogram)
having
dimension ::vl_hog_get_dimension. The array is stored
in row major order,
with the slowest varying dimension being the
dimension indexing the histogram
elements.
The number of entries in the histogram as well as
their meaning depends
on the HOG variant and is detailed later. However, it
is usually
unnecessary to know such details. @ref hog.h provides
support for
creating an iconic representation of a HOG feature
array:
@code
glyphSize = vl_hog_get_glyph_size(hog) ;
imageHeight = glyphSize * hogArrayHeight ;
imageWidth = glyphSize * hogArrayWidth ;
image =
vl_malloc(sizeof(float)*imageWidth*imageHeight) ;
vl_hog_render(hog, image, hogArray) ;
@endcode
It is often convenient to mirror HOG features from
left to right. This
can be obtained by mirroring an array of HOG cells,
but the content
of each cell must also be rearranged. This can be
done by
the permutation obtained by ::vl_hog_get_permutation.
Furthermore, @ref hog.h supports computing HOG
features not from
images but from vector fields.
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@section hog-tech Technical details
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
HOG divides the input image into square cells of size
@c cellSize,
fitting as many cells as possible, filling the image
domain from
the upper-left corner down to the right one. For each
row and column,
the last cell is at least half contained in the
image.
More precisely, the number of cells obtained in this
manner is:
@code
hogWidth = (width + cellSize/2) / cellSize ;
hogHeight = (height + cellSize/2) / cellSize ;
@endcode
Then the image gradient @f$ \nabla\ell(x,y) @f$
is computed by using central difference (for colour
image
the channel with the largest gradient at that pixel
is used).
The gradient @f$ \nabla\ell(x,y) @f$ is assigned to
one of @c 2*numOrientations orientations in the
range @f$ [0,2\pi) @f$ (see @ref hog-conventions for
details).
Contributions are then accumulated by using bilinear
interpolation
to the four neighbour cells, as in @ref sift.
This results in a histogram @f$h_d@f$ of dimension
2*numOrientations, called of @e directed orientations
since it accounts for the direction as well as the
orientation
of the gradient. A second histogram @f$h_u@f$ of
undirected orientations
of half the size is obtained by folding @f$ h_d @f$
into two.
Let a block of cells be a @f$ 2\times 2 @f$ sub-array
of cells.
Let the norm of a block be the @f$ l^2 @f$ norm of
the stacking of the
respective unoriented histogram. Given a HOG cell,
four normalisation
factors are then obtained as the inverse of the norm
of the four
blocks that contain the cell.
For the Dalal-Triggs variant, each histogram @f$ h_d
@f$ is copied
four times, normalised using the four different
normalisation factors,
the four vectors are stacked, saturated at 0.2, and
finally stored as the descriptor
of the cell. This results in a @c numOrientations * 4
dimensional
cell descriptor. Blocks are visited from left to
right and top to bottom
when forming the final descriptor.
For the UOCCTI descriptor, the same is done for both
the undirected
as well as the directed orientation histograms. This
would yield
a dimension of @c 4*(2+1)*numOrientations elements,
but the resulting
vector is projected down to @c (2+1)*numOrientations
elements
by averaging corresponding histogram dimensions. This
was shown to
be an algebraic approximation of PCA for descriptors
computed on natural
images.
In addition, for the UOCTTI variant the l1 norm of
each of the
four l2 normalised undirected histograms is computed
and stored
as additional four dimensions, for a total of
@c 4+3*numOrientations dimensions.
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@subsection hog-conventions Conventions
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
The orientation of a gradient is expressed as the
angle it forms with the
horizontal axis of the image. Angles are measured
clock-wise (as the vertical
image axis points downwards), and the null angle
corresponds to
an horizontal vector pointing right. The quantized
directed
orientations are @f$ k\pi / \mathrm{numOrientations} @f$, where
@c k is an index that varies in the integer
range @f$ \{0, \dots, 2\mathrm{numOrientations} - 1\} @f$.
Note that the orientations capture the orientation of
the gradient;
image edges would be oriented at 90 degrees from
these.
**/
/*
-----------------------------------------------------
----------- */
/** @brief Create a new HOG object
** @param variant HOG descriptor variant.
** @param numOrientations number of distinguished
orientations.
** @param transposed whether images are transposed
(column major).
** @return the new HOG object.
**
** The function creates a new HOG object to extract
descriptors of
** the prescribed @c variant. The angular resolution
is set by
** @a numOrientations, which specifies the number of
<em>undirected</em>
** orientations. The object can work with column
major images
** by setting @a transposed to true.
**/
VlHog *
vl_hog_new (VlHogVariant variant, vl_size
numOrientations, vl_bool transposed)
{
vl_index o, k ;
VlHog * self = vl_calloc(1, sizeof(VlHog)) ;
assert(numOrientations >= 1) ;
self->variant = variant ;
self->numOrientations = numOrientations ;
self->glyphSize = 21 ;
self->transposed = transposed ;
self->useBilinearOrientationAssigment = VL_FALSE ;
self->orientationX = vl_malloc(sizeof(float) *
self->numOrientations) ;
self->orientationY = vl_malloc(sizeof(float) *
self->numOrientations) ;
/*
Create a vector along the center of each
orientation bin. These
are used to map gradients to bins. If the image is
transposed,
then this can be adjusted here by swapping X and Y
in these
vectors.
*/
for(o = 0 ; o < (signed)self->numOrientations ; ++o) {
double angle = o * VL_PI / self-
>numOrientations ;
if (!self->transposed) {
self->orientationX[o] = (float) cos(angle) ;
self->orientationY[o] = (float) sin(angle) ;
} else {
self->orientationX[o] = (float) sin(angle) ;
self->orientationY[o] = (float) cos(angle) ;
}
}
/*
If the number of orientations is equal to 9, one
gets:
Uoccti:: 18 directed orientations + 9 undirected
orientations + 4 texture
DalalTriggs:: 9 undirected orientations x 4
blocks.
*/
switch (self->variant) {
case VlHogVariantUoctti:
self->dimension = 3*self->numOrientations + 4 ;
break ;
case VlHogVariantDalalTriggs:
self->dimension = 4*self->numOrientations ;
break ;
default:
assert(0) ;
}
/*
A permutation specifies how to permute elements in
a HOG
descriptor to flip it horizontally. Since the
first orientation
of index 0 points to the right, this must be
swapped with orientation
self->numOrientation that points to the left (for
the directed case,
and to itself for the undirected one).
*/
self->permutation = vl_malloc(self->dimension *
sizeof(vl_index)) ;
switch (self->variant) {
case VlHogVariantUoctti:
for(o = 0 ; o < (signed)self->numOrientations ;
++o) {
vl_index op = self->numOrientations - o ;
self->permutation[o] = op ;
self->permutation[o + self->numOrientations]
= (op + self->numOrientations) % (2*self-
>numOrientations) ;
self->permutation[o + 2*self-
>numOrientations] = (op % self->numOrientations) +
2*self->numOrientations ;
}
for (k = 0 ; k < 4 ; ++k) {
/* The texture features correspond to four
displaced block around
a cell. These permute with a lr flip as for
DalalTriggs. */
vl_index blockx = k % 2 ;
vl_index blocky = k / 2 ;
vl_index q = (1 - blockx) + blocky * 2 ;
self->permutation[k + self->numOrientations *
3] = q + self->numOrientations * 3 ;
}
break ;
case VlHogVariantDalalTriggs:
for(k = 0 ; k < 4 ; ++k) {
/* Find the corresponding block. Blocks are
listed in order 1,2,3,4,...
from left to right and top to bottom */
vl_index blockx = k % 2 ;
vl_index blocky = k / 2 ;
vl_index q = (1 - blockx) + blocky * 2 ;
for(o = 0 ; o < (signed)self->numOrientations
; ++o) {
vl_index op = self->numOrientations - o ;
self->permutation[o + k*self-
>numOrientations] = (op % self->numOrientations) +
q*self->numOrientations ;
}
}
break ;
default:
assert(0) ;
}
/*
Create glyphs for representing the HOG features/
filters. The glyphs
are simple bars, oriented orthogonally to the
gradients to represent
image edges. If the object is configured to work
on transposed image,
the glyphs images are also stored in column-major.
*/
self->glyphs = vl_calloc(self->glyphSize * self-
>glyphSize * self->numOrientations, sizeof(float)) ;
#define atglyph(x,y,k) self->glyphs[(x) + self-
>glyphSize * (y) + self->glyphSize * self->glyphSize
* (k)]
/*
-----------------------------------------------------
----------- */
/** @brief Get HOG left-right flip permutation
** @param self HOG object.
** @return left-right permutation.
**
** The function returns a pointer to an array @c
permutation of ::vl_hog_get_dimension
** elements. Given a HOG descriptor (for a cell) @c
hog, which is also
** a vector of ::vl_hog_get_dimension elements, the
** descriptor obtained for the same image flipped
horizontally is
** given by <code>flippedHog[i] =
hog[permutation[i]]</code>.
**/
vl_index const *
vl_hog_get_permutation (VlHog const * self)
{
return self->permutation ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Turn bilinear interpolation of assignments
on or off
** @param self HOG object.
** @param x @c true if orientations should be
assigned with bilinear interpolation.
**/
void
vl_hog_set_use_bilinear_orientation_assignments
(VlHog * self, vl_bool x) {
self->useBilinearOrientationAssigment = x ;
}
/** @brief Tell whether assignments use bilinear
interpolation or not
** @param self HOG object.
** @return @c true if orientations are assigned
with bilinear interpolation.
**/
vl_bool
vl_hog_get_use_bilinear_orientation_assignments
(VlHog const * self) {
return self->useBilinearOrientationAssigment ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Render a HOG descriptor to a glyph image
** @param self HOG object.
** @param image glyph image (output).
** @param descriptor HOG descriptor.
** @param width HOG descriptor width.
** @param height HOG descriptor height.
**
** The function renders the HOG descriptor or filter
** @a descriptor as an image (for visualization) and
stores the result in
** the buffer @a image. This buffer
** must be an array of dimensions @c width*glyphSize
** by @c height*glyphSize elements, where @c
glyphSize is
** obtained from ::vl_hog_get_glyph_size and is the
size in pixels
** of the image element used to represent the
descriptor of one
** HOG cell.
**/
void
vl_hog_render (VlHog const * self,
float * image,
float const * descriptor,
vl_size width,
vl_size height)
{
vl_index x, y, k, cx, cy ;
vl_size hogStride = width * height ;
assert(self) ;
assert(image) ;
assert(descriptor) ;
assert(width > 0) ;
assert(height > 0) ;
for (y = 0 ; y < (signed)height ; ++y) {
for (x = 0 ; x < (signed)width ; ++x) {
float minWeight = 0 ;
float maxWeight = 0 ;
for (k = 0 ; k < (signed)self-
>numOrientations ; ++k) {
float weight ;
float const * glyph = self->glyphs + k *