VIDEO SHOT BOUNDARY DETECTION
B.Tech Project Report
BY
Anveshkumar Kolluri(1210711204)
DEPARTMENT OF INFORMATION TECHNOLOGY
GITAM INSTITUTE OF TECHNOLOGY
GITAM UNIVERSITY
VISAKHAPATNAM
530045, AP (INDIA)
April, 2015
CERTIFICATE
I hereby certify that the work which is being presented in the B.Tech
Major Project Report entitled “VIDEO SHOT BOUNDARY DETECTION”, in
partial fulfilment of the requirements for the award of the Bachelor of Technology in
INFORMATION TECHNOLOGY and submitted to the Department of
Information Technology of GITAM Institute of Technology, GITAM University,
Visakhapatnam, A.P., is an academic record of my own work carried out during the
period from September 2014 to March 2015 under the supervision of Sri
D. Kishore Kumar, Assistant Professor, IT Department.
The matter presented in this thesis has not been submitted by me for the
award of any other degree elsewhere.
Signature of Candidate
Anveshkumar Kolluri
Roll No: 1210711204
This is to certify that the above statement made by the candidate is
correct to the best of my knowledge.
Signature of Supervisor
Date: Sri D. Kishore Kumar,
Assistant Professor,
Project Supervisor
ACKNOWLEDGEMENT
We acknowledge the efforts of our guide Sri D. Kishore Kumar for his
very able guidance and immense help through the thick and thin of this project. His
inspiration and constructive criticism were extremely helpful in the successful completion
of this project. We were continuously assessed and reviewed on every module under
the guidance of Dr. GVS. Rajkumar and Mr. A. Naresh. Their support has been
immeasurable.
We acknowledge the help and superlative presence of the Head of
Department of Information Technology, Dr. P V Lakshmi for the opportunities
she has given us not only during the project but throughout the course of B.Tech.
We are greatly indebted to Dr. K. Lakshmi Prasad, Principal, GITAM Institute of
Technology, for providing facilities to carry out this work.
We also duly acknowledge all the faculty members of Information
Technology department for guiding us in the making of this project and for solving
our problems whenever they surfaced and also for catalyzing our thinking when it
seemed to stagnate.
Anvesh Kumar Kolluri (1210711204)
ABSTRACT
Indexing digital videos in order to support browsing and retrieval by
users is a great challenge today. For a system that can accurately and automatically
process large amounts of heterogeneous video, segmentation into shots and scenes
forms the basic operation, and for this operation the shot boundaries have to be
detected first. Due to the enormous increase in the number of videos and in database
sizes, as well as their wide deployment in various applications, the need for good
retrieval applications has arisen. This project aims to introduce an efficient and
useful video processing and retrieval technique. This report presents a detailed
analysis of HOG (Histogram of Oriented Gradients) as a means to process videos
and retrieve relevant information. The method combines the benefits of edge
detection with a pixel-intensity-based technique, transforming the image to a
common size without much loss in its properties and then searching for the relevant
information. Moreover, a new concept involving the HOG method has been
successfully applied to the shot boundary detection problem.
CONTENTS
CHAPTER NAME
1 INTRODUCTION
1.1 MOTIVATION
1.2 FUNDAMENTALS OF SBD
1.2.1 THRESHOLDS
1.2.2 OBJECT OR CAMERA MOTION
1.2.3 FLASH LIGHTS
1.2.4 COMPLEXITY OF DETECTOR
1.3 PROBLEM FORMULATION
2 LITERATURE SURVEY
2.1 GRADIENT FIELD DESCRIPTOR FOR VIDEO SHOT BOUNDARY DETECTION
2.2 FEATURE EXTRACTORS
2.2.1 EDGE HISTOGRAM DESCRIPTOR
2.2.2 SCALE INVARIANT FEATURE TRANSFORM
2.2.3 HISTOGRAM OF ORIENTED GRADIENTS
3 SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
3.2 EXISTING SYSTEMS
3.3 PROBLEM MOTIVATION
3.4 PROPOSED SOLUTION
3.5 SOFTWARE REQUIREMENT ANALYSIS
3.6 SYSTEM REQUIREMENTS
4. MODULES
4.1 GUI MODULE
4.2 QUERY MODULE
4.3 FEATURE EXTRACTION MODULE
4.4 IMAGE RETRIEVAL MODULE
5. SELECTED SOFTWARE
5.1 MATLAB AND ITS TOOLBOX
5.2 FEATURES
5.2.1 COLOR
5.2.2 TEXTURE
5.2.3 SHAPE
6. SYSTEM DESIGN
6.1 INTRODUCTION
6.2 DATA FLOW DIAGRAM
7. IMPLEMENTATION
7.1 SAMPLE CODE
7.2 SAMPLE_GUI.M & SIMULATION RESULTS
8. SYSTEM TESTING
8.1 TESTING
8.2 TYPES OF TESTING
8.3 TEST CASES
9. CONCLUSIONS
10. REFERENCES
APPENDIX-I
APPENDIX-II
List of Abbreviations
SBD Shot Boundary Detection
SIFT Scale Invariant Feature Transform
EHD Edge Histogram Descriptor
HOG Histogram of Oriented Gradients
LT Local Threshold
GT Global Threshold
List of Figures
Figure 1  Sequence of frames
Figure 2  Types of histogram descriptors
Figure 3  Definition of images and sub-image blocks
Figure 4  HOG blocks
Figure 5  An image and its histogram
Figure 6  Texture properties
Figure 7  Classical co-occurrence matrix
Figure 8  Boundary-based and region-based
Figure 9  Data flow diagram of video shot boundary detection
1. INTRODUCTION
1.1 Motivation
In this digital era, recent developments in video compression
technology, the widespread use of digital cameras and high-capacity digital systems, along
with a significant increase in computer performance, have increased the usage and
availability of digital video, creating a need for tools that effectively categorize,
search and retrieve the relevant video material. Also, the increasing availability and
use of on-line videos have led to a demand for efficient and automated video analysis
techniques.
Generally, management of such large collections of videos requires
knowledge of the content of those videos. Therefore, digital video data is processed
with the objective of extracting the information about the content conveyed in the
video. Now, the definition of content is highly application dependent but there are a
number of commonalities in the application of content analysis. Among others, shot
boundary detection (SBD), also called temporal video segmentation is one of the
important and the most basic aspects of content based video retrieval. Hence, much
research has been focused on segmenting video by detecting the boundaries
between camera shots. We make use of HOG in order to extract the necessary
features across the shot boundary. A shot may be defined as a sequence of frames
captured by a single camera in a single continuous action in time and space.
FIGURE 1: SEQUENCE OF FRAMES
For example, a video sequence showing two people having a conversation may be
composed of several close-up shots of their faces which are interleaved and make
up a scene. Shots define the low-level, syntactical building blocks of a video
sequence.
A large number of different types of boundaries can exist between shots. These can
be broadly classified into abrupt changes or gradual changes. An abrupt transition is
basically a hard cut that occurs between two adjacent frames. In gradual transitions
we have fade, which is a gradual change in brightness, either starting or ending
with a black frame. Another type of gradual transition is dissolve which is similar
to a fade except that it occurs between two shots. The images of the first shot get
dimmer and those of the second shot get brighter until the second replaces the first.
Other types of shot transitions include wipes (gradual) and computer generated
effects such as morphing.
A scene is a logical grouping of shots into a semantic unit. A single scene focuses
on a certain object or objects of interest, but the shots constituting a scene can be
from different angles. In the example above the sequence of shots showing the
conversation would comprise one logical scene with the focus being the two people
and their conversation.
Scene boundary detection requires a high-level semantic understanding of the video
sequence, and such an understanding must take into account cues such as the associated
audio track and the encoded data stream itself. Although scene-level video segmentation
is far more desirable than simple shot boundary detection, since people generally
visualize a video as a sequence of scenes rather than shots, shot boundary detection
still plays a vital role in any video segmentation system, as it provides the basic
syntactic units for higher level processes to build upon.
As part of an ongoing video indexing and browsing project, our recent research has
focused on the application of different methods of video segmentation to a large
and diverse digital video collection. The aim is to examine how different
segmentation methods perform on video. With this information, it is hoped to
develop a system capable of accurately segmenting a wide range of videos.
1.2 Fundamental Problems of SBD
Although the problem of shot boundary detection has been studied for more than
a decade and many published methods of detecting shot boundaries exist, several
challenges remain, which are summarized in the following sections:
1.2.1 Thresholds
Setting a threshold is one of the most challenging tasks for the
correct detection of a shot boundary. To decide whether a shot boundary has
occurred, it is necessary to set a threshold, or thresholds for measuring the
similarities or dissimilarities between start and end frame of a shot boundary. An
abrupt transition has a high discontinuity between adjacent frames, whereas a gradual
transition occurs over a number of frames. So the start frame and end frame for an
abrupt transition are adjacent frames, but this will not be the case for gradual
transitions, which makes deciding a threshold more challenging for gradual
transitions. Cosine dissimilarity values between the start and end frames that lie above
this threshold are logged as real shot boundaries, while values below this threshold
are ignored. To accurately segment broadcast video, it is necessary to balance
the following two apparently conflicting points:
1. The need to prevent the detection of false shot boundaries, i.e. detecting boundaries
where none exist, by setting a sufficiently high threshold level so as to insulate the
detector from noise.
2. The need to detect subtle shot transitions, such as dissolves, by making the
detector sensitive enough to recognise gradual change.
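As a concrete illustration of the global-threshold idea, the following MATLAB sketch compares normalised grey-level histograms of adjacent frames and logs a cut wherever the difference exceeds a fixed threshold. The file name and the threshold value T are placeholders rather than the project's actual settings, RGB frames are assumed, and a real detector would tune T (or adapt it locally) and treat gradual transitions separately.

v = VideoReader('sample_video.avi');   % placeholder file name
n = v.NumberOfFrames;                  % frame count (older VideoReader API)
T = 0.4;                               % illustrative global threshold
cuts = [];
prev = imhist(rgb2gray(read(v, 1)));   % 256-bin histogram of the first frame
prev = prev / sum(prev);
for k = 2:n
    h = imhist(rgb2gray(read(v, k)));
    h = h / sum(h);                    % normalise so frame size does not matter
    d = sum(abs(h - prev)) / 2;        % histogram difference in [0, 1]
    if d > T
        cuts(end+1) = k;               % abrupt cut assumed between frames k-1 and k
    end
    prev = h;
end
disp(cuts)

Lowering T makes the detector more sensitive to gradual changes but also to noise, which is exactly the balance between the two points listed above.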
1.2.2 Object or Camera Motion
The visual content of a video changes significantly with extreme object
or camera motion and with screenplay effects (e.g. someone turning on the light in a dark room),
in a way very similar to typical shot changes. Sometimes slow motion causes content
change similar to gradual transitions, whereas extremely fast camera or object
movements cause content change similar to hard cuts. Therefore, it is difficult to
differentiate shot changes from object or camera motion.
1.2.3 Flashlights
Color is the primary element of video content, and most video content
representations employ color as a feature. Continuity signals based on the color feature
exhibit significant changes under abrupt illumination changes, such as flashlights.
Such a significant change might be identified as a content change (i.e. a shot
boundary) by most shot boundary detection tools. Several algorithms propose
using illumination-invariant features, but these algorithms always face a
tradeoff between using an illumination-invariant feature and losing the most
significant feature for characterizing the variation of the visual content. Therefore,
flashlight detection is one of the major challenges in SBD algorithms.
1.2.4 Complexity of the Detector
Shot boundary detection is considered a pre-processing step in most
video content analysis applications. There are high-level algorithms which
perform more complex content analysis, and shot boundary detection results are used
by these high-level analysis algorithms. Since the video content applications take up
most of the available computational power and time, it is necessary to keep the
computational complexity of the shot boundary detector low. Such a need
calls for algorithms which are sufficiently precise but also computationally
inexpensive.
As the shot boundary detection problem evolved, proposed algorithms started to use
more than one feature for content representation in order to increase detection
performance. On the other hand, such a strategy places a computational burden on the
detector, since each feature requires a separate processing step.
1.3 Problem Formulation
Shot boundary detection is one of the basic, yet very important video
processing tasks, because the success of higher level processing relies heavily on
this step. Normally videos contain enormous amounts of data. For fast and efficient
processing and retrieval of these videos, they need to be indexed properly, for which
shot boundary detection forms the basic process. This project aims to evaluate
the existing techniques for detecting shots in videos and to work towards
their enhancement, so that the problems discussed
above can be reduced and the techniques can give better and more accurate results.
These conditions are met through the following two steps:
• Feature Extraction: the first step in the process is extracting image features to a
distinguishable extent.
• Matching: the second step involves matching these features to yield a result that
is visually similar.
2. LITERATURE SURVEY
This chapter concentrates on the work done so far on shot boundary
detection. John S. Boreczky et al. [1996] presented a comparison of video shot
boundary detection techniques, giving a comparative analysis of various techniques
and their variations, including histogram, discrete
cosine transform, motion vector, and block matching methods. Patrick Bouthemy
et al [1999] proposed Unified Approach to Shot Change Detection and Camera
Motion Characterization which describes an approach to partition a video document
into shots by using image motion information, which is generally more intrinsic to
the video structure itself. A. Miene et al. [2001] presented advanced and adaptive
shot boundary detection techniques based on two stages: feature extraction and shot
boundary detection. First, three different features for the measurement of shot
boundaries within the video are extracted. Second, the shot boundaries are detected
based on the previously extracted features.
H. Y. Mark Liao et al. [2002] proposed a novel dissolve detection
algorithm which could avoid the mis-detection of motions by using a binomial
distribution model to systematically determine the threshold needed for
discriminating a real dissolve from global or local motions. Jesús Bescós [2004]
proposed real-time shot change detection over online MPEG-2 video, describing
a software module for video temporal segmentation which is able to
detect both abrupt transitions and all types of gradual transitions in real time.
Guillermo Cisneros et al. [2005] proposed a unified model for
techniques on video-shot transition detection. The approach is
centred on mapping the space of inter-frame distances onto a new decision space
better suited to achieving sequence-independent thresholding. Liuhong Liang et
al. [2005] presented enhanced shot boundary detection using video text
information, in which a number of edge-based techniques are proposed for
detecting abrupt shot boundaries while avoiding the influence of flashlights common in
many video types, such as sports, news, entertainment and interview videos.
Daniel DeMenthon et al. [2006] proposed shot boundary detection
based on image correlation features in video. The cut detection is based on the
so-called 2max ratio criterion in a sequential image buffer, while the dissolve
detection is based on the skipping image difference and linearity error in a
sequential image buffer. Kota Iwamoto et al. [2007] proposed detection of wipes
and digital video effects based on a new pattern-independent model of image
boundary line characteristics. This model relies on the characteristics
of the image boundary lines dividing the two image regions in the transitional frames.
Jinhui Yuan et al. [2008] proposed a shot boundary detection
method for news video based on object segmentation and tracking. It combines
three main techniques: the partitioned histogram comparison method, and video
object segmentation and tracking based on wavelet analysis. The partitioned
histogram comparison is used as a first filter to effectively reduce the number of
video frames that need object segmentation and tracking. Yufeng Li et al. [2008]
proposed a novel shot detection algorithm based on information theory.
First the colour and texture features are extracted by wavelet transform, then the
dissimilarity between two successive frames is defined, which combines the
mutual information of the colour feature and the co-occurrence mutual information of
the texture feature. The threshold is adjusted adaptively based on the entropy of the
consecutive frames and does not depend on the type of video or the kind of shot.
Vasileios T. Chasanis et al. [2009] presented scene detection in videos using shot
clustering and sequence alignment. First the key-frames are extracted using a
spectral clustering method, employing the fast global k-means algorithm in the
clustering phase and also providing an estimate of the number of key-frames.
Then shots are clustered into groups using only visual similarity as a feature, and
they are labelled according to the group to which they are assigned. Jinchang Ren et al.
[2009] proposed shot boundary detection in MPEG videos using local
and global indicators, operating directly in the compressed domain. Several local
indicators are extracted from MPEG macroblocks, and AdaBoost is employed for
feature selection and fusion. The selected features are then used in classifying
candidate cuts into five sub-spaces via pre-filtering and rule-based decision making;
then the global indicators of frame similarity between boundary frames of cut
candidates are examined using phase correlation of DC images.
Priyadarshinee Adhikari et al. [2009] proposed a paper on video shot
boundary detection, which presents video retrieval using shot boundary
detection. Lihong Xu et al. [2010] proposed a novel shot detection
algorithm based on K-means clustering: colour feature extraction is done first,
then the dissimilarity of video frames is defined, and the video frames are divided into
several different sub-clusters by performing K-means clustering. Wenzhu Xu
et al. [2010] proposed a novel shot detection algorithm based on
graph theory, in which the video frames are divided into several different groups
by performing a graph-theoretical algorithm. Arturo Donate et al. [2010] presented shot
boundary detection in videos using robust three-dimensional tracking. The
proposal is to extract salient features from a video sequence and track them over
time in order to estimate shot boundaries within the video.
Min-Ho Park et al. [2010] proposed efficient shot boundary
detection using block-wise motion-based features. A measure of discontinuity
in camera and object/background motion is proposed for SBD based on the
combination of two motion features: the modified displaced frame difference
(DFD) and the block-wise motion similarity. Goran J. Zajić et al. [2011] proposed
video shot boundary detection based on multifractal analysis. Low-level
features (colour and texture features) are extracted from each frame in the video
sequence, then concatenated into feature vectors (FVs) and stored in a feature
matrix. Matrix rows correspond to the FVs of frames from the video sequence, while
columns are time series of particular FV components. Partha Pratim Mohanta et
al. [2012] proposed a model-based shot boundary detection
technique using frame transition parameters, based on a formulated frame
estimation scheme using the previous and the next frames. Pablo Toharia et
al. [2012] proposed shot boundary detection using Zernike moments in
multi-GPU multi-CPU architectures, along with the different possible hybrid
combinations based on Zernike moments. Sandip T. et al. [2012] proposed
key-frame based video summarization using an automatic threshold and edge
matching rate. First, the histogram difference of every frame is calculated, and
then the edges of the candidate key frames are extracted by the Prewitt operator. Ravi
Mishra et al. [2013] proposed video shot boundary detection using the dual-tree
complex wavelet transform, an approach that processes encoded video sequences
prior to complete decoding. The proposed algorithm first extracts structure features
from each video frame by using the dual-tree complex wavelet transform, and then the
spatial-domain structure similarity is computed between adjacent frames. Zhe Ming
Lu et al. [2013] presented fast video shot boundary detection based on SVD and
pattern matching. It is based on segment selection and singular value
decomposition (SVD). Initially, the positions of the shot boundaries and the lengths of
gradual transitions are predicted using adaptive thresholds, and most non-boundary
frames are discarded at the same time. Sowmya R. et al. [2013] proposed
analysis and verification of video summarization using shot boundary detection.
The analysis is based on block-based histogram difference and block-based
Euclidean distance difference for varying block sizes. Ravi Mishra et al. [2014]
proposed a comparative study of the block matching algorithm and the dual-tree
complex wavelet transform for shot detection in videos, presenting a
comparison between the two detection methods in terms of parameters such as
false rate, hit rate and miss rate, tested on a set of different video sequences.
2.1 GRADIENT FIELD DESCRIPTOR FOR VIDEO SHOT BOUNDARY DETECTION
This system accepts images of any format as input and
maps them into defined dimensions in order to make them appropriate for
processing. This requires a matching process robust to depictive inaccuracy and
photometric variation. The approach is to transform database images into Canny
edge maps and capture local structure in the map using a novel descriptor. An
appropriate scale and hysteresis threshold for the Canny operator are set by searching
the parameter space for a binary edge map in which a small, fixed percentage of pixels
are classified as edges. These simple heuristics retain salient boundaries and
discourage responses at the scale of finer texture. Gradient Field HOG is an
adaptation of HOG that mitigates the lack of relative spatial information
by capturing structure from surrounding regions. It is inspired by work on
image completion capable of propagating image structure into voids, and uses a
similar Poisson filling approach to improve the richness of information in the
gradient field prior to sampling with the HOG descriptor. This simple technique
yields significant improvements in performance when matching features to photos,
compared to three leading descriptors: the Self-Similarity Descriptor (SSIM), SIFT and
HOG. Furthermore, the descriptor can be applied to localise sketched
objects within the retrieved images, and this functionality is demonstrated through the
features of an image montage application. The success of the descriptor is
dependent on correct selection of scale during edge extraction, and the use of image
salience measures may benefit this process. The system could be enhanced by
exploring coloured sketches or incorporating more flexible models.
2.2 FEATURE EXTRACTORS
Feature extractors are usually divided into three types:
1. EHD (EDGE HISTOGRAM DESCRIPTOR)
2. SIFT (SCALE INVARIANT FEATURE TRANSFORM)
3. HOG (HISTOGRAM OF ORIENTED GRADIENTS)
2.2.1 EHD (EDGE HISTOGRAM DESCRIPTOR)
In the EHD method, global and semi-global edge histogram bins are constructed
from the local histogram bins. Among the various possible clusters of sub-images,
13 patterns are used for the semi-global histograms. These 13 semi-global regions
and the whole image space are adopted to define the semi-global and the global
histograms respectively. This extra histogram information can be obtained
directly from the local histogram bins without a separate feature extraction process.
Experimental results show that the semi-global and global histograms generated
from the local histogram bins help to improve retrieval performance.
FIGURE 2: TYPES OF HISTOGRAM DESCRIPTORS
Local Edge Histogram:
The normative part of the edge histogram descriptor consists of 80 local
edge histogram bins. The semantics of those histogram bins are described in the
following sub-sections. To localize the edge distribution to a certain area of the image,
the image space is divided into 4x4 sub-images, as shown in Figure 3. Then, for each
sub-image, an edge histogram is generated to represent the edge distribution in the
sub-image. To define different edge types, the sub-image is further divided into small
square blocks called image-blocks.
FIGURE 3: DEFINITION OF IMAGES AND SUB-IMAGE BLOCKS
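As an illustration of the local histogram computation described above, the following MATLAB sketch divides a frame into 4x4 sub-images and builds a five-bin edge histogram for each, giving the 16 x 5 = 80 local bins. It is a simplified approximation only: the Sobel gradients, the 0.2 edge-strength threshold and the fixed angle ranges are illustrative assumptions, not the normative MPEG-7 block-based computation, and 'frame.jpg' is a placeholder file name.

I  = double(rgb2gray(imread('frame.jpg'))) / 255;  % placeholder input image
Gx = conv2(I, [-1 0 1; -2 0 2; -1 0 1], 'same');   % Sobel gradient, x direction
Gy = conv2(I, [-1 -2 -1; 0 0 0; 1 2 1], 'same');   % Sobel gradient, y direction
mag   = hypot(Gx, Gy);
theta = mod(atan2d(Gy, Gx), 180);                  % gradient orientation in [0,180)

[rows, cols] = size(I);
rEdge = round(linspace(1, rows + 1, 5));           % 4x4 grid of sub-images
cEdge = round(linspace(1, cols + 1, 5));
localHist = zeros(16, 5);                          % 80 local bins in total
for r = 1:4
    for c = 1:4
        m  = mag(rEdge(r):rEdge(r+1)-1, cEdge(c):cEdge(c+1)-1);
        th = theta(rEdge(r):rEdge(r+1)-1, cEdge(c):cEdge(c+1)-1);
        strong = m > 0.2;                          % pixels with a clear edge response
        a = th(strong);
        bins = zeros(1, 5);
        bins(1) = sum(a < 22.5 | a >= 157.5);      % four orientation ranges stand in
        bins(2) = sum(a >= 67.5 & a < 112.5);      % for the four directional edge types
        bins(3) = sum(a >= 22.5 & a < 67.5);
        bins(4) = sum(a >= 112.5 & a < 157.5);
        bins(5) = sum(~strong(:));                 % non-directional (weak) pixels
        localHist((r-1)*4 + c, :) = bins / max(1, sum(bins));
    end
end

The semi-global and global histograms mentioned above can then be obtained by summing the appropriate rows of localHist, with no further feature extraction.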
Advantages of EHD:
1) Global and semi-global edge histogram bins can be constructed from the local histogram bins.
2) Only 13 patterns are needed for the semi-global histograms.
3) The extra histogram information can be obtained directly from the local histogram bins without a separate feature extraction process.
4) The semi-global and global histograms generated from the local histogram bins help to improve retrieval performance.
Disadvantages of EHD:
1) It does not provide invariance against rotation, scaling and translation.
2) Developing a sufficiently robust descriptor is difficult.
2.2.2 SIFT
SIFT feature detection is an approach used for object
recognition. The invariant features extracted from images can be used to perform
reliable matching between different views of an object or scene. The features have
been shown to be invariant to image rotation and scale and robust across a
substantial range of affine distortion, addition of noise, and change in illumination.
For each query image, gallery photograph, and sketch/photo
correspondence in our dictionary, we compute a SIFT feature representation. SIFT-based
object matching is a popular method for finding correspondences between
images. Introduced by Lowe, SIFT object matching consists of both a scale-invariant
interest point detector and a feature-based similarity measure. Our
method is not concerned with the interest point detection aspect of the SIFT
framework, but instead utilizes only the gradient-based feature descriptors
(known as SIFT features).
Advantages of SIFT:
1) It performs reliable matching between different views of an object or scene.
2) It is invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination.
3) The image gradient magnitudes and orientations are sampled around the keypoint location.
Disadvantages of SIFT:
1) Edges are poorly defined and usually hard to detect, although large numbers of keypoints can still be extracted from typical images.
2) Feature matching must still be performed even when the objects (e.g. faces) are small.
2.2.3 HISTOGRAM OF ORIENTED GRADIENTS
The essential thought behind the Histogram of Oriented Gradient
descriptors is that local object appearance and shape within an image can be
described by the distribution of intensity gradients or edge directions. The
implementation of these descriptors can be achieved by dividing the image into
small connected regions, called cells, and for each cell compiling a histogram of
gradient directions or edge orientations for the pixels within the cell. The
combination of these histograms then represents the descriptor. For improved
accuracy, the local histograms can be contrast-normalized by calculating a measure
of the intensity across a larger region of the image, called a block, and then using
this value to normalize all cells within the block. This normalization results in
better invariance to changes in illumination or shadowing.
FIGURE 4: HOG BLOCKS
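A minimal MATLAB sketch of the cell-and-block scheme just described is given below: 8x8-pixel cells, nine orientation bins over 0-180 degrees, and L2 normalisation over 2x2-cell blocks. It is a simplified illustration of the idea, not the exact Dalal-Triggs implementation (no bin interpolation or Gaussian weighting), and 'frame.jpg' is a placeholder file name.

I   = double(rgb2gray(imread('frame.jpg')));   % placeholder input image
Gx  = conv2(I, [-1 0 1], 'same');              % centred horizontal gradient
Gy  = conv2(I, [-1; 0; 1], 'same');            % centred vertical gradient
mag = hypot(Gx, Gy);
ang = mod(atan2d(Gy, Gx), 180);                % unsigned orientation in [0,180)

cellSz = 8; nBins = 9;
nyc = floor(size(I,1) / cellSz); nxc = floor(size(I,2) / cellSz);
H = zeros(nyc, nxc, nBins);                    % one orientation histogram per cell
for r = 1:nyc
    for c = 1:nxc
        rr = (r-1)*cellSz+1 : r*cellSz;
        cc = (c-1)*cellSz+1 : c*cellSz;
        b  = min(floor(ang(rr, cc) / (180/nBins)) + 1, nBins);  % bin index per pixel
        m  = mag(rr, cc);
        for k = 1:nBins
            H(r, c, k) = sum(m(b == k));       % magnitude-weighted vote per bin
        end
    end
end

desc = [];                                     % final descriptor: normalised blocks
for r = 1:nyc-1
    for c = 1:nxc-1
        blk  = reshape(H(r:r+1, c:c+1, :), [], 1);   % 2x2 cells = 36 values
        desc = [desc; blk / (norm(blk) + 1e-6)];     %#ok<AGROW> L2 block normalisation
    end
end

The block normalisation at the end is what gives the descriptor its invariance to illumination and shadowing, as noted above.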
Advantages:
When examined over larger databases, HOG in most cases performs much
better than EHD-based retrieval. The edge histogram descriptor does not perform
well for information-poor sketches, whereas better results can be achieved for more
detailed queries; this problem can be overcome by the HOG method. HOG
captures edge or gradient structure that is very characteristic of local shape.
Disadvantages:
Further work is needed on HOG-based detectors that incorporate motion
information using block matching or optical flow fields. Finally, although the
current fixed-template-style detector has proven difficult to beat for fully visible
pedestrians, humans are highly articulated, and it is believed that including a parts-based
model with a greater degree of local spatial invariance would improve the detector.
After reviewing existing edge and gradient based descriptors, we show
experimentally that grids of histograms of oriented gradient (HOG) descriptors
significantly outperform existing feature sets for human detection. We study the
influence of each stage of the computation on performance, concluding that fine-
scale gradients, fine orientation binning, relatively coarse spatial binning, and high-
quality local contrast normalisation in overlapping descriptor blocks are all
important for good results.
3. SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
The problem involves entering an image as a query and selecting
a pre-chosen video, which in turn provides the database space by being broken down
into shots and frames. A software application is designed to employ SBD
techniques to extract visual properties and match them. This is done to
retrieve images in the database that are visually similar to the query image. The
main problem with this application is that the size and shape of the input image
need to be matched to the size and shape of the database images.
3.2 EXISTING SYSTEMS
Several systems currently exist and are being constantly developed. Examples are:
1) QBIC or Query By Image Content was developed by IBM, Almaden Research Centre,
to allow users to graphically pose and refine queries based on multiple visual
properties such as colour, texture and shape. It supports queries based on input
images, user-constructed sketches, and selected colour and texture patterns.
2) VIR Image Engine by Virage Inc., like QBIC, enables image retrieval based on
primitive attributes such as colour, texture and structure. It examines the pixels in
the image and performs an analysis process, deriving image characterisation
features.
3) Visual SEEK and Web SEEK were developed by the Department of Electrical
Engineering, Columbia University. Both these systems support colour and spatial
location matching as well as texture matching. NeTra was developed by the
Department of Electrical and Computer Engineering, University of California. It
supports colour, shape, spatial layout and texture matching, as well as image
segmentation.
4) MARS or Multimedia Analysis and Retrieval System was developed by the
Beckman Institute for Advanced Science and Technology, University of Illinois. It
supports colour, spatial layout, texture and shape matching.
3.3 Problem Motivation
Video databases and collections can be enormous in size, containing
hundreds, thousands or even millions of frames and shots corresponding to
each video. The conventional method of image retrieval in a video is
to search for key features that match the descriptive keywords assigned
to the images by a human categoriser.
While computationally expensive, the results are far more accurate than
conventional image indexing. Hence, there exists a trade-off between accuracy and
computational cost. This trade-off decreases as more efficient algorithms are
utilised and increased computational power becomes inexpensive.
3.4. Proposed Solution
The solution initially proposed was to extract the primitive features of a
query image and compare them to those of the database images. The image features
under consideration cover several varying feature types. Thus, using matching and
comparison algorithms, the features of one image are compared and matched to the
corresponding features of another image, which in our case is generally a frame of
the video under consideration. This comparison is performed using shape distance
metrics. In the end, these metrics are applied one after another, so as to retrieve
database images that are similar to the query. The similarity between features is
calculated using algorithms used by well-known SBD systems.
3.5 SOFTWARE REQUIREMENT ANALYSIS
A Software Requirements Specification (SRS) is a complete description of
the behaviour of the system to be developed. It includes a set of use cases that
describe all the interactions the users will have with the software. Use cases are also
known as functional requirements. In addition to use cases, the SRS also contains
non-functional (or supplementary) requirements. Non-functional requirements are
requirements which impose constraints on the design or implementation (such as
performance engineering requirements, quality standards, or design constraints).
Functional Requirements:
In software engineering, a functional requirement defines a
function of a software system or its component. A function is described as a set of
inputs, the behaviour, and outputs. Functional requirements may be calculations,
technical details, data manipulation and processing and other specific functionality
that define what a system is supposed to accomplish. Behavioural requirements
describing all the cases where the system uses the functional requirements are
captured in use cases. Functional requirements are supported by non-functional
requirements (such as performance requirements, security, or reliability). How a
system implements functional requirements is detailed in the system design. In
some cases a requirements analyst generates use cases after gathering and
validating a set of functional requirements. Each use case illustrates behavioural
scenarios through one or more functional requirements. Often, though, an analyst
will begin by eliciting a set of use cases, from which the analyst can derive the
functional requirements that must be implemented to allow a user to perform each
use case.
Non-Functional Requirements: In systems engineering and requirements
engineering, a non-functional requirement is a requirement that specifies criteria
that can be used to judge the operation of a system, rather than specific behaviours.
This should be contrasted with functional requirements that define specific
behaviour or functions. In general, functional requirements define what a system is
supposed to do, whereas non-functional requirements define how a system is
supposed to be. Non-functional requirements are often called qualities of a system.
Other terms for non-functional requirements are "constraints", "quality attributes",
"quality goals", "quality of service requirements" and "non-behavioural
requirements". Qualities, that is, non-functional requirements, can be divided into
two main categories:
1. Execution qualities, such as security and usability, which are observable at run
time.
2. Evolution qualities, such as testability, maintainability, extensibility and
scalability, which are embodied in the static structure of the software system.
3.6 System Requirements
Introduction:
To be used efficiently, all computer software needs certain
hardware components or other software resources to be present on a computer.
These pre-requisites are known as (computer) system requirements and are often
used as a guideline as opposed to an absolute rule. Most software defines two sets
of system requirements: minimum and recommended. With increasing demand for
higher processing power and resources in newer versions of software, system
requirements tend to increase over time. Industry analysts suggest that this trend
plays a bigger part in driving upgrades to existing computer systems than
technological advancements.
Hardware Requirements:
The most common set of requirements defined by any operating
system or software application is the physical computer resources, also known as
hardware. A hardware requirements list is often accompanied by a hardware
compatibility list (HCL), especially in the case of operating systems. An HCL lists
tested, compatible, and sometimes incompatible hardware devices for a particular
operating system or application. The following sub-sections discuss the various
aspects of hardware requirements.
Hardware Requirements for the Present Project
1. Input Devices: Keyboard and Mouse
2. RAM: 512 MB
3. Processor: P4 or above
4. Storage: Less than 100 GB of HDD space.
Software Requirements:
Software Requirements deal with defining software resource requirements
and pre-requisites that need to be installed on a computer to provide optimal
functioning of an application. These requirements or pre-requisites are generally
not included in the software installation package and need to be installed separately
before the software is installed.
Supported Operating Systems:
1. Windows XP, Windows 7, Windows 8, Windows 8.1
2. Linux (all versions)
3. OS X Mountain Lion and above
4. MATLAB R2013a
4. MODULES
The project has the following main modules, which are used for image retrieval from
videos:
4.1 GUI Module
4.2 Query Module
I. Querying Input Image
II. Querying Input Video
4.3 Feature Extraction Module
4.4 Image Retrieval Module
These modules, and their sub-modules, are described in the following sections.
4.1 GUI MODULE
This is the main module from the user's perspective. The input image
and the corresponding output images are displayed here, and all the interaction
performed by the user is handled graphically in this module.
4.2 QUERY MODULE
In the query module, the user inputs a query image into the application;
the size and shape of the query image are then changed to those of the
database images, which are predefined earlier. The query module can be subdivided
into two sub-modules:
I. Querying Input Image
II. Querying Input Video
The images and videos are processed as follows.
I. Querying Input Image
The input image should satisfy certain criteria in order to make the
matching and comparison valid. These are as follows:
Image Enhancement
Image enhancement is the conversion of the original imagery to a more
understandable level of spectral quality for feature extraction or image
interpretation. It is useful to examine the image histograms before performing any
image enhancement. The x-axis of the histogram is the range of the available digital
numbers, i.e. 0 to 255 (in the case of grey levels). The y-axis is the number of pixels in
the image having a given digital number.
Contrast stretching is used to increase the tonal distinction between various features in a
scene. The most common types of contrast stretching enhancement are: a linear
contrast stretch, a linear contrast stretch with saturation, and a histogram-equalized
stretch.
Filtering is commonly used to restore an image by removing noise, to enhance the
image for better interpretation, and to extract features such as edges. The most
common types of filters used are: mean, median, low pass, high pass, and edge
detection.
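These enhancement operations map directly onto Image Processing Toolbox functions; a small illustrative sequence is sketched below, with 'frame.jpg' as a placeholder file name.

I = rgb2gray(imread('frame.jpg'));                 % placeholder input frame
stretched = imadjust(I);                           % linear contrast stretch with saturation
equalised = histeq(I);                             % histogram-equalized stretch
denoised  = medfilt2(I, [3 3]);                    % 3x3 median filter to suppress noise
smoothed  = imfilter(I, fspecial('average', 3));   % mean (low-pass) filter
edges     = edge(denoised, 'canny');               % edge-detection filtering
figure; imhist(I);                                 % examine the histogram before enhancing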
Image Transformation
Image transformations usually involve combined processing of data from
multiple spectral bands. Arithmetic operations (i.e. subtraction, addition,
multiplication, division) are performed to combine and transform the original bands
into "new" images which better display or highlight certain features in the scene.
Some of the most common transforms applied to image data are image ratioing,
which involves differencing combinations of two or more bands aimed
at enhancing target features, and principal components analysis (PCA). The objective
of the latter transformation is to reduce the dimensionality (i.e. the number of bands) of
the data and compress as much as possible of the information in the original bands into
fewer bands.
Image Classification
Information extraction is the last step toward the final output of the image
analysis. After pre-processing, the data is subjected to quantitative analysis to assign
individual pixels to specific classes. Classification of the image uses pixels of
known identity to classify the remainder of the image, consisting of
those pixels of unknown identity. After classification is complete, it is necessary to
evaluate its accuracy by comparing the categories on the classified images with
areas of known identity on the ground. The final result of the analysis provides the
user with full information concerning the source data, the method of analysis and
the outcome and its reliability.
Applications of Image Processing:
Interest in digital image processing methods stems from two principal application
areas:
(1) Improvement of pictorial information for human interpretation, and
(2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for extracting
information from an image in a form suitable for computer processing. Examples
include automatic character recognition, industrial machine vision for product
assembly and inspection, military reconnaissance, automatic processing of
fingerprints etc.
Basics of video processing:
Video is the technology of electronically capturing, recording, processing,
storing, transmitting, and reconstructing a sequence of still images representing
scenes in motion. Essentially the time component is considered in the case of
videos. It may be further described in the following manner:
1. Video is a sequence of still images representing scenes in motion.
2. Video is a motion technology realised electronically.
3. The capturing, recording, processing, storing, transmitting, and reconstructing of
video are all done electronically.
4.3 FEATURE EXTRACTION
Here in feature extraction we have used an efficient feature extraction
technique called Histogram of Oriented Gradients, which retrieves the query
image's colour, shape and texture features and compares them to those of the database
images.
4.4 IMAGE RETRIEVAL
In this module of the project, the features are compared and
sorted. We have used the Euclidean distance as an efficient measure to compare the
features. The feature vectors are then sorted, and indexing mechanisms are used to
retrieve the images from the database, which are then displayed accordingly on the
GUI.
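A minimal sketch of this comparison step is given below. It assumes a matrix dbFeatures with one feature vector per database frame (as rows) and a query feature vector q, both produced by the feature extraction module; these variable names are illustrative and not taken from the project code.

nReturn  = 10;                                   % how many best matches to show
diffs    = bsxfun(@minus, dbFeatures, q(:)');    % subtract the query from every row
dists    = sqrt(sum(diffs .^ 2, 2));             % Euclidean distance per frame
[~, idx] = sort(dists, 'ascend');                % smallest distance = most similar
bestIdx  = idx(1 : min(nReturn, numel(idx)));    % indices of frames to display in the GUI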
5. SELECTED SOFTWARE
The software used to perform the different operations on different kinds of
images and their properties is MATLAB. An overview of
Matlab and its operations on different images and their features is illustrated
with examples in the description below.
5.1 MATLAB
The name ‘Matlab’ comes from two words: matrix and laboratory.
According to The MathWorks (producer of Matlab), Matlab is a technical
computing language used mostly for high-performance numeric calculations and
visualization. It integrates computing, programming, signal processing and graphics
in easy to use environment, in which problems and solutions can be expressed with
mathematical notation. Basic data element is an array, which allows for computing
difficult mathematical formulas, which can be found mostly in linear algebra. But
Matlab is not only about math problems. It can be widely used to analyze data,
modeling, simulation and statistics. Matlab high-level programming language finds
implementation in other fields of science like biology, chemistry, economics,
medicine and many more. The following paragraphs, which are based on the
MathWorks 'Getting Started with Matlab' guide, introduce the main features of
Matlab. The most important feature of Matlab is its easy extensibility. The environment
allows creating new applications and becoming a contributing author. It has evolved
over many years and has become a tool for research, development and analysis. Matlab
also features a set of specific libraries, called toolboxes. They collect ready-to-use
functions used to solve particular areas of problems. The Matlab system consists of
five main parts. First, the Desktop Tools and Development Environment are a set of tools
helpful while working with functions and files. Examples of this part are the
command window, the workspace, the notepad editor and a very extensive help
mechanism. Second part is The Matlab Mathematical Function Library. This is a
wide collection of elementary functions like sum, multiplication, sine, cosine,
tangent, etc. Besides simple operations, more complex arithmetic can be calculated,
including matrix inverses, Fourier transformations and approximation functions.
Third part is the Matlab language, which is high-level array language with
functions, data structures and object-oriented programming features. It allows
programming small applications as well as large and complex programs. Fourth
piece of Matlab System is its graphics. It has wide tools for displaying graphs and
functions. It contains two and three-dimensional visualization, image processing,
building graphic user interface and even animation. Fifth and last part is Matlab’s
External Interfaces. This library gives a green light for writing C and Fortran
programs, which can be read and connected with Matlab.
Data representation :
Data representation in Matlab is the feature that distinguishes this
environment from others. Everything is presented with matrixes. The definition of
matrix by MathWorks is a rectangular array of numbers. Matlab recognizes binary
and text files. There are a couple of file extensions that are commonly used; for
example, *.m stands for M-file. There are two kinds of M-file: script and function.
A script file contains a sequence of mathematical expressions and commands. A function
file starts with the word function and includes functions created by the user.
A different example of an extension is *.mat. Files with *.mat are binary and include work
saved with the command File/Save or Save As. Since Matlab stores all data in matrices,
the program offers many ways to create them.
The easiest one is just to type the values. There are three general rules:
• the elements of a row should be separated with spaces or commas;
• to mark the end of each row a semicolon ';' should be used;
• square brackets must surround the whole list of elements.
After entering the values, the matrix is automatically stored in the workspace
(MathWorks, 2002, chapter 3.3). To take out a specific row, round brackets are
required. In a 3x3 matrix, pointing out the second row would be (2,:) and the third
column (:,3). In order to recall one precise element, the brackets need to contain two
values. For example, (2,3) stands for the third element in the second row. Variables are
declared as in every other programming language. Arithmetic operators are also
used in the same way – a certain value is assigned to a variable. When the result
variable is not defined, Matlab creates one, named ans, placed in the workspace.
The variable ans stores the result of the last operation. One command worth mentioning
is the plot command. It is responsible for drawing two-dimensional graphs. Although this
command belongs to the group responsible for graphics, it is a command from the basic
Matlab instructions, not from the Image Processing Toolbox. It is not suitable for
processing images, therefore it will not be described. The previous paragraph considered
matrices as two-dimensional structures. To better understand how Matlab stores
images, three-dimensional matrices have to be explained. In three-dimensional
matrices there are three values in the brackets. The first value stands for the row
number, the second value means the column, and the third one is the extra dimension.
Similarly, a fourth number would act as a fourth dimension, and so on. The best way to
understand this is to look at an example of a three-dimensional matrix built in Matlab
(Ozimek, lectures from Digital Image Processing, 2010a). As mentioned
before, Matlab stores images in arrays, which naturally suit the representation of
images. Most pictures are kept in two-dimensional matrices. Each element
corresponds to one pixel in the image. For example, an image of 600 pixels height and
800 pixels width would be stored in Matlab as a matrix with 600 rows and 800
columns. More complicated images are stored in three-dimensional matrices.
Truecolor pictures require the third dimension, to keep their information about
intensities of RGB colors. They vary between 0 and 1 value (MathWorks, 2009,
2.12). The most convenient way of pointing locations in the image, is pixel
coordinate system. To refer to one specific pixel, Matlab requires number of row
and column that stand for sought point. Values of coordinates range between one
and the length of the row or column. Images can also be expressed in spatial system
coordinates. In that case positions of pixel are described as x and y. By default,
spatial coordinates correspond with pixel coordinates. For example pixel (2,3)
would be translated to x=3 and y=2. The order of coordinates is reversed
(Koprowski & Wróbel, 2008, 20-21).
Endless possibilities :
As mentioned earlier, Matlab offers very wide selection of toolboxes.
Most of them are created by Mathworks but some are made by advanced users.
There is a long list of possibilities that this program gives. Starting from
automation, through electrical engineering, mechanics, robotics, measurements,
modeling and simulation, medicine, music and all kinds of calculations. Next
couple of paragraphs will shortly present some toolboxes available in Matlab. The
descriptions are based on the theory from Mrozek&Mrozek (2001, 387 – 395)
about toolboxes and mathworks.com. A very important group of toolboxes handles
digital signal processing. The Communication Toolbox provides
mechanisms for modeling, simulation, designing and analysis of functions for the
physical layer of communication systems. This toolbox includes algorithms that
help with coding channels, modulation, demodulation and multiplexing of digital
signals. The Communication Toolbox also contains a graphical user interface and plot
function for better understanding the signal processing. Similarly, Signal
Processing Toolbox, deals with signals. Possibilities of this Matlab library are
speech and audio processing, wireless and wired communications and analog filter
designing. Another group is math and optimization toolboxes. Two most common
are Optimization and Symbolic Math toolboxes. The first one handles large-scale
optimization problems. It contains functions responsible for performing nonlinear
equations and methods for solving quadratic and linear problems. More used library
is the second one. Symbolic Math toolbox contains hundreds of functions ready to
use when it comes to differentiation, integration, simplification, transforms and
solving of equations. It helps with all algebra and calculus calculations. A small
group of Matlab toolboxes handles statistics and data analysis. The Statistics Toolbox
features data management and organization, statistical plotting, probability
computing and visualization. It also allows designing experiments connected with
statistic data. Financial Toolbox is an extension to previously mentioned library.
Like the name states, this addition to Matlab handles finances. It is widely used to
estimate economical risk, analyze interest rate and creating financial charts. It can
also work with evaluation and interpretation of stock exchange actions. Neural
Networks Toolbox can be considered as one of the data analyzing library. It has set
of functions that create, visualize and simulate neural networks. It is helpful when
data change nonlinearly. Moreover, it provides graphical user interface equipped
with trainings and examples for better understanding the way neural network
works. Some toolboxes do not belong to any specific group but they are worth
mentioning. For example Fuzzy Logic Toolbox offers wide range of functions
responsible for fuzzy calculations. It allows user to look through the results of
fuzzy computations. Matlab also provides a very useful connection to databases
through the Database Toolbox.
It allows analyzing and processing the information stored in the tables. It supports
SQL (Structured Query Language) commands to read and write data, and to create
simple queries to search through the information. This specific toolbox interacts
with Oracle and other database processing programs. And what is most important,
Database Toolbox allows beginner users, not familiar with SQL, to access and
query databases. Last but not least, very important set of libraries – image
processing toolboxes. Mapping Toolbox is one of them, which is responsible for
analyzing geographic data and creating maps. It provides compatibility for raster
and vector graphics, which can be imported. Additionally, both two-dimensional
and three-dimensional maps can be displayed and customized. It also helps with
navigation problems and digital terrain analysis. Image Acquisition Toolbox is a
very valuable collection of functions that handles receiving image and video signal
directly from computer to the Matlab environment. This toolbox recognizes video
cameras from multiple hardware vendors. Specially designed interface leads
through possible transformations of images and videos, acquired thanks to
mechanisms of Image Acquisition Toolbox. Image Processing Toolbox is a wide
set of functions and algorithms that deal with graphics. It supports almost any type
of image file. It gives the user unlimited options for pre- and post- processing of
pictures. There are functions responsible for image enhancement, deblurring,
filtering, noise reduction, spatial transformations, creating histograms, changing the
threshold, hue and saturation, also for adjustment of color balance, contrast,
detection of objects and analysis of shapes.
5.2 Features
5.2.1 Colour
Definition
One of the most important features that make possible the recognition of
images by humans is colour. Colour is a property that depends on the reflection of
light to the eye and the processing of that information in the brain. We use colour
everyday to tell the difference between objects, places, and the time of day. Usually
colours are defined in three dimensional colour spaces. These could either be RGB
(Red, Green, and Blue),
HSV (Hue, Saturation, and Value) or HSB (Hue, Saturation, and Brightness). The
last two are dependent on the human perception of hue, saturation, and brightness.
Most image formats such as JPEG, BMP, GIF, use the RGB colour space to store
information. The RGB colour space is defined as a unit cube with red, green, and
blue axes. Thus, a vector with three co-ordinates represents the colour in this space.
When all three coordinates are set to zero the colour perceived is black. When all
three coordinates are set to 1 the colour perceived is white. The other colour spaces
operate in a similar fashion but with a different perception.
Methods of Representation
The main method of representing colour information of images in any
system is through colour histograms. A colour histogram is a type of bar graph,
where each bar represents a particular colour of the colour space being used. In
MatLab for example you can get a colour histogram of an image in the RGB or
HSV colour space. The bars in a colour histogram are referred to as bins and they
represent the x-axis. The number of bins depends on the number of colours there
are in an image. The y-axis denotes the number of pixels there are in each bin. In
other words how many pixels in an image are of a particular colour.
An example of a colour histogram in the HSV colour space can be seen
with the following image:
FIG 5: AN IMAGE AND ITS HISTOGRAM
To view a histogram numerically one has to look at the colour map or the numeric
representation of each bin. As one can see from the colour map each row represents
the colour of a bin. The row is composed of the three coordinates of the colour
space. The first coordinate represents hue, the second saturation, and the third,
value, thereby giving HSV. The percentages of each of these coordinates are what
make up the colour of a bin. Also one can see the corresponding pixel numbers for
each bin, which are denoted by the blue lines in the histogram.
Quantization in terms of colour histograms refers to the process of reducing the number of bins by taking colours that are very similar to each other and putting them in the same bin. By default the maximum number of bins one can obtain using the histogram function in MATLAB is 256. To save time when comparing colour histograms, one can quantize the number of bins. Quantization obviously reduces the information about the image, but, as mentioned, this is the trade-off when one wants to reduce processing time. There are two types of colour histograms, global colour histograms (GCHs) and local colour histograms (LCHs). A GCH represents a whole image with a single colour histogram, whereas an LCH divides an image into fixed blocks and takes the colour histogram of each of those blocks. LCHs contain more information about an image but are computationally expensive when comparing images. "The GCH is the traditional method for colour-based image retrieval. However, it does not include information concerning the colour distribution of the regions" of an image. Thus, when comparing GCHs, one might not always get a proper result in terms of the similarity of images.
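A minimal MATLAB sketch of these ideas (the file names are placeholders, and 16 bins per channel is an arbitrary quantization choice) computes quantized global colour histograms for two images, compares them, and builds a local colour histogram for one block:
% Global colour histograms (GCHs) with 16 bins per channel.
I1 = imread('image1.jpg');                 % placeholder file names
I2 = imread('image2.jpg');
nbins = 16;                                % quantization: 16 bins per channel
gch1 = [imhist(I1(:,:,1), nbins); imhist(I1(:,:,2), nbins); imhist(I1(:,:,3), nbins)];
gch2 = [imhist(I2(:,:,1), nbins); imhist(I2(:,:,2), nbins); imhist(I2(:,:,3), nbins)];
gch1 = gch1 / sum(gch1);                   % normalise so image size does not matter
gch2 = gch2 / sum(gch2);
d = norm(gch1 - gch2);                     % distance between the two GCHs
% Local colour histogram (LCH): histogram of one fixed block only.
block = I1(1:64, 1:64, :);                 % top-left 64x64 block
lch = [imhist(block(:,:,1), nbins); imhist(block(:,:,2), nbins); imhist(block(:,:,3), nbins)];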
5.2.2 Texture
Definition
Texture is that innate property of all surfaces that describes visual patterns, each having properties of homogeneity. It contains important information about the structural arrangement of a surface, such as that of clouds, leaves, bricks, or fabric, and it also describes the relationship of the surface to the surrounding environment. In short, it is a feature that describes the distinctive physical composition of a surface.
Texture properties include:
1) Coarseness
2) Contrast
3) Directionality
4) Line-likeness
5) Regularity
FIGURE 6: TEXTURE PROPERTIES
Texture is one of the most important defining features of an image. It is characterised by the spatial distribution of grey levels in a neighbourhood. In order to capture the spatial dependence of grey-level values, which contributes to the perception of texture, a two-dimensional texture analysis matrix is taken into consideration. This two-dimensional matrix is obtained by decoding the image file (JPEG, BMP, etc.).
Methods of Representation
There are three principal approaches used to describe texture: statistical, structural, and spectral.
• Statistical techniques characterise textures using the statistical properties of the grey levels of the points/pixels comprising a surface image. Typically, these properties are computed from the grey-level co-occurrence matrix of the surface or from the wavelet transform of the surface.
• Structural techniques characterise textures as being composed of simple primitive structures called "texels" (texture elements, Figure 6), which are arranged regularly on a surface according to some surface arrangement rules.
• Spectral techniques are based on properties of the Fourier spectrum and describe the global periodicity of the grey levels of a surface by identifying high-energy peaks in the Fourier spectrum.
For optimum classification purposes, what concerns us are the statistical techniques of characterisation, because it is these techniques that yield computable texture properties. The most popular statistical representations of texture are:
• Co-occurrence Matrix
• Tamura Texture
• Wavelet Transform
Co-occurrence Matrix
Originally proposed by R.M. Haralick, the co-occurrence matrix
representation of texture features explores the grey level spatial dependence of
texture. A mathematical definition of the co-occurrence matrix is as follows:
- Given a position operator P(i,j),
- let A be an n x n matrix
- whose element A[i][j] is the number of times that points with grey level (intensity)
g[i] occur, in the position specified by P, relative to points with grey level g[j].
- Let C be the n x n matrix that is produced by dividing A with the total number
of point pairs that satisfy P. C[i][j] is a measure of the joint probability that a pair
of points satisfying P will have values g[i], g[j].
- C is called a co-occurrence matrix defined by P.
FIGURE 7: CLASSICAL CO-OCCURRENCE MATRIX
At first the co-occurrence matrix is constructed, based on the orientation and
distance between image pixels as shown in Figure 7. Then meaningful statistics are
extracted from the matrix as the texture representation.
Haralick proposed the following texture features:
1. Angular Second Moment
2. Contrast
3. Correlation
4. Variance
5. Inverse Second Differential Moment
6. Sum Average
7. Sum Variance
8. Sum Entropy
9. Entropy
10. Difference Variance
11. Difference Entropy
12. Measure of Correlation 1
13. Measure of Correlation 2
14. Local Mean
Hence, for each choice of distance and orientation we obtain a co-occurrence matrix. These co-occurrence matrices represent the spatial distribution and the dependence of the grey levels within a local area. Each (i,j)-th entry in the matrices represents the probability of going from one pixel with a grey level of 'i' to another with a grey level of 'j' under a predefined distance and angle. From these matrices, sets of statistical measures are computed, called feature vectors.
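A minimal sketch of this procedure using the Image Processing Toolbox functions graycomatrix and graycoprops (the frame file name follows the naming used in the sample code of Section 7; the 8-level quantization and the four offsets are arbitrary illustrative choices):
% Build grey-level co-occurrence matrices and extract texture statistics.
I = rgb2gray(imread('frame-1.jpeg'));    % decode a frame and convert to grey levels
offsets = [0 1; -1 1; -1 0; -1 -1];      % 0, 45, 90 and 135 degrees at distance 1
glcm = graycomatrix(I, 'Offset', offsets, 'NumLevels', 8, 'Symmetric', true);
stats = graycoprops(glcm, {'Contrast','Correlation','Energy','Homogeneity'});
% Energy corresponds to the angular second moment and Homogeneity to the
% inverse difference moment; concatenating them gives a texture feature vector.
featureVector = [stats.Contrast stats.Correlation stats.Energy stats.Homogeneity];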
Tamura Texture
Drawing on psychological studies of human visual perception, Tamura explored a texture representation based on computational approximations to three main texture features: coarseness, contrast, and directionality. Each of these features is computed approximately by an algorithm:
• Coarseness is a measure of the granularity of an image, or the average size of regions that have the same intensity.
• Contrast is a measure of the vividness of the texture pattern; therefore, the bigger the blocks that make up the image, the higher the contrast. It is affected by the use of varying black and white intensities.
• Directionality is a measure of the directions of the grey values within the image.
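As an illustrative sketch of just the contrast feature (this follows one common formulation of Tamura contrast, sigma / kurtosis^(1/4), and is not taken from this project's code):
% Tamura contrast: standard deviation normalised by the fourth root of kurtosis.
I = double(rgb2gray(imread('frame-1.jpeg')));
g = I(:);
sigma = std(g);
mu4 = mean((g - mean(g)).^4);     % fourth central moment
kurt = mu4 / sigma^4;             % kurtosis of the grey-level distribution
tamuraContrast = sigma / kurt^(1/4);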
Wavelet Transform
Textures can be modelled as quasi-periodic patterns with a spatial/frequency representation. The wavelet transform converts the image into a multi-scale representation with both spatial and frequency characteristics, which allows effective multi-scale image analysis at a lower computational cost. Under this transformation, a function, which may represent an image, a curve, a signal, and so on, can be described in terms of a coarse-level description together with details that range from broad to narrow scales. Unlike the Fourier transform, which uses sine functions to represent signals, the wavelet transform uses functions known as wavelets. Wavelets are finite in time, yet the average value of a wavelet is zero; in a sense, a wavelet is a waveform that is bounded in both frequency and duration. While the Fourier transform converts a signal into a continuous series of sine waves, each of constant frequency and amplitude and of infinite duration, most real-world signals (such as music or images) have a finite duration and abrupt changes in frequency. This accounts for the efficiency of wavelet transforms: they convert a signal into a series of wavelets, which can be stored more efficiently because of their finite extent and can be constructed with rough edges, thereby better approximating real-world signals. Examples of wavelets are Coiflet, Morlet, Mexican Hat, Haar, and Daubechies. Of these, Haar is the simplest and most widely used, while the Daubechies wavelets have fractal structures and are vital for current wavelet applications.
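A minimal sketch of such a multi-scale decomposition, using the single-level 2-D discrete wavelet transform from the Wavelet Toolbox with the Haar wavelet (the sub-band energies kept as features are one reasonable choice, not this project's prescribed descriptor):
% Single-level 2-D wavelet decomposition with the Haar wavelet.
I = double(rgb2gray(imread('frame-1.jpeg')));
[cA, cH, cV, cD] = dwt2(I, 'haar');  % approximation + horizontal/vertical/diagonal details
% Simple texture descriptor: energy of each sub-band.
waveletFeatures = [mean(cA(:).^2) mean(cH(:).^2) mean(cV(:).^2) mean(cD(:).^2)];
% A deeper decomposition can be obtained by applying dwt2 to cA again,
% or in one call with wavedec2, e.g. [C, S] = wavedec2(I, 3, 'haar');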
5.2.3 Shape
Definition
Shape may be defined as the characteristic surface configuration of
an object; an outline or contour. It permits an object to be distinguished from its
surroundings by its outline [Figure 8]. Shape representations can be generally
divided into two categories:
• Boundary-based, and
• Region-based.
FIGURE 8: BOUNDARY-BASED & REGION-BASED
Boundary-based shape representation uses only the outer boundary of the shape. This is done by describing the considered region using its external characteristics, i.e., the pixels along the object boundary. Region-based shape representation uses the entire shape region by describing the considered region using its internal characteristics, i.e., the pixels contained in that region.
Methods of Representation
For representing shape features mathematically, we have:
Boundary-based:
1) Polygonal Models, boundary partitioning
2) Fourier Descriptors
3) Splines, higher order constructs
4) Curvature Models
Region-based:
1) Superquadrics
2) Fourier Descriptors
3) Implicit Polynomials
4) Blum's skeletons
The most successful representations for shape categories are Fourier Descriptor and
Moment Invariants:
1) The main idea of Fourier Descriptor is to use the Fourier transformed boundary
as the shape feature.
2) The main idea of Moment invariants is to use region-based moments, which are
invariant to transformations as the shape feature.
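As a hedged sketch of the Fourier Descriptor idea (the thresholding step is only an assumption about how a binary silhouette might be obtained; it is not part of this project's pipeline), the boundary of a shape can be traced, written as complex numbers, and Fourier transformed:
% Fourier descriptors of the first object boundary found in a binary image.
I = rgb2gray(imread('frame-1.jpeg'));
BW = im2bw(I, graythresh(I));      % assumed simple thresholding to obtain a silhouette
B = bwboundaries(BW, 'noholes');   % trace object boundaries
b = B{1};                          % boundary of the first object: N x 2 [row col]
z = b(:,2) + 1i*b(:,1);            % boundary points as complex numbers x + iy
Z = fft(z);                        % Fourier descriptors
Z = Z / abs(Z(2));                 % normalise for scale; Z(1) carries only translation
descriptor = abs(Z(2:11));         % keep a few low-frequency magnitudes as the shape feature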
6. SYSTEM DESIGN
6.1 Introduction
System design is the process of defining the components, modules, interfaces, and data of a system so that it satisfies the specified requirements. This is usually done in two ways:
I. UML DIAGRAMS
II. DATA FLOW DIAGRAMS
6.2 Data Flow Diagram
The Data Flow Diagram (DFD) is a graphical representation of the flow of data through an information system. It enables one to represent the processes in an information system from the viewpoint of data.
FIGURE 9: DATA FLOW DIAGRAM OF VIDEO SHOT BOUNDARY DETECTION
7. IMPLEMENTATION
CREATING A GUI:
A Graphical User Interface (GUI) is an essential part of the application: it lets the user navigate the application easily and keep track of where they are. It contains all the essential elements, each of which responds to the requests made by the user. Each element arranged in the GUI has an axes object in a coordinate system, and all of these elements are given action-specific code in the corresponding MATLAB file (.M file). To create the graphical user interface we use MATLAB's GUI building toolbox for creation and element positioning.
We start with the 'guide' command in the command window, which opens the toolbox for MATLAB GUI creation. Here we specify the name of the GUI we are going to create, and MATLAB opens a GUI building module. First, the title of the project 'VIDEO SHOT BOUNDARY DETECTION' is placed at the centre, as shown in the figure below. This is done using the static text tool present in the toolbox. By double-clicking the generated text field we can edit its properties, such as its colour and text. The same was done for the input-image text field in the GUI.
Next, the display elements are assigned their corresponding axes so that they can be addressed from the code; this is what allows the images associated with an event to be displayed dynamically. The axes function creates an axes graphics object in the current figure using default property values. This is done with the axes tool present in the toolbox, and each axes is named so that events can be handled on it.
We also need interactive elements such as buttons, so that the client or user can interact with the application easily. We created two buttons, one for video selection and one for image selection, by placing push buttons from the toolbox. An event must occur when the user clicks a button; this is achieved with a callback function, which is generated in the .M file where the code is handled.
Creating a BasicGui.M File
In this step the actual logic resides. The measure used here is the Euclidean distance, which we use to calculate the distance between the query image and the frames of the video in the comparison (feature) space.
Euclidean distance:
To determine the similarity or dissimilarity of two images, the Euclidean distance between their feature vectors is calculated. In this metric the difference between corresponding features of the query image and a database image is squared, which increases the divergence between the query and the database image.
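In vector form, for a query descriptor fq and a database-frame descriptor fd of length n, the distance is d = sqrt((fq(1)-fd(1))^2 + ... + (fq(n)-fd(n))^2). A minimal sketch (the random vectors are stand-ins for the reshaped HOG descriptors used later):
% Euclidean distance between two feature vectors.
fq = rand(1, 124);                    % stand-in for the reshaped HOG descriptor of the query image
fd = rand(1, 124);                    % stand-in for the descriptor of one database frame
d1 = sqrt(sum((fq - fd).^2));         % explicit formula
d2 = pdist2(fq, fd, 'euclidean');     % equivalent call used in the sample code
% d1 and d2 are equal; the frame with the smallest distance is the best match.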
Logic of Feature extraction and comparison
For extracting the Features and comparing them with the datasets according to the
following steps as shown,
Step 1: Reading the image input from the GUI
Step 2: After reading the input image into a variable, display on the axes using axes
function
41
Step 3: Run the vl-feat library in the matlab by tracing the current directory path.
Step 4: Initialise the cell Size = 8 with 50 Percent overlapping factor.
Step 5: Since the input image and database image of the frame from the video should be
of same size we need toresize the image into 256 X 256 size
Step 6 :We need to calculate the gradients of the input image. This is done with the help
of Histogram of Oriented Gradients. This returns a matrix containing all the gradient
values and magnitude values, which are the feature descriptor values
Step 7: Reshape the matrix formed in Step 6 into single dimensional array for
comparison
Step 8: Similarly for all the frames in the database from 1 to the the count of total
number of files repeat steps 4 to Step 8
Step 9: Then store the single dimension values of all images into a single dimension
array. Which means each element in the cell array which is single dimensional has
single dimensional values of gradient values.
Step10:Using Euclidean method as stated above, calculate the distance between
database images and input image. In order to make sure that all the distance values are
in a single dimensional array
Step 11:After finding the distances corresponding to each frame in the video and the
current query image, the indexes in the directory and display the images over the
axes.
Step 12:Then from the obtained Euclidean values of all the frames in the database, the
best values of the images are chosen and displayed in the axes. Thus, the execution of
the project is terminated.
7.1 SAMPLE CODE:
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename pathname] = uigetfile({'*.bmp';'*.mp4';'*.avi';}, 'File Selector');
str = strcat(pathname, filename);
obj = mmreader(str);
vid = read(obj);
frames = obj.NumberOfFrames;
for x = 1:frames
    imwrite(vid(:,:,:,x), strcat('frame-', num2str(x), '.jpeg'));
end
%cd('C:\Users\anvesh\Desktop\Project');
a = imread('frame-1.jpeg');
axes(handles.axes4);
imshow(a);
b = imread('frame-2.jpeg');
axes(handles.axes5);
imshow(b);
c = imread('frame-3.jpeg');
axes(handles.axes6);
imshow(c);
d = imread('frame-4.jpeg');
axes(handles.axes7);
imshow(d);
e = imread('frame-5.jpeg');
axes(handles.axes8);
imshow(e);
f = imread('frame-6.jpeg');
axes(handles.axes9);
imshow(f);
g = imread('frame-7.jpeg');
axes(handles.axes10);
imshow(g);
h = imread('frame-8.jpeg');
axes(handles.axes11);
imshow(h);
i = imread('frame-9.jpeg');
axes(handles.axes12);
imshow(i);

% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% function for image selection
run('C:\Users\anvesh\Desktop\1Project\vlfeat-0.9.19\toolbox\vl_setup');
[filename pathname] = uigetfile({'*.jpeg';'*.png';'*.avi';}, 'File Selector');
str = strcat(pathname, filename);
img1 = imread(str);
imagefiles = dir('*.jpeg');
img1resize = imresize(img1, [256 256]);
cellsize = 8;
hog1 = vl_hog(single(img1resize), cellsize, 'verbose');
n1 = numel(hog1);
rehog1 = reshape(hog1, [1, n1]);
nfiles = length(imagefiles);
for i = 1:1:nfiles
    currentfile = imagefiles(i).name;
    currentimage = imread(currentfile);
    image{i} = imresize(currentimage, [256 256]);
    hogca{i} = vl_hog(single(image{i}), cellsize, 'verbose');
    n = numel(hogca{i});
    rehogca{i} = reshape(hogca{i}, [1, n]);
    dist(i) = pdist2(rehog1, rehogca{i}, 'euclidean');
end
[sorted, ix] = sort(dist);
firstIndex = ix(1:10);
str1 = imagefiles(firstIndex(1)).name;
image1 = imread(str1);
image2 = imread(str);
imgresize1 = imresize(image1, [256 256]);
imgresize2 = imresize(image2, [256 256]);
x = 'found image';
if (imgresize1 == imgresize2)
    str1 = imagefiles(firstIndex(1)).name
    disp(x);
else
    disp('not found');
end

% Running the vl-feat toolbox for handling videos, initially by breaking
% them down into shots and frames
for i = 1:1:20
    run('vlfeat-0.9.19/toolbox/vl_setup');
    cellsize = 8;
    str = 'frame-';
    str2 = strcat(str, num2str(i));
    str2 = strcat(str2, '.jpeg');
    img{i} = imread(str2);
    imgresize = imresize(img{i}, [256 256]);
    hog1{i} = vl_hog(single(imgresize), cellsize, 'verbose');
    n1 = numel(hog1{i});
    rehog{i} = reshape(hog1{i}, [1, n1]);
    % input done
    imagefiles = dir('*.jpeg');
    nfiles = length(imagefiles);
    for j = 1:1:nfiles
        currentimage = imread(imagefiles(j).name);
        ir{j} = imresize(currentimage, [256 256]);
        hc{j} = vl_hog(single(ir{j}), cellsize, 'verbose');
        n = numel(hc{j});
        rh{j} = reshape(hc{j}, [1, n]);
        dist{j} = pdist2(rehog{i}, rh{j}, 'euclidean')
    end
end
7.2 SAMPLE_GUI.M & SIMULATION RESULTS
PROVIDING THE QUERY IMAGE:
DISPLAYING THE FRAMES CORRESPONDING TO THE VIDEO:
OUTPUT OF THE EXECUTION (features matched):
OUTPUT OF THE EXECUTION (features unmatched):
8. SYSTEM TESTING
8.1 TESTING:
• Testing is a process of executing a program with the intent of finding an error.
• Testing presents an interesting anomaly for software engineering.
• Testing is a set of activities that can be planned in advance and conducted systematically.
• Software testing is often referred to as verification and validation.
8.2 TYPES OF TESTING:
The various types of testing are:
1) White Box Testing
2) Black Box Testing
3) Alpha Testing
4) Beta Testing
5) Win Runner
6) Load Runner
7) System Testing
8) Unit Testing
9) End to End Testing
The type of testing we have used to measure the accuracy and efficiency of the retrieval is black-box testing, which checks the output depending upon the input given.
WHITE-BOX TESTING
White-box testing, sometimes called glass-box testing, is a test case design
method that uses the control structure of the procedural design to derive test
cases. Using
white-box testing methods, the software engineer can derive test cases that
(1) guarantee that all independent paths within a module have been exercised at least once, (2) exercise all logical decisions on their true and false sides, (3) execute all loops at their boundaries and within their operational bounds, and (4) exercise internal data structures to ensure their validity.
A reasonable question might be posed at this juncture: "Why spend time and energy worrying about (and testing) logical minutiae when we might better expend effort ensuring that program requirements have been met?" Stated another way, why don't we spend all of our energy on black-box tests? (It is not possible to exhaustively test every program path, because the number of paths is simply too large; moreover, white-box tests can be designed only after a component-level design or source code exists, since the logical details of the program must be available.) The answer lies in the nature of software defects:
• Logic errors and incorrect assumptions are inversely proportional to the
probability
that a program path will be executed. Errors tend to creep into our work when
we design and implement function, conditions, or control that are out of the
mainstream. Everyday processing tends to be well understood (and well
scrutinized), while "special case" processing tends to fall into the cracks.
• We often believe that a logical path is not likely to be executed when, in fact, it
may be executed on a regular basis. The logical flow of a program is sometimes
counterintuitive, meaning that our unconscious assumptions about flow of
control and data may lead us to make design errors that are uncovered only once
path testing commences.
• Typographical errors are random. When a program is translated into
programming language source code, it is likely that some typing errors will
occur. Many will be uncovered by syntax and type checking mechanisms, but
others may go undetected until testing begins. It is as likely that a typo will exist
on an obscure logical path as on a mainstream path.
BLACK-BOX TESTING:-
1) It is also called behavioural testing. It focuses on the functional requirements of the software.
2) It is a complementary approach that is likely to uncover a different class of errors than white-box methods.
3) Black-box testing enables the software engineer to derive sets of input conditions that will fully exercise all the functional requirements of a program, and it can be applied at virtually every level of software testing.
Accuracy and precision are defined in terms of systematic and random errors. The more common definition associates accuracy with systematic errors and precision with random errors. In terms of the retrieval results they are measured as:
Precision = TP / (TP + FP) x 100%
Accuracy = (TP + TN) / (TP + TN + FP + FN) x 100%
where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives respectively.
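A small sketch of these measures (the counts below are hypothetical, chosen only to mirror the sample cases in Section 8.3):
% Hypothetical retrieval outcome counts.
TP = 3;  FP = 0;  TN = 1;  FN = 0;
precision = TP / (TP + FP) * 100;                  % percentage of retrieved matches that are correct
accuracy  = (TP + TN) / (TP + TN + FP + FN) * 100; % percentage of all cases decided correctly
fprintf('Precision = %.0f%%, Accuracy = %.0f%%\n', precision, accuracy);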
ALPHA TESTING:-
The alpha test is conducted at the developer's site by a customer. The software
is used in a natural setting with the developer "looking over the shoulder" of the
user and recording errors and usage problems. Alpha tests are conducted in a
controlled environment.
BETA TESTING:-
The beta test is conducted at one or more customer sites by the end-user of the
software. Unlike alpha testing, the developer is generally not present. Therefore,
the beta test is a "live" application of the software in an environment that cannot
be controlled by the developer. The customer records all problems (real or
imagined) that
are encountered during beta testing and reports these to the developer at regular
intervals. As a result of problems reported during beta tests, software engineers
make modifications and then prepare for release of the software product to the
entire customer base.
SYSTEM TESTING:-
System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that system elements have been properly integrated and perform their allocated functions. In the sections that follow, we discuss the types of system tests that are worthwhile for software-based systems.
UNIT TESTING:-
Unit testing focuses verification effort on the smallest unit of software design—
the software component or module. Using the component-level design
description as a guide, important control paths are tested to uncover errors
within the boundary of the module. The relative complexity of tests and
uncovered errors is limited by the constrained scope established for unit testing.
The unit test is white-box oriented, and the step can be conducted in parallel for
multiple components.
8.3 Test Cases / Sample Cases
1) Input image: epic.jpeg
Output from the module: frame-1.jpeg, match found
Precision = 1/(1+0) x 100 = 100%
2) Input image: epic1.jpeg
Output from the module: frame-57.jpeg, match found
Precision = 1/(1+0) x 100 = 100%
3) Input image: 802.png
Output from the module: match not found
Precision = 0/(0+1) x 100 = 0%
4) Input image: 1710.png
Output from the module: frame-131.jpeg, match found
Precision = 1/(1+0) x 100 = 100%
Accuracy = (3 + 1) / (3 + 0 + 0 + 1) = 4/4 = 100%
9. CONCLUSION
In this project, an efficient method for VIDEO SHOT BOUNDARY DETECTION that works with any video format has been implemented based on the Histogram of Oriented Gradients. The results show that this scheme is more efficient than the existing feature extractors it was compared against, and the project has clearly demonstrated that the necessary information can be retrieved efficiently from video content in different formats. The solution can therefore be treated as a new candidate for a retrieval system.
As part of future work, the approach can be extended to explore and design more effective retrieval, for example by using mutation techniques, and the performance and efficiency of SBD can be studied further for all kinds of image and video formats and under large-dataset conditions.
10. REFERENCES
• Nikita Sao and Ravi Mishra, "A Survey Based on Video Shot Boundary Detection Techniques," International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 4, April 2014.
• Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and Stephen Maybank, "A Survey on Visual Content-Based Video Indexing and Retrieval," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 41, No. 6, November 2011.
• Rahul Kumar Garg and Gaurav Saxena, "Shot Boundary Detection Using Shifting of Image Frame," International Journal of Scientific Engineering and Technology (ISSN: 2277-1581), Vol. 3, Issue 6, pp. 785-788, June 2014.
• Mohini Deokar and Ruhi Kabra, "Video Shot Detection Techniques: Brief Overview," International Journal of Engineering Research and General Science, Vol. 2, Issue 6, October-November 2014, ISSN 2091-2730.
• http://en.wikipedia.org/wiki/Shot_transition_detection
APPENDIX-I
In order to compute the Histogram of Oriented Gradients features we need a feature extractor that provides this functionality. It is written in the C language, and the MEX interface is used so that the functionality defined in the C code can be called as functions from MATLAB.
HOG.c
/** @file hog.c
** @brief Histogram of Oriented Gradients (HOG) -
Definition
**/
/*
Copyright (C) 2014 Anvesh.
All rights reserved.
This file is part of the VLFeat library and is made
available under
the terms of the BSD license (see the COPYING file).
*/
#include "hog.h"
#include "mathop.h"
#include <string.h>
/**
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@page hog Histogram of Oriented Gradients (HOG)
features
@author Anvesh
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@ref hog.h implements the Histogram of Oriented
Gradients (HOG) features
in the variants of Dalal Triggs
@cite{dalal05histograms} and of UOCTTI
@cite{felzenszwalb09object}. Applications include
object detection
and deformable object detection.
- @ref hog-overview
- @ref hog-tech
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@section hog-overview Overview
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
HOG is a standard image feature used, among others,
in object detection
and deformable object detection. It decomposes the
image into square cells
of a given size (typically eight pixels), compute a
histogram of oriented
gradient in each cell (similar to @ref sift), and
then renormalizes
the cells by looking into adjacent blocks.
VLFeat implements two HOG variants: the original one
of Dalal-Triggs
@cite{dalal05histograms} and the one proposed in
Felzenszwalb et al.
@cite{felzenszwalb09object}.
In order to use HOG, start by creating a new HOG
object, set the desired
parameters, pass a (color or grayscale) image, and
read off the results.
@code
VlHog * hog = vl_hog_new(VlHogVariantDalalTriggs,
numOrientations, VL_FALSE) ;
vl_hog_put_image(hog, image, height, width,
numChannels, cellSize) ;
hogWidth = vl_hog_get_width(hog) ;
hogHeight = vl_hog_get_height(hog) ;
hogDimenison = vl_hog_get_dimension(hog) ;
hogArray =
vl_malloc(hogWidth*hogHeight*hogDimension*sizeof(floa
t)) ;
vl_hog_extract(hog, hogArray) ;
vl_hog_delete(hog) ;
@endcode
HOG is a feature array of the dimension returned
by ::vl_hog_get_width,
::vl_hog_get_height, with each feature (histogram)
having
dimension ::vl_hog_get_dimension. The array is stored
in row major order,
with the slowest varying dimension beying the
dimension indexing the histogram
elements.
The number of entreis in the histogram as well as
their meaning depends
on the HOG variant and is detailed later. However, it
is usually
unnecessary to know such details. @ref hog.h provides
support for
creating an inconic representation of a HOG feature
array:
@code
glyphSize = vl_hog_get_glyph_size(hog) ;
imageHeight = glyphSize * hogArrayHeight ;
imageWidth = glyphSize * hogArrayWidth ;
image =
vl_malloc(sizeof(float)*imageWidth*imageHeight) ;
vl_hog_render(hog, image, hogArray) ;
@endcode
It is often convenient to mirror HOG features from
left to right. This
can be obtained by mirroring an array of HOG cells,
but the content
of each cell must also be rearranged. This can be
done by
the permutation obtaiend by ::vl_hog_get_permutation.
Furthermore, @ref hog.h suppots computing HOG
features not from
images but from vector fields.
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@section hog-tech Technical details
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
HOG divdes the input image into square cells of size
@c cellSize,
fitting as many cells as possible, filling the image
domain from
the upper-left corner down to the right one. For each
row and column,
the last cell is at least half contained in the
image.
More precisely, the number of cells obtained in this
manner is:
@code
hogWidth = (width + cellSize/2) / cellSize ;
hogHeight = (height + cellSize/2) / cellSize ;
@endcode
Then the image gradient @f$ nabla ell(x,y) @f$
is computed by using central difference (for colour
image
the channel with the largest gradient at that pixel
is used).
The gradient @f$ nabla ell(x,y) @f$ is assigned to
one of @c 2*numOrientations orientation in the
range @f$ [0,2pi) @f$ (see @ref hog-conventions for
details).
Contributions are then accumulated by using bilinear
interpolation
to four neigbhour cells, as in @ref sift.
This results in an histogram @f$h_d@f$ of dimension
2*numOrientations, called of @e directed orientations
since it accounts for the direction as well as the
orientation
of the gradient. A second histogram @f$h_u@f$ of
undirected orientations
of half the size is obtained by folding @f$ h_d @f$
into two.
Let a block of cell be a @f$ 2times 2 @f$ sub-array
of cells.
Let the norm of a block be the @f$ l^2 @f$ norm of
the stacking of the
respective unoriented histogram. Given a HOG cell,
four normalisation
factors are then obtained as the inverse of the norm
of the four
blocks that contain the cell.
For the Dalal-Triggs variant, each histogram @f$ h_d
@f$ is copied
four times, normalised using the four different
normalisation factors,
the four vectors are stacked, saturated at 0.2, and
finally stored as the descriptor
of the cell. This results in a @c numOrientations * 4
dimensional
cell descriptor. Blocks are visited from left to
right and top to bottom
when forming the final descriptor.
For the UOCCTI descriptor, the same is done for both
the undirected
as well as the directed orientation histograms. This
would yield
a dimension of @c 4*(2+1)*numOrientations elements,
but the resulting
vector is projected down to @c (2+1)*numOrientations
elements
by averaging corresponding histogram dimensions. This
was shown to
be an algebraic approximation of PCA for descriptors
computed on natural
images.
In addition, for the UOCTTI variant the l1 norm of
each of the
four l2 normalised undirected histograms is computed
and stored
as additional four dimensions, for a total of
@c 4+3*numOrientations dimensions.
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
@subsection hog-conventions Conventions
<!--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~ -->
The orientation of a gradient is expressed as the
angle it forms with the
horizontal axis of the image. Angles are measured
clock-wise (as the vertical
image axis points downards), and the null angle
corresponds to
an horizontal vector pointing right. The quantized
directed
orientations are @f$ mathrm{k} pi /
mathrm{numOrientations} @f$, where
@c k is an index that varies in the ingeger
range @f$ {0, dots, 2mathrm{numOrientations} - 1}
@f$.
Note that the orientations capture the orientation of
the gradeint;
image edges would be oriented at 90 degrees from
these.
**/
/*
-----------------------------------------------------
----------- */
/** @brief Create a new HOG object
** @param variant HOG descriptor variant.
** @param numOrientations number of distinguished
orientations.
** @param transposed wether images are transposed
(column major).
** @return the new HOG object.
**
** The function creates a new HOG object to extract
descriptors of
** the prescribed @c variant. The angular resolution
is set by
** @a numOrientations, which specifies the number of
<em>undirected</em>
** orientations. The object can work with column
major images
** by setting @a transposed to true.
**/
VlHog *
vl_hog_new (VlHogVariant variant, vl_size
numOrientations, vl_bool transposed)
{
vl_index o, k ;
VlHog * self = vl_calloc(1, sizeof(VlHog)) ;
assert(numOrientations >= 1) ;
self->variant = variant ;
self->numOrientations = numOrientations ;
self->glyphSize = 21 ;
self->transposed = transposed ;
self->useBilinearOrientationAssigment = VL_FALSE ;
self->orientationX = vl_malloc(sizeof(float) *
self->numOrientations) ;
self->orientationY = vl_malloc(sizeof(float) *
self->numOrientations) ;
/*
Create a vector along the center of each
orientation bin. These
are used to map gradients to bins. If the image is
transposed,
then this can be adjusted here by swapping X and Y
in these
vectors.
*/
for(o = 0 ; o < (signed)self->numOrientations ; +
+o) {
double angle = o * VL_PI / self-
>numOrientations ;
if (!self->transposed) {
self->orientationX[o] = (float) cos(angle) ;
self->orientationY[o] = (float) sin(angle) ;
} else {
self->orientationX[o] = (float) sin(angle) ;
self->orientationY[o] = (float) cos(angle) ;
}
}
/*
If the number of orientation is equal to 9, one
gets:
Uoccti:: 18 directed orientations + 9 undirected
orientations + 4 texture
DalalTriggs:: 9 undirected orientations x 4
blocks.
*/
switch (self->variant) {
case VlHogVariantUoctti:
self->dimension = 3*self->numOrientations + 4 ;
break ;
case VlHogVariantDalalTriggs:
self->dimension = 4*self->numOrientations ;
break ;
default:
assert(0) ;
}
/*
A permutation specifies how to permute elements in
a HOG
descriptor to flip it horizontally. Since the
first orientation
of index 0 points to the right, this must be
swapped with orientation
self->numOrientation that points to the left (for
the directed case,
and to itself for the undirected one).
*/
self->permutation = vl_malloc(self->dimension *
sizeof(vl_index)) ;
switch (self->variant) {
case VlHogVariantUoctti:
for(o = 0 ; o < (signed)self->numOrientations ;
++o) {
vl_index op = self->numOrientations - o ;
self->permutation[o] = op ;
self->permutation[o + self->numOrientations]
= (op + self->numOrientations) % (2*self-
>numOrientations) ;
self->permutation[o + 2*self-
>numOrientations] = (op % self->numOrientations) +
2*self->numOrientations ;
}
for (k = 0 ; k < 4 ; ++k) {
/* The texture features correspond to four
displaced block around
a cell. These permute with a lr flip as for
DalalTriggs. */
vl_index blockx = k % 2 ;
vl_index blocky = k / 2 ;
vl_index q = (1 - blockx) + blocky * 2 ;
self->permutation[k + self->numOrientations *
3] = q + self->numOrientations * 3 ;
}
break ;
case VlHogVariantDalalTriggs:
for(k = 0 ; k < 4 ; ++k) {
/* Find the corresponding block. Blocks are
listed in order 1,2,3,4,...
from left to right and top to bottom */
vl_index blockx = k % 2 ;
vl_index blocky = k / 2 ;
vl_index q = (1 - blockx) + blocky * 2 ;
for(o = 0 ; o < (signed)self->numOrientations
; ++o) {
vl_index op = self->numOrientations - o ;
self->permutation[o + k*self-
>numOrientations] = (op % self->numOrientations) +
q*self->numOrientations ;
}
}
break ;
default:
assert(0) ;
}
/*
Create glyphs for representing the HOG features/
filters. The glyphs
are simple bars, oriented orthogonally to the
gradients to represent
image edges. If the object is configured to work
on transposed image,
the glyphs images are also stored in column-major.
*/
self->glyphs = vl_calloc(self->glyphSize * self-
>glyphSize * self->numOrientations, sizeof(float)) ;
#define atglyph(x,y,k) self->glyphs[(x) + self-
>glyphSize * (y) + self->glyphSize * self->glyphSize
* (k)]
for (o = 0 ; o < (signed)self->numOrientations ; +
+o) {
double angle = fmod(o * VL_PI / self-
>numOrientations + VL_PI/2, VL_PI) ;
double x2 = self->glyphSize * cos(angle) / 2 ;
double y2 = self->glyphSize * sin(angle) / 2 ;
if (angle <= VL_PI / 4 || angle >= VL_PI * 3 / 4)
{
/* along horizontal direction */
double slope = y2 / x2 ;
double offset = (1 - slope) * (self->glyphSize
- 1) / 2 ;
vl_index skip = (1 - fabs(cos(angle))) / 2 *
self->glyphSize ;
vl_index i, j ;
for (i = skip ; i < (signed)self->glyphSize -
skip ; ++i) {
j = vl_round_d(slope * i + offset) ;
if (! self->transposed) {
atglyph(i,j,o) = 1 ;
} else {
atglyph(j,i,o) = 1 ;
}
}
} else {
/* along vertical direction */
double slope = x2 / y2 ;
double offset = (1 - slope) * (self->glyphSize
- 1) / 2 ;
vl_index skip = (1 - sin(angle)) / 2 * self-
>glyphSize ;
vl_index i, j ;
for (j = skip ; j < (signed)self->glyphSize -
skip; ++j) {
i = vl_round_d(slope * j + offset) ;
if (! self->transposed) {
atglyph(i,j,o) = 1 ;
} else {
atglyph(j,i,o) = 1 ;
}
}
}
}
return self ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Delete a HOG object
** @param self HOG object to delete.
**/
void
vl_hog_delete (VlHog * self)
{
if (self->orientationX) {
vl_free(self->orientationX) ;
self->orientationX = NULL ;
}
if (self->orientationY) {
vl_free(self->orientationY) ;
self->orientationY = NULL ;
}
if (self->glyphs) {
vl_free(self->glyphs) ;
self->glyphs = NULL ;
}
if (self->permutation) {
vl_free(self->permutation) ;
self->permutation = NULL ;
}
if (self->hog) {
vl_free(self->hog) ;
self->hog = NULL ;
}
if (self->hogNorm) {
vl_free(self->hogNorm) ;
self->hogNorm = NULL ;
}
vl_free(self) ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Get HOG glyph size
** @param self HOG object.
** @return size (height and width) of a glyph.
**/
vl_size
vl_hog_get_glyph_size (VlHog const * self)
{
return self->glyphSize ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Get HOG left-right flip permutation
** @param self HOG object.
** @return left-right permutation.
**
** The function returns a pointer to an array @c
permutation of ::vl_hog_get_dimension
** elements. Given a HOG descriptor (for a cell) @c
hog, which is also
** a vector of ::vl_hog_get_dimension elements, the
** descriptor obtained for the same image flipped
horizotnally is
** given by <code>flippedHog[i] =
hog[permutation[i]]</code>.
**/
vl_index const *
vl_hog_get_permutation (VlHog const * self)
{
return self->permutation ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Turn bilinear interpolation of assignments
on or off
** @param self HOG object.
** @param x @c true if orientations should be
assigned with bilinear interpolation.
**/
void
vl_hog_set_use_bilinear_orientation_assignments
(VlHog * self, vl_bool x) {
self->useBilinearOrientationAssigment = x ;
}
/** @brief Tell whether assignments use bilinear
interpolation or not
** @param self HOG object.
** @return @c true if orientations are be assigned
with bilinear interpolation.
**/
vl_bool
vl_hog_get_use_bilinear_orientation_assignments
(VlHog const * self) {
return self->useBilinearOrientationAssigment ;
}
/*
-----------------------------------------------------
----------- */
/** @brief Render a HOG descriptor to a glyph image
** @param self HOG object.
** @param image glyph image (output).
** @param descriptor HOG descriptor.
** @param width HOG descriptor width.
** @param height HOG descriptor height.
**
** The function renders the HOG descriptor or filter
** @a descriptor as an image (for visualization) and
stores the result in
** the buffer @a image. This buffer
** must be an array of dimensions @c width*glyphSize
** by @c height*glyphSize elements, where @c
glyphSize is
** obtained from ::vl_hog_get_glyph_size and is the
size in pixels
** of the image element used to represent the
descriptor of one
** HOG cell.
**/
void
vl_hog_render (VlHog const * self,
float * image,
float const * descriptor,
vl_size width,
vl_size height)
{
vl_index x, y, k, cx, cy ;
vl_size hogStride = width * height ;
assert(self) ;
assert(image) ;
assert(descriptor) ;
assert(width > 0) ;
assert(height > 0) ;
for (y = 0 ; y < (signed)height ; ++y) {
for (x = 0 ; x < (signed)width ; ++x) {
float minWeight = 0 ;
float maxWeight = 0 ;
for (k = 0 ; k < (signed)self-
>numOrientations ; ++k) {
float weight ;
float const * glyph = self->glyphs + k *
(self->glyphSize*self->glyphSize) ;
float * glyphImage = image + self->glyphSize
* x + y * width * (self->glyphSize*self->glyphSize) ;
switch (self->variant) {
case VlHogVariantUoctti:
weight =
descriptor[k * hogStride] +
descriptor[(k + self->numOrientations) *
hogStride] +
descriptor[(k + 2 * self-
>numOrientations) * hogStride] ;
break ;
case VlHogVariantDalalTriggs:
weight =
descriptor[k * hogStride] +
descriptor[(k + self->numOrientations) *
hogStride] +
descriptor[(k + 2 * self-
>numOrientations) * hogStride] +
descriptor[(k + 3 * self-
>numOrientations) * hogStride] ;
break ;
default:
abort() ;
}
maxWeight = VL_MAX(weight, maxWeight) ;
minWeight = VL_MIN(weight, minWeight);
for (cy = 0 ; cy < (signed)self->glyphSize ;
++cy) {
for (cx = 0 ; cx < (signed)self-
>glyphSize ; ++cx) {
*glyphImage++ += weight * (*glyph++) ;
}
glyphImage += (width - 1) * self->glyphSize
;
}
} /* next orientation */
{
float * glyphImage = image + self->glyphSize
* x + y * width * (self->glyphSize*self->glyphSize) ;
for (cy = 0 ; cy < (signed)self->glyphSize ;
++cy) {
for (cx = 0 ; cx < (signed)self-
>glyphSize ; ++cx) {
float value = *glyphImage ;
*glyphImage++ = VL_MAX(minWeight,
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject
AcademicProject

Mais conteúdo relacionado

Mais procurados

Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style
Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style
Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style IJECEIAES
 
Comparative Study on Watermarking & Image Encryption for Secure Communication
Comparative Study on Watermarking & Image Encryption for Secure CommunicationComparative Study on Watermarking & Image Encryption for Secure Communication
Comparative Study on Watermarking & Image Encryption for Secure CommunicationIJTET Journal
 
Secured Video Watermarking Based On DWT
Secured Video Watermarking Based On DWTSecured Video Watermarking Based On DWT
Secured Video Watermarking Based On DWTEditor IJMTER
 
Unsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoderUnsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoderNEERAJ BAGHEL
 
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...CSCJournals
 
Key frame extraction methodology for video annotation
Key frame extraction methodology for video annotationKey frame extraction methodology for video annotation
Key frame extraction methodology for video annotationIAEME Publication
 
Enhancing Security of Multimodal Biometric Authentication System by Implement...
Enhancing Security of Multimodal Biometric Authentication System by Implement...Enhancing Security of Multimodal Biometric Authentication System by Implement...
Enhancing Security of Multimodal Biometric Authentication System by Implement...IOSR Journals
 
Reversible De-Identification for Lossless Image Compression using Reversible ...
Reversible De-Identification for Lossless Image Compression using Reversible ...Reversible De-Identification for Lossless Image Compression using Reversible ...
Reversible De-Identification for Lossless Image Compression using Reversible ...john236zaq
 
Ac02417471753
Ac02417471753Ac02417471753
Ac02417471753IJMER
 
Multilayer bit allocation for video encoding
Multilayer bit allocation for video encodingMultilayer bit allocation for video encoding
Multilayer bit allocation for video encodingIJMIT JOURNAL
 
The Role of Semantic Web Technologies in Smart Environments
The Role of Semantic Web Technologies in Smart EnvironmentsThe Role of Semantic Web Technologies in Smart Environments
The Role of Semantic Web Technologies in Smart EnvironmentsFaisal Razzak
 
Information technology
Information technologyInformation technology
Information technologybikram ...
 
Efficient and Robust Detection of Duplicate Videos in a Database
Efficient and Robust Detection of Duplicate Videos in a DatabaseEfficient and Robust Detection of Duplicate Videos in a Database
Efficient and Robust Detection of Duplicate Videos in a Databaserahulmonikasharma
 
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...IRJET Journal
 

Mais procurados (17)

Hv2615441548
Hv2615441548Hv2615441548
Hv2615441548
 
Ki2417591763
Ki2417591763Ki2417591763
Ki2417591763
 
Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style
Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style
Robust Video Watermarking Scheme Based on Intra-Coding Process in MPEG-2 Style
 
Comparative Study on Watermarking & Image Encryption for Secure Communication
Comparative Study on Watermarking & Image Encryption for Secure CommunicationComparative Study on Watermarking & Image Encryption for Secure Communication
Comparative Study on Watermarking & Image Encryption for Secure Communication
 
Secured Video Watermarking Based On DWT
Secured Video Watermarking Based On DWTSecured Video Watermarking Based On DWT
Secured Video Watermarking Based On DWT
 
Unsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoderUnsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoder
 
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...
International Journal of Computer Science and Security (IJCSS) Volume (3) Iss...
 
Key frame extraction methodology for video annotation
Key frame extraction methodology for video annotationKey frame extraction methodology for video annotation
Key frame extraction methodology for video annotation
 
Enhancing Security of Multimodal Biometric Authentication System by Implement...
Enhancing Security of Multimodal Biometric Authentication System by Implement...Enhancing Security of Multimodal Biometric Authentication System by Implement...
Enhancing Security of Multimodal Biometric Authentication System by Implement...
 
Reversible De-Identification for Lossless Image Compression using Reversible ...
Reversible De-Identification for Lossless Image Compression using Reversible ...Reversible De-Identification for Lossless Image Compression using Reversible ...
Reversible De-Identification for Lossless Image Compression using Reversible ...
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Ac02417471753
Ac02417471753Ac02417471753
Ac02417471753
 
Multilayer bit allocation for video encoding
Multilayer bit allocation for video encodingMultilayer bit allocation for video encoding
Multilayer bit allocation for video encoding
 
The Role of Semantic Web Technologies in Smart Environments
The Role of Semantic Web Technologies in Smart EnvironmentsThe Role of Semantic Web Technologies in Smart Environments
The Role of Semantic Web Technologies in Smart Environments
 
Information technology
Information technologyInformation technology
Information technology
 
Efficient and Robust Detection of Duplicate Videos in a Database
Efficient and Robust Detection of Duplicate Videos in a DatabaseEfficient and Robust Detection of Duplicate Videos in a Database
Efficient and Robust Detection of Duplicate Videos in a Database
 
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
 

Semelhante a AcademicProject

Efficient video indexing for fast motion video
Efficient video indexing for fast motion videoEfficient video indexing for fast motion video
Efficient video indexing for fast motion videoijcga
 
Key Frame Extraction in Video Stream using Two Stage Method with Colour and S...
Key Frame Extraction in Video Stream using Two Stage Method with Colour and S...Key Frame Extraction in Video Stream using Two Stage Method with Colour and S...
Key Frame Extraction in Video Stream using Two Stage Method with Colour and S...ijtsrd
 
Video Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyVideo Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyIRJET Journal
 
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison
Video Key-Frame Extraction using Unsupervised Clustering and Mutual ComparisonVideo Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison
Video Key-Frame Extraction using Unsupervised Clustering and Mutual ComparisonCSCJournals
 
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...Journal For Research
 
Parking Surveillance Footage Summarization
Parking Surveillance Footage SummarizationParking Surveillance Footage Summarization
Parking Surveillance Footage SummarizationIRJET Journal
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL
 
System analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systemsSystem analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systemsijma
 
A Novel Method for Sensing Obscene Videos using Scene Change Detection
A Novel Method for Sensing Obscene Videos using Scene Change DetectionA Novel Method for Sensing Obscene Videos using Scene Change Detection
A Novel Method for Sensing Obscene Videos using Scene Change DetectionRadita Apriana
 
Recent advances in content based video copy detection (IEEE)
Recent advances in content based video copy detection (IEEE)Recent advances in content based video copy detection (IEEE)
Recent advances in content based video copy detection (IEEE)PACE 2.0
 
Video indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksVideo indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksIAEME Publication
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)IAESIJEECS
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)IAESIJEECS
 
An Exploration based on Multifarious Video Copy Detection Strategies
  • 5. CONTENTS
1 INTRODUCTION
1.1 MOTIVATION
1.2 FUNDAMENTALS OF SBD
1.2.1 THRESHOLDS
1.2.2 OBJECT OR CAMERA MOTION
1.2.3 FLASH LIGHTS
1.2.4 COMPLEXITY OF DETECTOR
1.3 PROBLEM FORMULATION
2 LITERATURE SURVEY
2.1 GRADIENT FIELD DESCRIPTOR FOR IMAGE RETRIEVAL
2.2 FEATURE EXTRACTORS
2.2.1 EDGE HISTOGRAM DESCRIPTOR
2.2.2 SCALE INVARIANT FEATURE TRANSFORM
2.2.3 HISTOGRAM OF ORIENTED GRADIENTS
3 SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
3.2 EXISTING SYSTEMS
3.3 PROBLEM MOTIVATION
3.4 PROBLEM SOLUTION
3.5 SYSTEM REQUIREMENT ANALYSIS
3.6 SYSTEM REQUIREMENTS
4 MODULES
4.1 GUI MODULE
4.2 QUERY MODULE
4.3 FEATURE EXTRACTION MODULE
4.4 IMAGE RETRIEVAL MODULE
5 SELECTED SOFTWARE
5.1 MATLAB AND ITS TOOLBOX
  • 6. 5.2 FEATURES
5.2.1 COLOR
5.2.2 TEXTURE
5.2.3 SHAPE
6 SYSTEM DESIGN
6.1 INTRODUCTION
6.2 DATA FLOW DIAGRAM
7 IMPLEMENTATION
7.1 SAMPLE CODE
7.2 SAMPLE_GUI.M & SIMULATION RESULTS
8 SYSTEM TESTING
8.1 TESTING
8.2 TYPES OF TESTING
8.3 TEST CASES
9 CONCLUSIONS
10 REFERENCES
APPENDIX-I
APPENDIX-II
  • 7. List of Abbreviations
SBD  Shot Boundary Detection
SIFT Scale Invariant Feature Transform
EHD  Edge Histogram Descriptor
HOG  Histogram of Oriented Gradients
LT   Local Threshold
GT   Global Threshold
  • 8. List of Figures
Figure 1 Sequence of frames
Figure 2 Types of histogram descriptors
Figure 3 Definition of images and sub-image blocks
Figure 4 HOG blocks
Figure 5 An image and its histogram
Figure 6 Texture properties
Figure 7 Classical co-occurrence matrix
Figure 8 Boundary-based and region-based
Figure 9 Data flow diagram of video shot boundary detection
  • 9. 1 INTRODUCTION 1.1 Motivation In this digital era, the recent developments in video compression technology, widespread use of digital cameras, high capacity digital systems, along with significant increase in computer performance have increased the usage and availability of digital video, creating a need for tools that effectively categorize, search and retrieve the relevant video material. Also, the increasing availability and use of on-line videos has led to a demand for efficient and automated video analysis techniques. Generally, management of such large collections of videos requires knowledge of the content of those videos. Therefore, digital video data is processed with the objective of extracting the information about the content conveyed in the video. Now, the definition of content is highly application dependent but there are a number of commonalities in the application of content analysis. Among others, shot boundary detection (SBD), also called temporal video segmentation is one of the important and the most basic aspects of content based video retrieval. Hence, much research has been focused on segmenting video by detecting the boundaries between camera shots. We make use of HOG in order to extract the necessary features across the shot boundary. A shot may be defined as a sequence of frames captured by a single camera in a single continuous action in time and space. FIGURE 1: SEQUENCE OF FRAMES For example, a video sequence showing two people having a conversation may be
  • 10. 2 composed of several close-up shots of their faces which are interleaved and make up a scene. Shots define the low-level, syntactical building blocks of a video sequence. A large number of different types of boundaries can exist between shots. These can be broadly classified into abrupt changes or gradual changes. An abrupt transition is basically a hard cut that occurs between two adjacent frames. In gradual transitions we have fade, which is a gradual change in brightness, either starting or ending with a black frame. Another type of gradual transition is dissolve which is similar to a fade except that it occurs between two shots. The images of the first shot get dimmer and those of the second shot get brighter until the second replaces the first. Other types of shot transitions include wipes (gradual) and computer generated effects such as morphing. A scene is a logical grouping of shots into a semantic unit. A single scene focuses on a certain object or objects of interest, but the shots constituting a scene can be from different angles. In the example above the sequence of shots showing the conversation would comprise one logical scene with the focus being the two people and their conversation. Scene boundary detection requires high level semantic understanding of the video sequence and such an understanding must take cues such as the associated audio track and the encoded data stream itself. Although, video segmentation is far more desirable than simple shot boundary detection since people generally visualize a video as a sequence of scenes not of shots, but shot boundary detection still plays a vital role in any video segmentation system, as it provides the basic syntactic units for higher level processes to build upon. As part of an ongoing video indexing and browsing project, our recent research has focused on the application of different methods of video segmentation to a large and diverse digital video collection. The aim is to examine how different segmentation methods perform on video. With this information, it is hoped to develop a system capable of accurately segmenting a wide range of videos.
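As a concrete starting point for the segmentation work described in this section, the sketch below shows one way to load a video into MATLAB and collect its frames; the file name and the use of the older NumberOfFrames/read interface are assumptions for illustration, not details taken from the report.

% Minimal sketch: read a video and gather its frames for later SBD processing.
v = VideoReader('news_clip.avi');      % hypothetical input video file
nFrames = v.NumberOfFrames;            % frame count (property available in older releases)
frames = cell(1, nFrames);
for k = 1:nFrames
    frames{k} = read(v, k);            % each frame is an H x W x 3 uint8 array
end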
  • 11. 1.2 Fundamental Problems of SBD Although the problem of shot boundary detection has been studied for more than a decade and many published detection methods exist, several challenges remain; they are summarized in the following sections. 1.2.1 Thresholds Setting a threshold is one of the most challenging tasks in the correct detection of a shot boundary. To decide whether a shot boundary has occurred, it is necessary to set a threshold, or thresholds, for measuring the similarity or dissimilarity between the start and end frames of a candidate boundary. An abrupt transition produces a high discontinuity between adjacent frames, whereas a gradual transition develops over a number of frames. So the start and end frames of an abrupt transition are adjacent, but this is not the case for gradual transitions, which makes choosing a threshold more challenging for them. Cosine dissimilarity values between the start and end frames that lie above the threshold are logged as real shot boundaries, while values below it are ignored. To accurately segment broadcast video, it is necessary to balance the following two apparently conflicting requirements: 1 The need to prevent detection of false shot boundaries, i.e. detecting boundaries where none exist, by setting a sufficiently high threshold so as to insulate the detector from noise. 2 The need to detect subtle shot transitions such as dissolves, by making the detector sensitive enough to recognise gradual change. 1.2.2 Object or Camera Motion The visual content of a video changes significantly under extreme object or camera motion and under screenplay effects (e.g. someone turning on the light in a dark room), in a way very similar to typical shot changes. Slow motion can cause content change similar to a gradual transition, whereas extremely fast camera or object movement can cause content change similar to a hard cut. Therefore, it is difficult to
  • 12. differentiate shot changes from object or camera motion. 1.2.3 Flashlights Color is the primary element of video content, and most video content representations employ color as a feature. Continuity signals based on color exhibit significant changes under abrupt illumination changes such as flashlights, and such a change may be identified as a content change (i.e. a shot boundary) by most shot boundary detection tools. Several algorithms propose using illumination-invariant features, but these algorithms always face a tradeoff between using an illumination-invariant feature and losing the most significant feature for characterizing the variation of the visual content. Therefore, flashlight detection is one of the major challenges for SBD algorithms. 1.2.4 Complexity of the Detector Shot boundary detection is considered a pre-processing step in most video content analysis applications: higher-level algorithms that perform more complex content analysis build on the shot boundary detection results. Since these applications take most of the available computational power and time, it is necessary to keep the computational complexity of the shot boundary detector low. This calls for algorithms that are sufficiently precise but also computationally inexpensive. As the shot boundary detection problem evolved, proposed algorithms started to use more than one feature for content representation in order to increase detection performance. On the other hand, such a strategy places a computational burden on the detector, since each feature requires a separate processing step. 1.3 Problem Formulation Shot boundary detection is one of the basic yet very important video processing tasks, because the success of higher-level processing relies heavily on this step. Videos normally contain enormous amounts of data. For fast and efficient
  • 13. processing and retrieval of these videos, they need to be indexed properly, for which shot boundary detection forms the basic process. This project evaluates existing techniques for detecting shots in videos and works towards their enhancement, so that the problems discussed above can be reduced and the techniques give better and more accurate results. These conditions are met through the following two steps, illustrated in the sketch that follows: • Feature Extraction: the first step extracts image features to a distinguishable extent. • Matching: the second step matches these features to yield results that are visually similar.
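A minimal sketch of these two steps, using grey-level histograms as the extracted feature and a simple global threshold for the matching stage. The 'frames' cell array is taken from the reading sketch above, and the rule mean + 3*std is an illustrative assumption rather than the report's tuned threshold.

% Sketch of global-threshold cut detection on grey-level histogram differences.
n = numel(frames);
d = zeros(1, n - 1);
for k = 1:n-1
    h1 = imhist(rgb2gray(frames{k}));
    h2 = imhist(rgb2gray(frames{k+1}));
    d(k) = sum(abs(h1 - h2)) / numel(frames{k}(:,:,1));   % normalised histogram difference
end
T = mean(d) + 3*std(d);        % one simple global threshold (assumed rule)
cuts = find(d > T);            % d(k) > T declares an abrupt transition between frames k and k+1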
  • 14. 2. LITERATURE SURVEY This chapter summarizes the work done so far on shot boundary detection. John S. Boreczky et al. [1996] presented a comparison of video shot boundary detection techniques, giving a comparative analysis of various techniques and their variations, including histogram, discrete cosine transform, motion vector, and block matching methods. Patrick Bouthemy et al. [1999] proposed a unified approach to shot change detection and camera motion characterization, which partitions a video document into shots by using image motion information, which is generally more intrinsic to the video structure itself. A. Miene et al. [2001] presented advanced and adaptive shot boundary detection techniques based on two stages: feature extraction and shot boundary detection. First, three different features for the measurement of shot boundaries within the video are extracted; second, the shot boundaries are detected based on the previously extracted features. H. Y. Mark Liao et al. [2002] proposed a novel dissolve detection algorithm which avoids the mis-detection of motion by using a binomial distribution model to systematically determine the threshold needed for discriminating a real dissolve from global or local motion. Jesús Bescós [2004] proposed real-time shot change detection over online MPEG-2 video, describing a software module for video temporal segmentation which is able to detect both abrupt transitions and all types of gradual transitions in real time. Guillermo Cisneros et al. [2005] proposed a unified model for techniques on video-shot transition detection; the approach is centred on mapping the space of inter-frame distances onto a new decision space better suited to achieving sequence-independent thresholding. Liuhong Liang et al. [2005] presented enhanced shot boundary detection using video text information, in which a number of edge-based techniques are proposed for detecting abrupt shot boundaries while avoiding the influence of flashlights common in
  • 15. many video types, such as sports, news, entertainment and interview videos. Daniel DeMenthon et al. [2006] proposed shot boundary detection based on image correlation features in video. The cut detection is based on the so-called 2max ratio criterion in a sequential image buffer, while the dissolve detection is based on the skipping image difference and linearity error in a sequential image buffer. Kota Iwamoto et al. [2007] proposed the detection of wipes and digital video effects based on a new pattern-independent model of image boundary line characteristics. These models rely on the characteristics of the image boundary lines dividing the two image regions in the transitional frames. Jinhui Yuan et al. [2008] proposed a shot boundary detection method for news video based on object segmentation and tracking. It combines three main techniques: the partitioned histogram comparison method, and video object segmentation and tracking based on wavelet analysis. The partitioned histogram comparison is used as a first filter to effectively reduce the number of video frames which need object segmentation and tracking. Yufeng Li et al. [2008] proposed a novel shot detection algorithm based on information theory. Firstly, colour and texture features are extracted by wavelet transform; then the dissimilarity between two successive frames is defined, which combines the mutual information of the colour feature and the co-occurrence mutual information of the texture feature. The threshold is adjusted adaptively based on the entropy of the continuous frames and does not depend on the type of video or the kind of shot. Vasileios T. Chasanis et al. [2009] presented scene detection in videos using shot clustering and sequence alignment. First the key-frames are extracted using a spectral clustering method employing the fast global k-means algorithm in the clustering phase, which also provides an estimate of the number of key-frames. Then shots are clustered into groups using only visual similarity as a feature and are labelled according to the group to which they are assigned. Jinchang Ren et al. [2009] proposed shot boundary detection in MPEG videos using local and global indicators, operating directly in the compressed domain. Several local
  • 16. indicators are extracted from MPEG macroblocks, and AdaBoost is employed for feature selection and fusion. The selected features are then used to classify candidate cuts into five sub-spaces via pre-filtering and rule-based decision making; the global indicators of frame similarity between boundary frames of cut candidates are then examined using phase correlation of DC images. Priyadarshinee Adhikari et al. [2009] proposed a paper on video shot boundary detection, which presents video retrieval using shot boundary detection. Lihong Xu et al. [2010] proposed a novel shot detection algorithm based on K-means clustering. Colour feature extraction is done first and then the dissimilarity of video frames is defined; the video frames are divided into several different sub-clusters by performing K-means clustering. Wenzhu Xu et al. [2010] proposed a novel shot detection algorithm based on graph theory, in which the video frames are divided into several different groups by performing a graph-theoretical algorithm. Arturo Donate et al. [2010] presented shot boundary detection in videos using robust three-dimensional tracking. The proposal is to extract salient features from a video sequence and track them over time in order to estimate shot boundaries within the video. Min-Ho Park et al. [2010] proposed efficient shot boundary detection using blockwise motion-based features, in which a measure of discontinuity in camera and object/background motion is proposed for SBD based on the combination of two motion features: the modified displaced frame difference (DFD) and the blockwise motion similarity. Goran J. Zajić et al. [2011] proposed video shot boundary detection based on multifractal analysis. Low-level features (colour and texture) are extracted from each frame in the video sequence, concatenated into feature vectors (FVs) and stored in a feature matrix; matrix rows correspond to the FVs of frames from the video sequence, while columns are time series of a particular FV component. Partha Pratim Mohanta et al. [2012] proposed a model-based shot boundary detection
  • 17. technique using frame transition parameters, based on a frame estimation scheme formulated using the previous and the next frames. Pablo Toharia et al. [2012] proposed shot boundary detection using Zernike moments on multi-GPU, multi-CPU architectures, along with the different possible hybrid combinations based on Zernike moments. Sandip T. et al. [2012] proposed key-frame based video summarization using an automatic threshold and edge matching rate. Firstly, the histogram difference of every frame is calculated, and then the edges of the candidate key frames are extracted by the Prewitt operator. Ravi Mishra et al. [2013] proposed video shot boundary detection using the dual-tree complex wavelet transform, an approach that processes encoded video sequences prior to complete decoding. The proposed algorithm first extracts structure features from each video frame using the dual-tree complex wavelet transform, and then the spatial-domain structure similarity is computed between adjacent frames. Zhe Ming Lu et al. [2013] presented fast video shot boundary detection based on segment selection, singular value decomposition (SVD) and pattern matching. Initially, the positions of the shot boundaries and the lengths of gradual transitions are predicted using adaptive thresholds, and most non-boundary frames are discarded at the same time. Sowmya R. et al. [2013] proposed analysis and verification of video summarization using shot boundary detection; the analysis is based on block-based histogram difference and block-based Euclidean distance difference for varying block sizes. Ravi Mishra et al. [2014] proposed a comparative study of the block matching algorithm and the dual-tree complex wavelet transform for shot detection in videos, comparing the two detection methods in terms of parameters such as false rate, hit rate and miss rate, tested on a set of different video sequences.
  • 18. 2.1 GRADIENT FIELD DESCRIPTOR FOR VIDEO SHOT BOUNDARY DETECTION This system accepts images of any format as input and maps them into defined dimensions in order to make them appropriate for processing. This requires a matching process that is robust to depictive inaccuracy and photometric variation. The approach is to transform database images into Canny edge maps and to capture local structure in the map using a novel descriptor. An appropriate scale and hysteresis threshold for the Canny operator is set by searching the parameter space for a binary edge map in which a small, fixed percentage of pixels are classified as edges (a short sketch of this threshold search is given after the list below). These simple heuristics retain the essential boundaries and discourage responses at the scale of finer texture. Gradient Field HOG is an adaptation of HOG that mitigates the lack of relative spatial information within HOG by capturing structure from surrounding regions. It is inspired by work on image completion capable of propagating image structure into voids, and uses a similar Poisson filling approach to improve the richness of information in the gradient field prior to sampling with the HOG descriptor. This simple technique yields significant improvements in performance when matching features to photos, compared to three leading descriptors: the Self-Similarity Descriptor (SSIM), SIFT and HOG. Furthermore, the descriptor can be applied to localise sketched objects within the retrieved images, a functionality demonstrated through an image montage application. The success of the descriptor depends on correct selection of scale during edge extraction, and the use of image salience measures may benefit this process. The system could be enhanced by exploring coloured sketches, or by incorporating more flexible models. 2.2 FEATURE EXTRACTORS Feature extractors are usually divided into three types: 1. EHD (Edge Histogram Descriptor) 2. SIFT (Scale Invariant Feature Transform) 3. HOG (Histogram of Oriented Gradients)
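The fragment below sketches the threshold search mentioned in Section 2.1: the Canny high threshold is swept until roughly a fixed fraction of pixels are classified as edges. The 5% target, the threshold grid and the file name are illustrative assumptions rather than values taken from the report.

% Sketch: pick a Canny threshold so that about a fixed fraction of pixels are edges.
I = rgb2gray(imread('frame.png'));     % placeholder image name
target  = 0.05;                        % assumed target fraction of edge pixels
bestE   = false(size(I));
bestErr = inf;
for t = 0.05:0.05:0.9
    E   = edge(I, 'canny', t);         % t acts as the high hysteresis threshold
    err = abs(nnz(E)/numel(E) - target);
    if err < bestErr
        bestErr = err;
        bestE   = E;
    end
end
imshow(bestE);                         % edge map with roughly the desired edge density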
  • 19. 2.2.1 EHD (EDGE HISTOGRAM DESCRIPTOR) The EHD method shows how global and semi-global edge histogram bins can be built from the local histogram bins. From the various possible clusterings of sub-images, 13 patterns are used for the semi-global histograms. These 13 semi-global regions and the whole image space are adopted to define the semi-global and the global histograms, respectively. This extra histogram information can be obtained directly from the local histogram bins without a further feature extraction process. Experimental results show that the semi-global and global histograms generated from the local histogram bins help to improve retrieval performance. FIGURE 2: TYPES OF HISTOGRAM DESCRIPTORS Local Edge Histogram: The normative part of the edge histogram descriptor consists of 80 local edge histogram bins. The semantics of these histogram bins are described in the following sub-sections. To localize the edge distribution to a certain area of the image, the image space is divided into 4x4 sub-images (see Figure 3). Then, for each sub-image, an edge histogram is generated to represent the edge distribution in that sub-image. To define the different edge types, each sub-image is further divided into small square blocks called image-blocks (a simplified sketch of this local histogram follows).
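A simplified sketch of such a local edge histogram: the image is split into a 4x4 grid of sub-images and, per sub-image, gradient orientations are pooled into five edge types (four directional plus one non-directional). This only approximates the idea; it is not the normative MPEG-7 computation over 2x2 image-blocks, and the gradient-magnitude gate of 20 is an assumed value.

% Simplified EHD-style 80-bin local edge histogram.
I = double(rgb2gray(imread('frame.png')));   % placeholder image name
[gx, gy] = gradient(I);
mag = hypot(gx, gy);
ang = mod(atan2d(gy, gx), 180);              % orientation folded to [0, 180)
[H, W] = size(I);
ehd = zeros(4, 4, 5);
for r = 1:4
    for c = 1:4
        rows = floor((r-1)*H/4)+1 : floor(r*H/4);
        cols = floor((c-1)*W/4)+1 : floor(c*W/4);
        m = mag(rows, cols);  a = ang(rows, cols);
        strong = m > 20;                             % edge-strength gate (assumed)
        bins = floor(mod(a(strong) + 22.5, 180)/45) + 1;   % 4 directional edge types
        for b = 1:4, ehd(r, c, b) = sum(bins == b); end
        ehd(r, c, 5) = nnz(~strong);                 % crude "non-directional" count
    end
end
ehdVector = ehd(:) / max(sum(ehd(:)), 1);            % 80-bin descriptor, normalised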
  • 20. FIGURE 3: DEFINITION OF IMAGES AND SUB-IMAGE BLOCKS Advantages of EHD: 1) It shows how to construct global and semi-global edge histogram bins from the local histogram bins. 2) It uses 13 patterns for the semi-global histograms. 3) The extra histogram information can be obtained directly from the local histogram bins without a further feature extraction process. 4) The semi-global and global histograms generated from the local histogram bins help to improve retrieval performance. Disadvantages of EHD: 1) It is not invariant to rotation, scaling and translation. 2) Developing a sufficiently robust descriptor on top of it is difficult. 2.2.2 SIFT SIFT feature detection is an approach used for object recognition. The invariant features extracted from images can be used to perform reliable matching between different views of an object or scene. The features have been shown to be invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination. For each query image, gallery photograph, and sketch/photo correspondence in the dictionary, a SIFT feature representation is computed. SIFT-based object matching is a popular method for finding correspondences between images. Introduced by Lowe, SIFT object matching consists of both a scale-
  • 21. invariant interest point detector and a feature-based similarity measure. The method used here is not concerned with the interest point detection aspect of the SIFT framework, but instead utilizes only the gradient-based feature descriptors (known as SIFT features). Advantages of SIFT: 1) It performs reliable matching between different views of an object or scene. 2) It is invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination. 3) The image gradient magnitudes and orientations are sampled around the key point location. Disadvantages of SIFT: 1) Edges are poorly defined and usually hard to detect, yet large numbers of key points can still be extracted from typical images. 2) Feature matching has to be performed even when the faces are small. 2.2.3 HISTOGRAM OF ORIENTED GRADIENTS The essential idea behind the Histogram of Oriented Gradients descriptor is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. The implementation of these descriptors can be achieved by dividing the image into small connected regions, called cells, and for each cell compiling a histogram of gradient directions or edge orientations for the pixels within the cell. The combination of these histograms then represents the descriptor. For improved accuracy, the local histograms can be contrast-normalized by calculating a measure of the intensity across a larger region of the image, called a block, and then using this value to normalize all cells within the block. This normalization results in
  • 22. better invariance to changes in illumination or shadowing. FIGURE 4: HOG BLOCKS Advantages: In experiments over larger databases, HOG-based retrieval is in most cases much better than EHD-based retrieval. The Edge Histogram Descriptor does not perform well for information-poor sketches, while better results can be achieved for more detailed queries; this problem can be overcome by the HOG method. HOG captures edge and gradient structure that is very characteristic of local shape. Disadvantages: Further work is needed on HOG-based detectors that incorporate motion information using block matching or optical flow fields. Moreover, although the fixed-template-style detector has proven difficult to beat for fully visible pedestrians, humans are highly articulated, and a parts-based model with a greater degree of local spatial invariance would likely be required. After reviewing existing edge- and gradient-based descriptors, it has been shown experimentally that grids of Histogram of Oriented Gradients (HOG) descriptors significantly outperform existing feature sets for human detection. Studying the influence of each stage of the computation on performance leads to the conclusion that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalisation in overlapping descriptor blocks are all important for good results.
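A compact sketch of the computation just described, with 8x8-pixel cells, 9 orientation bins and 2x2-cell blocks under L2 normalisation. It illustrates the idea rather than giving a tuned implementation; where available, the Computer Vision System Toolbox function extractHOGFeatures performs the same job.

% Simplified HOG descriptor: cell histograms plus block normalisation.
I = double(rgb2gray(imread('frame.png')));   % placeholder image name
[gx, gy] = gradient(I);
mag = hypot(gx, gy);
ang = mod(atan2d(gy, gx), 180);              % unsigned gradient orientation in [0, 180)
cell_sz = 8;  nbins = 9;
nCy = floor(size(I,1)/cell_sz);  nCx = floor(size(I,2)/cell_sz);
cellHist = zeros(nCy, nCx, nbins);
for cy = 1:nCy
    for cx = 1:nCx
        rows = (cy-1)*cell_sz+1 : cy*cell_sz;
        cols = (cx-1)*cell_sz+1 : cx*cell_sz;
        m = mag(rows, cols);
        b = floor(ang(rows, cols)/20) + 1;   % 20-degree orientation bins
        b(b > nbins) = nbins;
        for i = 1:numel(m)                   % vote with gradient magnitude
            cellHist(cy, cx, b(i)) = cellHist(cy, cx, b(i)) + m(i);
        end
    end
end
% 2x2-cell blocks, L2-normalised and concatenated into the final descriptor
hog = [];
for cy = 1:nCy-1
    for cx = 1:nCx-1
        blk = cellHist(cy:cy+1, cx:cx+1, :);
        hog = [hog; blk(:) / (norm(blk(:)) + 1e-6)];   % growth is fine for a sketch
    end
end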
  • 23. 15 3. SYSTEM ANALYSIS 3.1 PROBLEM STATEMENT The problem involves entering an image as a query and selecting the pre-judged video so which in turn provides database space by breaking down into shots and frames. A software application that is designed to employ SBD techniques in extracting visual properties, and matching them. This is done to retrieve images in the database that are visually similar to the query image. The main problem with this application is the size and shape of the input image which needs to get matched to the size and shape of the database. 3.2 EXISTING SYSTEMS Several systems currently exist, and are being constantly developed. Examples are 1) QBIC or Query By Image was developed by IBM, Almaden Research Centre, to allow users to graphically pose and refine queries based on multiple visual properties such as colour, texture and shape . It supports queries based on input images, user-constructed sketches, and selected colour and texture patterns 2) VIR Image Engine by Virage Inc., like QBIC, enables image retrieval based on primitive attributes such as colour, texture and structure. It examines the pixels in the image and performs an analysis process, deriving image characterisation features 3) Visual SEEK and Web SEEK were developed by the Department of Electrical Engineering, Columbia University. Both these systems support colour and spatial location matching as well as texture matching NeTra was developed by the Department of Electrical and Computer Engineering, University of California. It supports colour, shape, spatial layout and texture matching, as well as image segmentation
  • 24. 4) MARS, or Multimedia Analysis and Retrieval System, was developed by the Beckman Institute for Advanced Science and Technology, University of Illinois. It supports colour, spatial layout, texture and shape matching. 3.3 Problem Motivation Video databases and collections can be enormous in size, containing hundreds, thousands or even millions of frames and shots corresponding to each video. The conventional method of image retrieval in a video is to search for key features that match the descriptive keyword assigned to the image by a human categoriser. Content-based matching is computationally expensive, but the results are far more accurate than conventional keyword-based image indexing. Hence, there exists a trade-off between accuracy and computational cost. This trade-off decreases as more efficient algorithms are utilised and increased computational power becomes inexpensive. 3.4 Proposed Solution The solution initially proposed was to extract the primitive features of a query image and compare them to those of the database images, over a set of varying features. Thus, using matching and comparison algorithms, the features of one image are compared and matched to the corresponding features of another image, which in our case is generally a frame of the video under consideration. This comparison is performed using shape distance metrics. In the end, these metrics are applied one after another, so as to retrieve database images that are similar to the query. The similarity between features is calculated using algorithms employed by well-known SBD systems, as sketched below.
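As an illustration of the comparison step, the fragment below measures the dissimilarity between a query feature vector f1 and a database-frame vector f2 under two common measures; f1 and f2 are hypothetical names and are assumed to be column vectors produced by the same extractor (for example the HOG sketch in Chapter 2).

% Two simple dissimilarity measures between feature vectors f1 and f2 (assumed inputs).
dEuclid = norm(f1 - f2);                                   % Euclidean distance
dCosine = 1 - (f1' * f2) / (norm(f1) * norm(f2) + eps);    % cosine dissimilarity
% Smaller values indicate that the query and the frame are more similar.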
  • 25. 17 3.5 SOFTWARE REQUIREMENT ANALYSIS A Software Requirements Specification (SRS) is a complete description of the behaviour of the system to be developed. It includes a set of use cases that describe all the interactions the users will have with the software. Use cases are also known as functional requirements. In addition to use cases, the SRS also contains non-functional (or supplementary) requirements. Non-functional requirements are requirements which impose constraints on the design or implementation (such as performance engineering requirements, quality standards, or design constraints). Functional Requirements: In software engineering, a functional requirement defines a function of a software system or its component. A function is described as a set of inputs, the behaviour, and outputs. Functional requirements may be calculations, technical details, data manipulation and processing and other specific functionality that define what a system is supposed to accomplish. Behavioural requirements describing all the cases where the system uses the functional requirements are captured in use cases. Functional requirements are supported by non-functional requirements (such as performance requirements, security, or reliability). How a system implements functional requirements is detailed in the system design. In some cases a requirements analyst generates use cases after gathering and validating a set of functional requirements. Each use case illustrates behavioural scenarios through one or more functional requirements. Often, though, an analyst will begin by eliciting a set of use cases, from which the analyst can derive the functional requirements that must be implemented to allow a user to perform each use case. Non-Functional Requirements: In systems engineering and requirements engineering, a non-functional requirement is a requirement that specifies criteria
  • 26. 18 that can be used to judge the operation of a system, rather than specific behaviours. This should be contrasted with functional requirements that define specific behaviour or functions In general; functional requirements define what a system is supposed to do whereas non-functional requirements define how a system is supposed to be. Non-functional requirements are often called qualities of a system. Other terms for non-functional requirements are "constraints", "quality attributes", “quality goals" and "quality of service requirements," and "non-behavioural requirements. “Qualities, that is, non-functional requirements, can be divided into two main categories: 1. Execution qualities, such as security and usability, which are observable at run time. 2. Evolution qualities, such as testability, maintainability, extensibility and scalability, which are embodied in the static structure of the software system. 3.6 System Requirements Introduction: To be used efficiently, all computer software needs certain hardware components or other software resources to be present on a computer. These pre-requisites are known as (computer) system requirements and are often used as a guideline as opposed to an absolute rule. Most software defines two sets of system requirements: minimum and recommended. With increasing demand for higher processing power and resources in newer versions of software, system requirements tend to increase over time. Industry analysts suggest that this trend plays a bigger part in driving upgrades to existing computer systems than technological advancements.
  • 27. Hardware Requirements: The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. A hardware requirements list is often accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. An HCL lists tested, compatible, and sometimes incompatible hardware devices for a particular operating system or application. The following sub-sections discuss the various aspects of hardware requirements. Hardware Requirements for the Present Project: 1. Input devices: keyboard and mouse 2. RAM: 512 MB 3. Processor: P4 or above 4. Storage: less than 100 GB of HDD space. Software Requirements: Software requirements deal with defining the software resources and pre-requisites that need to be installed on a computer to provide optimal functioning of an application. These requirements or pre-requisites are generally not included in the software installation package and need to be installed separately before the software is installed. Supported Operating Systems: 1. Windows XP, Windows 7, Windows 8, Windows 8.1 2. Linux (all versions)
3. OS X Mountain Lion and above. In addition, MATLAB R2013a is required.
  • 28. 4 MODULES The project has four main modules used for image retrieval from videos: the GUI module, the query module (itself divided into querying an input image and querying an input video), the feature extraction module and the image retrieval module. They are described in the following sections. 4.1 GUI MODULE This is the main module from the user's perspective. The input image and the corresponding output images are displayed here, and all interaction performed by the user is handled graphically in this module. 4.2 QUERY MODULE In the query module, the user inputs a query image into the application; the size and shape of the query image then need to be changed to those of the database images, which are predefined earlier. The query module can be subdivided into two sub-modules: I. Querying Input Image II. Querying Input Video The images and videos are processed as follows. I. Querying Input Image The input image should satisfy certain criteria in order to make the matching and comparison valid. They are as follows: Image Enhancement Image enhancement is the conversion of the original imagery to a better understandable level of spectral quality for feature extraction or image interpretation. It is useful to examine the image histograms before performing any image enhancement. The x-axis of the histogram is the range of the available digital
  • 29. 21 numbers, i.e. 0 to 255 (in case of grey level). The y-axis is the number of pixels in the image having a given digital number Contrast stretching to increase the tonal distinction between various features in a scene. The most common types of contrast stretching enhancement are: a linear contrast stretch, a linear contrast stretch with saturation, a histogram-equalized stretch. Filtering is commonly used to restore an image by avoiding noises to enhance the image for better interpretation and to extract features such as edges. The most common types of filters used are: mean, median, low pass, high pass, edge detection. Image Transformation Image transformations usually involve combined processing of data from multiple spectral bands. Arithmetic operations (i.e. subtraction, addition, multiplication, division) are performed to combine and transform the original bands into "new" images which better display or highlight certain features in the scene. Some of the most common transforms applied to image data are: image rationing: this method involves the differencing of combinations of two or more bands aimed at enhancing target features or principal components analysis (PCA). The objective of this transformation is to reduce the dimensionality (i.e. the number of bands) in the data, and compress as much of the information in the original bands into fewer bands. Image Classification Information extraction is the last step toward the final output of the image analysis. After pre-processing the data is subjected to quantitative analysis to assign individual pixels to specific classes. Classification of the image is based on the known and unknown identity to classify the remainder of the image consisting of those pixels of unknown identity. After classification is complete, it is necessary to evaluate its accuracy by comparing the categories on the classified images with the
  • 30. areas of known identity on the ground. The final result of the analysis provides the user with full information concerning the source data, the method of analysis, and the outcome and its reliability. Applications of Image Processing: Interest in digital image processing methods stems from two principal application areas: (1) improvement of pictorial information for human interpretation, and (2) processing of scene data for autonomous machine perception. In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc. Basics of video processing: Video is the technology of electronically capturing, recording, processing, storing, transmitting, and reconstructing a sequence of still images representing scenes in motion. Essentially, the time component is what distinguishes video. It may be further described in the following manner: 1 Video is a sequence of still images representing scenes in motion. 2 Video is a motion technology that is handled electronically. 3 The capturing, recording, processing, storing, transmitting, and reconstructing are all done electronically. 4.3 FEATURE EXTRACTION For feature extraction we have used an efficient technique called the Histogram of Oriented Gradients, which retrieves the query image's colour, shape and texture features and compares them to those of the database images.
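The enhancement operations listed in the query module above (contrast stretching, histogram equalisation and filtering) can be sketched in MATLAB as follows; the file name is a placeholder and the 3x3 median window is an assumed choice.

% Sketch of common pre-processing enhancements before feature extraction.
I  = rgb2gray(imread('frame.png'));   % placeholder image name
Is = imadjust(I);                     % linear contrast stretch (saturates 1% tails by default)
Ie = histeq(I);                       % histogram-equalised stretch
If = medfilt2(I, [3 3]);              % median filter for impulse-noise suppression
figure;
subplot(2,2,1); imshow(I);  title('original');
subplot(2,2,2); imshow(Is); title('contrast stretch');
subplot(2,2,3); imshow(Ie); title('histogram equalised');
subplot(2,2,4); imshow(If); title('median filtered');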
  • 31. 4.4 IMAGE RETRIEVAL In this module of the project, the features are compared and sorted. We have used an efficient measure, the Euclidean distance, to compare the features. The feature vectors are then sorted, and indexing mechanisms are used to retrieve the images from the database; they are then displayed accordingly on the GUI.
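A minimal sketch of this retrieval step, assuming queryFeat (D x 1) and dbFeats (D x N) are hypothetical variables holding the descriptor of the query image and the descriptors of the N database frames produced by the feature extraction module.

% Euclidean-distance ranking of database frames against the query descriptor.
N = size(dbFeats, 2);
dist = zeros(1, N);
for k = 1:N
    dist(k) = norm(queryFeat - dbFeats(:, k));
end
[sortedDist, order] = sort(dist, 'ascend');
topK = order(1:min(10, N));           % indices of the 10 most similar frames to display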
  • 32. 24 5. SELECTED SOFTWARE The software used to perform different operations on different kinds of images and their properties is a software called as MATLAB. The overview of Matlab and its operations over different images and their features are illustrated with examples in the below description as follows 5.1 MATLAB The name ‘Matlab’ comes from two words: matrix and laboratory. According to The MathWorks (producer of Matlab), Matlab is a technical computing language used mostly for high-performance numeric calculations and visualization. It integrates computing, programming, signal processing and graphics in easy to use environment, in which problems and solutions can be expressed with mathematical notation. Basic data element is an array, which allows for computing difficult mathematical formulas, which can be found mostly in linear algebra. But Matlab is not only about math problems. It can be widely used to analyze data, modeling, simulation and statistics. Matlab high-level programming language finds implementation in other fields of science like biology, chemistry, economics, medicine and many more. In the following paragraph which is fully based on the MathWorks, ‘Getting started with Matlab’, I introduce the main features of the Matlab.Most important feature of Matlab is easy extensibility. This environment allows creating new applications and becoming contributing author. It has evolved over many years and became a tool for research, development and analysis. Matlab also features set of specific libraries, called toolboxes. They are collecting ready to use functions, used to solve particular areas of problems. Matlab System consist five main parts. First, Desktop Tools and Development Environment are set of tools helpful while working with functions and files. Examples of this part can be
  • 33. 25 command window, the workspace, notepad editor and very extensive help mechanism. Second part is The Matlab Mathematical Function Library. This is a wide collection of elementary functions like sum, multiplication, sine, cosine, tangent, etc. Besides simple operations, more complex arithmetic can be calculated, including matrix inverses, Fourier transformations and approximation functions. Third part is the Matlab language, which is high-level array language with functions, data structures and object-oriented programming features. It allows programming small applications as well as large and complex programs. Fourth piece of Matlab System is its graphics. It has wide tools for displaying graphs and functions. It contains two and three-dimensional visualization, image processing, building graphic user interface and even animation. Fifth and last part is Matlab’s External Interfaces. This library gives a green light for writing C and Fortran programs, which can be read and connected with Matlab. Data representation : Data representation in Matlab is the feature that distinguishes this environment from others. Everything is presented with matrixes. The definition of matrix by MathWorks is a rectangular array of numbers. Matlab recognizes binary and text files. There is couple of file extensions that are commonly used, for example *.m stands for M-file. There are two kinds of it: script and function M-file. Script file contains sequence of mathematical expressions and commands. Function type file starts with word function and includes functions created by the user. Different example of extension is *.mat. Files *.mat are binary and include work saved with command File/Save or Save as. Since Matlab stores all data in matrixes, program offers many ways to create them. The easiest one is just to type values. There are three general rules: • the elements of a row should be separated with spaces or commas;
  • 34. 26 • to mark the end of each row a semicolon ‘;’ should be used; • square brackets must surround whole list of elements. After entering the values matrix is automatically stored in the workspace (MathWorks, 2002, chapter 3.3). To take out specific row, round brackets are required. In the 3x3 matrix, pointing out second row would be (2,:) and third column (:,3). In order to recall one precise element bracket need to contain two values. For example (2,3) stands for third element in the second row. Variables are declared as in every other programming language. Also arithmetic operators are presented in the same way – certain value is assigned to variable. When the result variable is not defined, Matlab creates one, named Ans, placed in the workspace. Ans stores the result of last operation. One command worth mentioning is plot command. It is responsible for drawing two dimensional graphs. Although this command belongs to the group liable for graphics, it is command from basic Matlab instructions, not from Image Processing toolbox. It is not suitable for processing images, therefore it will not be described. Last paragraph considers matrixes as two-dimensional structures. For better understanding how Matlab stores images, three dimensional matrixes have to be explained. In three dimensional matrixes there are three values in the brackets. First value stands for number of row, second value means column and third one is the extra dimension. Similarly, fourth number would go as fourth dimension, etc. The best way to understand it, is to look at Figure 3, which presents the method of pointing each element in this three dimensional matrix. Figure 3. The example of three dimensional matrix, built in Matlab (Ozimek, lectures from Digital image processing, 2010a) As mentioned before, Matlab stores images in arrays, which naturally suit to the representation of images. Most pictures are kept in two-dimensional matrices. Each element corresponds to one pixel in the image. For example image of 600 pixels height and
  • 35. 27 800 pixels width would be stored in Matlab as a matrix in size 600 rows and 800 columns. More complicated images are stored in three-dimensional matrices. Truecolor pictures require the third dimension, to keep their information about intensities of RGB colors. They vary between 0 and 1 value (MathWorks, 2009, 2.12). The most convenient way of pointing locations in the image, is pixel coordinate system. To refer to one specific pixel, Matlab requires number of row and column that stand for sought point. Values of coordinates range between one and the length of the row or column. Images can also be expressed in spatial system coordinates. In that case positions of pixel are described as x and y. By default, spatial coordinates correspond with pixel coordinates. For example pixel (2,3) would be translated to x=3 and y=2. The order of coordinates is reversed (Koprowski & Wróbel, 2008, 20-21). Endless possibilities : As mentioned earlier, Matlab offers very wide selection of toolboxes. Most of them are created by Mathworks but some are made by advanced users. There is a long list of possibilities that this program gives. Starting from automation, through electrical engineering, mechanics, robotics, measurements, modeling and simulation, medicine, music and all kinds of calculations. Next couple of paragraphs will shortly present some toolboxes available in Matlab. The descriptions are based on the theory from Mrozek&Mrozek (2001, 387 – 395) about toolboxes and mathworks.com. Very important group of toolboxes are handling with digital signal processing. Communication Toolbox provides mechanisms for modeling, simulation, designing and analysis of functions for the physical layer of communication systems. This toolbox includes algorithms that help with coding channels, modulation, demodulation and multiplexing of digital
  • 36. 28 signals. Communication toolbox also contains graphical user interface and plot function for better understanding the signal processing. Similarly, Signal Processing Toolbox, deals with signals. Possibilities of this Matlab library are speech and audio processing, wireless and wired communications and analog filter designing. Another group is math and optimization toolboxes. Two most common are Optimization and Symbolic Math toolboxes. The first one handles large-scale optimization problems. It contains functions responsible for performing nonlinear equations and methods for solving quadratic and linear problems. More used library is the second one. Symbolic Math toolbox contains hundreds of functions ready to use when it comes to differentiation, integration, simplification, transforms and solving of equations. It helps with all algebra and calculus calculations. Small group of Matlab toolbox handles statistics and data analysis. Statistics toolbox features are data management and organization, statistical drawing, probability computing and visualization. It also allows designing experiments connected with statistic data. Financial Toolbox is an extension to previously mentioned library. Like the name states, this addition to Matlab handles finances. It is widely used to estimate economical risk, analyze interest rate and creating financial charts. It can also work with evaluation and interpretation of stock exchange actions. Neural Networks Toolbox can be considered as one of the data analyzing library. It has set of functions that create, visualize and simulate neural networks. It is helpful when data change nonlinearly. Moreover, it provides graphical user interface equipped with trainings and examples for better understanding the way neural network works. Some toolboxes do not belong to any specific group but they are worth mentioning. For example Fuzzy Logic Toolbox offers wide range of functions responsible for fuzzy calculations. It allows user to look through the results of
  • 37. 29 fuzzy computations. Matlab provides also very useful connection to databases through Database Toolbox. It allows analyzing and processing the information stored in the tables. It supports SQL (Structured Query Language) commands to read and write data, and to create simple queries to search through the information. This specific toolbox interacts with Oracle and other database processing programs. And what is most important, Database Toolbox allows beginner users, not familiar with SQL, to access and query databases. Last but not least, very important set of libraries – image processing toolboxes. Mapping Toolbox is one of them, which is responsible for analyzing geographic data and creating maps. It provides compatibility for raster and vector graphics which can be imported. Additionally, as well two-dimensional and three-dimensional maps can be displayed and customized. It also helps with navigation problems and digital terrain analysis. Image Acquisition Toolbox is a very valuable collection of functions that handles receiving image and video signal directly from computer to the Matlab environment. This toolbox recognizes video cameras from multiple hardware vendors. Specially designed interface leads through possible transformations of images and videos, acquired thanks to mechanisms of Image Acquisition Toolbox. Image Processing Toolbox is a wide set of functions and algorithms that deal with graphics. It supports almost any type of image file. It gives the user unlimited options for pre- and post- processing of pictures. There are functions responsible for image enhancement, deblurring, filtering, noise reduction, spatial transformations, creating histograms, changing the threshold, hue and saturation, also for adjustment of color balance, contrast, detection of objects and analysis of shapes.
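A small sketch of the image-as-matrix conventions described in this chapter, using peppers.png, one of the sample images shipped with the Image Processing Toolbox.

% Images are stored as arrays: a truecolor image is an M x N x 3 matrix,
% indexed as (row, column, channel).
img = imread('peppers.png');            % sample image distributed with MATLAB
[rows, cols, channels] = size(img);     % rows x columns x 3 for a truecolor image
px  = img(2, 3, :);                     % pixel in row 2, column 3, all three channels
red = img(:, :, 1);                     % the red band as an ordinary 2-D matrix
gry = rgb2gray(img);                    % collapse to a single grey-level band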
  • 38. 30 5.2 Features 5.2.1 Colour Definition One of the most important features that make possible the recognition of images by humans is colour. Colour is a property that depends on the reflection of light to the eye and the processing of that information in the brain. We use colour everyday to tell the difference between objects, places, and the time of day. Usually colours are defined in three dimensional colour spaces. These could either be RGB (Red, Green, and Blue), HSV (Hue, Saturation, and Value) or HSB (Hue, Saturation, and Brightness). The last two are dependent on the human perception of hue, saturation, and brightness. Most image formats such as JPEG, BMP, GIF, use the RGB colour space to store information. The RGB colour space is defined as a unit cube with red, green, and blue axes. Thus, a vector with three co-ordinates represents the colour in this space. When all three coordinates are set to zero the colour perceived is black. When all three coordinates are set to 1 the colour perceived is white. The other colour spaces operate in a similar fashion but with a different perception. Methods of Representation The main method of representing colour information of images in any system is through colour histograms. A colour histogram is a type of bar graph, where each bar represents a particular colour of the colour space being used. In MatLab for example you can get a colour histogram of an image in the RGB or HSV colour space. The bars in a colour histogram are referred to as bins and they represent the x-axis. The number of bins depends on the number of colours there are in an image. The y-axis denotes the number of pixels there are in each bin. In other words how many pixels in an image are of a particular colour.
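A hedged sketch of such a global colour histogram (GCH) in the HSV space; the 8x4x4 quantisation and the file name are illustrative assumptions rather than the report's settings.

% Quantised global colour histogram in HSV.
rgb = imread('frame.png');              % placeholder image name
hsv = rgb2hsv(rgb);
hBins = 8; sBins = 4; vBins = 4;        % assumed quantisation: 128 bins in total
hIdx = min(floor(hsv(:,:,1) * hBins) + 1, hBins);
sIdx = min(floor(hsv(:,:,2) * sBins) + 1, sBins);
vIdx = min(floor(hsv(:,:,3) * vBins) + 1, vBins);
gch = accumarray([hIdx(:) sIdx(:) vIdx(:)], 1, [hBins sBins vBins]);
gch = gch(:) / sum(gch(:));             % normalised 128-bin global colour histogram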
  • 39. 31 An example of a colour histogram in the HSV colour space can be seen in the following image: FIG 5: AN IMAGE AND ITS HISTOGRAM To view a histogram numerically one has to look at the colour map, i.e. the numeric representation of each bin. As one can see from the colour map, each row represents the colour of a bin. The row is composed of the three coordinates of the colour space: the first coordinate represents hue, the second saturation, and the third value, thereby giving HSV. The proportions of these three coordinates are what make up the colour of a bin. One can also see the corresponding pixel count for each bin, denoted by the blue lines in the histogram. Quantization, in terms of colour histograms, refers to the process of reducing the number of bins by taking colours that are very similar to each other and putting them in the same bin. By default the maximum number of bins one can obtain using the histogram function in MatLab is 256. To save time when comparing colour histograms, one can quantize the number of bins. Quantization obviously reduces the information regarding the images but, as mentioned, this is the trade-off when one wants to reduce processing time. There are two types of colour histograms, global colour histograms (GCHs) and local colour histograms (LCHs). A GCH represents one whole image with a single colour histogram, whereas an LCH divides an image into fixed blocks and takes the colour histogram of each of those blocks. LCHs contain more information about an image but are computationally expensive when comparing images. “The GCH is the traditional method for colour based image retrieval. However, it does not include
  • 40. 32 information concerning the colour distribution of the regions” of an image. Thus, when comparing GCHs, one might not always get a proper result in terms of the similarity of images. 5.2.2 Texture Definition Texture is that innate property of all surfaces that describes visual patterns, each having properties of homogeneity. It contains important information about the structural arrangement of a surface, such as clouds, leaves, bricks or fabric, and it also describes the relationship of the surface to the surrounding environment. In short, it is a feature that describes the distinctive physical composition of a surface. Texture properties include: 1) Coarseness 2) Contrast 3) Directionality 4) Line-likeness 5) Regularity FIGURE 6: TEXTURE PROPERTIES Texture is one of the most important defining features of an image. It is characterised by the spatial distribution of gray levels in a neighbourhood. In order to capture the spatial dependence of gray-level values, which contributes to the perception of texture, a two-dimensional dependence texture analysis matrix is taken into consideration. This two-dimensional matrix is obtained by decoding the image file (jpeg, bmp, etc.). Methods of Representation There are three principal approaches used to describe texture: statistical, structural
  • 41. 33 and spectral. • Statistical techniques characterise textures using the statistical properties of the grey levels of the points/pixels comprising a surface image. Typically, these properties are computed using the grey-level co-occurrence matrix of the surface or the wavelet transformation of the surface. • Structural techniques characterise textures as being composed of simple primitive structures called “texels” (or texture elements, as in Figure 6), which are arranged regularly on a surface according to some surface arrangement rules. • Spectral techniques are based on properties of the Fourier spectrum and describe the global periodicity of the grey levels of a surface by identifying high-energy peaks in the Fourier spectrum. For optimum classification purposes, what concerns us are the statistical techniques of characterisation, because it is these techniques that result in computable texture properties. The most popular statistical representations of texture are: • Co-occurrence Matrix • Tamura Texture • Wavelet Transform Co-occurrence Matrix Originally proposed by R.M. Haralick, the co-occurrence matrix representation of texture features explores the grey-level spatial dependence of texture. A mathematical definition of the co-occurrence matrix is as follows: - Given a position operator P(i,j), - let A be an n x n matrix - whose element A[i][j] is the number of times that points with grey level (intensity) g[i] occur, in the position specified by P, relative to points with grey level g[j]. - Let C be the n x n matrix produced by dividing A by the total number of point pairs that satisfy P. C[i][j] is a measure of the joint probability that a pair of points satisfying P will have values g[i], g[j]. - C is called a co-occurrence matrix defined by P.
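As a small illustration of this definition, the sketch below uses the Image Processing Toolbox functions graycomatrix and graycoprops to build a co-occurrence matrix and extract a few Haralick-style statistics from it. The offset [0 1], the 8 grey levels and the demo image are illustrative choices, not parameters taken from the project.

gray = rgb2gray(imread('peppers.png'));                 % demo grayscale image
A = graycomatrix(gray, 'Offset', [0 1], ...             % counts A[i][j] for a horizontal
                 'NumLevels', 8, 'Symmetric', true);    % position operator and 8 grey levels
C = A / sum(A(:));                                      % normalise counts to the joint-probability matrix C
stats = graycoprops(A, {'Contrast','Correlation','Energy','Homogeneity'});
disp(stats)                                             % Energy corresponds to the Angular Second Moment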
  • 42. 34 FIGURE 7: CLASSICAL CO-OCCURRENCE MATRIX At first the co-occurrence matrix is constructed, based on the orientation and distance between image pixels, as shown in Figure 7. Then meaningful statistics are extracted from the matrix as the texture representation. Haralick proposed the following texture features: 1. Angular Second Moment 2. Contrast 3. Correlation 4. Variance 5. Inverse Second Differential Moment 6. Sum Average 7. Sum Variance 8. Sum Entropy 9. Entropy 10. Difference Variance 11. Difference Entropy 12. Measure of Correlation 1 13. Measure of Correlation 2 14. Local Mean Hence, for each chosen orientation and distance a co-occurrence matrix is obtained, and the Haralick texture features are computed from it. These
  • 43. 35 co-occurrence matrices represent the spatial distribution and the dependence of the grey levels within a local area. Each (i,j)-th entry in the matrices represents the probability of going from one pixel with a grey level of 'i' to another with a grey level of 'j' under a predefined distance and angle. From these matrices, sets of statistical measures, called feature vectors, are computed. Tamura Texture Drawing on psychological studies of human visual perception, Tamura explored texture representation using computational approximations to the three main texture features of coarseness, contrast, and directionality. Each of these texture features is approximately computed using an algorithm: • Coarseness is the measure of granularity of an image, or the average size of regions that have the same intensity. • Contrast is the measure of vividness of the texture pattern; therefore, the bigger the blocks that make up the image, the higher the contrast. It is affected by the use of varying black and white intensities. • Directionality is the measure of the directions of the grey values within the image. Wavelet Transform Textures can be modelled as quasi-periodic patterns with a spatial/frequency representation. The wavelet transform converts the image into a multi-scale representation with both spatial and frequency characteristics, which allows for effective multi-scale image analysis at a lower computational cost. According to this transformation, a function, which can represent an image, a curve, a signal, etc., can be described in terms of a coarse-level description together with details that range from broad to narrow scales. Unlike the Fourier transform, which uses sine functions to represent signals, the wavelet transform uses functions known as wavelets. Wavelets are finite in time, yet the average value of a wavelet
  • 44. 36 is zero. In a sense, a wavelet is a waveform that is bounded in both frequency and duration. While the Fourier transform converts a signal into a continuous series of sine waves, each of constant frequency and amplitude and of infinite duration, most real-world signals (such as music or images) have a finite duration and abrupt changes in frequency. This accounts for the efficiency of wavelet transforms: they convert a signal into a series of wavelets, which can be stored more efficiently because they are finite in time, and which can be constructed with rough edges, thereby better approximating real-world signals. Examples of wavelets are Coiflet, Morlet, Mexican Hat, Haar and Daubechies. Of these, Haar is the simplest and most widely used, while Daubechies wavelets have fractal structures and are vital for current wavelet applications. 5.2.3 Shape Definition Shape may be defined as the characteristic surface configuration of an object; an outline or contour. It permits an object to be distinguished from its surroundings by its outline [Figure 8]. Shape representations can generally be divided into two categories: • Boundary-based, and • Region-based. FIGURE 8: BOUNDARY-BASED & REGION-BASED Boundary-based shape representation uses only the outer boundary of the shape.
  • 45. 37 This is done by describing the considered region using its external characteristics, i.e. the pixels along the object boundary. Region-based shape representation uses the entire shape region by describing the considered region using its internal characteristics, i.e. the pixels contained in that region. Methods of Representation For representing shape features mathematically, we have: Boundary-based: 1) Polygonal Models, boundary partitioning 2) Fourier Descriptors 3) Splines, higher order constructs 4) Curvature Models Region-based: 1) Superquadrics 2) Fourier Descriptors 3) Implicit Polynomials 4) Blum's skeletons The most successful representations for shape categories are Fourier Descriptors and Moment Invariants: 1) The main idea of the Fourier Descriptor is to use the Fourier-transformed boundary as the shape feature. 2) The main idea of Moment Invariants is to use region-based moments, which are invariant to transformations, as the shape feature.
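For concreteness, a minimal MatLab sketch of a boundary-based Fourier descriptor is given below. The Otsu thresholding step, the demo image, and the choice of ten coefficients are illustrative assumptions, not part of the project's pipeline.

gray = rgb2gray(imread('peppers.png'));      % demo image
bw = im2bw(gray, graythresh(gray));          % crude binary segmentation (Otsu threshold)
B = bwboundaries(bw);                        % boundaries of the connected regions
pts = B{1};                                  % boundary of the first region, as [row col] points
z = pts(:,2) + 1i*pts(:,1);                  % encode each boundary point as x + iy
fd = fft(z);                                 % Fourier transform of the boundary
fd = fd / abs(fd(2));                        % normalise for scale; fd(1) only encodes position
feature = abs(fd(2:11));                     % magnitudes of the first ten coefficients as the shape feature

Taking magnitudes discards phase, which makes the descriptor insensitive to rotation and to the choice of starting point on the boundary.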
  • 46. 38 6. SYSTEM DESIGN 6.1 Introduction System design is the process of defining the components, modules, interfaces and data of a system so that it satisfies the specified requirements. This is usually done in two ways: I. UML DIAGRAMS II. DATAFLOW DIAGRAMS 6.2 Data Flow Diagram The Data Flow Diagram (DFD) is a graphical representation of the flow of data through an information system. It enables the processes in an information system to be represented from the viewpoint of the data. FIGURE 9: DATA FLOW DIAGRAM OF VIDEO SHOT BOUNDARY DETECTION
  • 47. 39 7. IMPLEMENTATION CREATING A GUI:- A Graphical User Interface (GUI) is an essential part of the application: it lets a user easily navigate through the application and find their way around it. It contains all the essential elements, which respond to the requests made by the user. Each element arranged in the GUI has its own axes in a coordinate system, and all these elements are given action-specific code in a corresponding Matlab file (.M file). To create the graphical user interface we use Matlab's GUI toolbox for creation and element positioning. We start with the 'guide' command in the command window, which opens the toolbox for Matlab GUI creation. Here we specify the name of the GUI we are going to create, and Matlab opens a GUI-building module. Firstly, the title of the project, 'VIDEO SHOT BOUNDARY DETECTION', is placed at the centre as shown in the figure below. This is done using the static text tool present in the toolbox. By double-clicking the generated text field we can edit its properties, such as the colour and the displayed text. The same was done for the input image text field in the GUI.
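For readers who prefer not to use GUIDE, a similar layout can also be created programmatically. The sketch below only illustrates the idea; the positions and the callback bodies are placeholders, not the project's actual functions.

f = figure('Name', 'VIDEO SHOT BOUNDARY DETECTION', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'text', 'String', 'VIDEO SHOT BOUNDARY DETECTION', ...
          'Units', 'normalized', 'Position', [0.25 0.90 0.50 0.06]);          % title text
axes('Parent', f, 'Units', 'normalized', 'Position', [0.10 0.30 0.55 0.50]);  % image display area
uicontrol(f, 'Style', 'pushbutton', 'String', 'Select Video', ...
          'Units', 'normalized', 'Position', [0.72 0.60 0.20 0.08], ...
          'Callback', @(h, e) disp('video selected'));                        % placeholder callback
uicontrol(f, 'Style', 'pushbutton', 'String', 'Select Image', ...
          'Units', 'normalized', 'Position', [0.72 0.45 0.20 0.08], ...
          'Callback', @(h, e) disp('image selected'));                        % placeholder callback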
  • 48. 40 Assigning the elements their corresponding axes, so that they can respond while the code runs, is done in this phase of axes creation; the axes display the images corresponding to an event as it happens dynamically. The axes function is used to create an axes graphics object in the current figure using default property values. This is done with the help of the axes tools present in the toolbox, and each axes object is named accordingly so that events can be handled on it. We also need to create interactive elements such as buttons, so that the client or user can interact with the application easily. We have created two buttons, one for video selection and one for image selection, by placing buttons from the toolbox. An event should occur when the user clicks a button; this is done with the help of a callback function. This creates a .M file in which the callback code is handled. Creating a BasicGui.M File This step contains the actual logic. The algorithm we have used here is the Euclidean distance, which we use to calculate the distance between the query image and the frames of the video in the comparison space. Euclidean distance: In order to calculate the distance between two images and so find their similarity or dissimilarity, the Euclidean distance algorithm is used. For feature vectors x and y of length n, the distance is d(x, y) = sqrt( (x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2 ). Because the differences between the features of the query and database images are squared, the metric increases the divergence between dissimilar images. Logic of Feature Extraction and Comparison The features are extracted and compared with the dataset according to the following steps: Step 1: Read the image input from the GUI. Step 2: After reading the input image into a variable, display it on the axes using the axes function.
  • 49. 41 Step 3: Run the VLFeat library in Matlab by adding its directory to the current path. Step 4: Initialise the cell size to 8 with a 50 percent overlapping factor. Step 5: Since the input image and the database frame images from the video should be of the same size, resize the image to 256 x 256. Step 6: Calculate the gradients of the input image. This is done with the Histogram of Oriented Gradients, which returns a matrix containing all the gradient and magnitude values; these are the feature descriptor values. Step 7: Reshape the matrix formed in Step 6 into a single-dimensional array for comparison. Step 8: Repeat Steps 4 to 7 for every frame in the database, from 1 up to the total number of files. Step 9: Store the single-dimensional gradient values of all images in a cell array, so that each element of the cell array holds the single-dimensional gradient values of one frame. Step 10: Using the Euclidean method stated above, calculate the distance between each database image and the input image, making sure that all the distance values end up in a single-dimensional array. Step 11: After finding the distance between each frame of the video and the current query image, locate the corresponding indexes in the directory and display the images on the axes. Step 12: From the obtained Euclidean values of all the frames in the database, the best-matching images are chosen and displayed on the axes. With this, the execution of the project terminates.
  • 50. 42 7.1 SAMPLE CODE:
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename pathname]=uigetfile({'*.bmp';'*.mp4';'*.avi';},'File Selector');
str=strcat(pathname,filename);
obj = mmreader(str);                 % mmreader is superseded by VideoReader on newer MATLAB releases
vid = read(obj);
frames = obj.NumberOfFrames;
for x = 1 : frames
    % write every frame of the selected video to disk as frame-<x>.jpeg
    imwrite(vid(:,:,:,x),strcat('frame-',num2str(x),'.jpeg'));
end
%cd('C:\Users\anvesh\Desktop\Project');
a=imread('frame-1.jpeg');
axes(handles.axes4);
imshow(a);
b=imread('frame-2.jpeg');
axes(handles.axes5);
imshow(b);
c=imread('frame-3.jpeg');
axes(handles.axes6);
imshow(c);
d=imread('frame-4.jpeg');
axes(handles.axes7);
imshow(d);
e=imread('frame-5.jpeg');
axes(handles.axes8);
imshow(e);
f=imread('frame-6.jpeg');
axes(handles.axes9);
imshow(f);
g=imread('frame-7.jpeg');
axes(handles.axes10);
imshow(g);
h=imread('frame-8.jpeg');
axes(handles.axes11);
imshow(h);
i=imread('frame-9.jpeg');
axes(handles.axes12);
  • 51. 43 imshow(i);

% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% function for image selection and for checking whether the query is found in the frame folder or not
run('C:\Users\anvesh\Desktop\1Project\vlfeat-0.9.19\toolbox\vl_setup');
[filename pathname]=uigetfile({'*.jpeg';'*.png';'*.avi';},'File Selector');
str=strcat(pathname,filename);
img1=imread(str);
imagefiles=dir('*.jpeg');
img1resize=imresize(img1,[256 256]);
cellsize=8;
hog1=vl_hog(single(img1resize),cellsize,'verbose');   % HOG descriptor of the query image
n1=numel(hog1);
rehog1=reshape(hog1,[1,n1]);                          % flatten to a single-dimensional feature vector
nfiles=length(imagefiles);
for i=1:1:nfiles
    currentfile=imagefiles(i).name;
    currentimage=imread(currentfile);
    image{i}=imresize(currentimage,[256 256]);
    hogca{i}=vl_hog(single(image{i}),cellsize,'verbose');
    n=numel(hogca{i});
    rehogca{i}=reshape(hogca{i},[1,n]);
    dist(i)=pdist2(rehog1,rehogca{i},'euclidean');    % distance between the query and frame i
end
[sorted,ix]=sort(dist);
firstIndex=ix(1:10);                                  % indexes of the ten closest frames
str1=imagefiles(firstIndex(1)).name;
image1=imread(str1);
image2=imread(str);
imgresize1=imresize(image1,[256 256]);
imgresize2=imresize(image2,[256 256]);
x='found image';
if(imgresize1==imgresize2)                            % true only if every pixel matches
    str1=imagefiles(firstIndex(1)).name
    disp(x);
else
    disp('not found');
end

%Running vl-feat toolbox for handling videos
%initially by breaking down
%into shots and frames
  • 52. 44
for i=1:1:20
    run('vlfeat-0.9.19/toolbox/vl_setup');
    cellsize=8;
    str='frame-';
    str2=strcat(str,num2str(i));
    str2=strcat(str2,'.jpeg');
    img{i}=imread(str2);
    imgresize=imresize(img{i},[256 256]);
    hog1{i}=vl_hog(single(imgresize),cellsize,'verbose');
    n1=numel(hog1{i});
    rehog{i}=reshape(hog1{i},[1,n1]);
    % input done
    imagefiles=dir('*.jpeg');
    nfiles=length(imagefiles);
    for j=1:1:nfiles
        currentimage=imread(imagefiles(j).name);
        ir{j}=imresize(currentimage,[256 256]);
        hc{j}=vl_hog(single(ir{j}),cellsize,'verbose');
        n=numel(hc{j});
        rh{j}=reshape(hc{j},[1,n]);
        dist{j}=pdist2(rehog{i},rh{j},'euclidean')
    end
end
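Once per-frame HOG distances are available, shot boundaries can be flagged by thresholding the distance between consecutive frames. The sketch below is not part of the report's code: frameHog{k} is assumed to hold the flattened HOG vector of frame k (for example the rh{j} values computed above), numFrames is the number of extracted frames, and the mean-plus-two-standard-deviations threshold is an illustrative assumption.

d = zeros(1, numFrames-1);
for k = 1:numFrames-1
    d(k) = pdist2(frameHog{k}, frameHog{k+1}, 'euclidean');   % distance between consecutive frames
end
T = mean(d) + 2*std(d);                                       % adaptive threshold (assumed choice)
boundaries = find(d > T);                                     % frame indexes where a new shot is declared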
  • 53. 45 7.2 SAMPLE_GUI.M & SIMULATION RESULTS PROVIDING THE QUERY IMAGE:
  • 54. 46 DISPLAYING THE FRAMES CORRESPONDING TO THE VIDEO.
  • 55. 47 OUTPUT OF THE EXECUTION (features matched):-
  • 56. 48 OUTPUT OF THE EXECUTION (features unmatched):-
  • 57. 49 8. SYSTEM TESTING 8.1 TESTING: • Testing is a process of executing a program with the intent of finding an error. • Testing presents an interesting anomaly for the software engineer. • Testing is a set of activities that can be planned in advance and conducted systematically. • Software testing is often referred to as verification & validation. 8.2 TYPES OF TESTING: The various types of testing are 1) White Box Testing 2) Black Box Testing 3) Alpha Testing 4) Beta Testing 5) Win Runner 6) Load Runner 7) System Testing 8) Unit Testing 9) End to End Testing. The type of testing we have used to measure the accuracy and efficiency of the retrieval is black-box testing, which checks the output produced for a given input. WHITE-BOX TESTING White-box testing, sometimes called glass-box testing, is a test-case design method that uses the control structure of the procedural design to derive test cases. Using white-box testing methods, the software engineer can derive test cases that
  • 58. 50 (1) guarantee that all independent paths within a module have been exercised at least once, (2) exercise all logical decisions on their true and false sides, (3) execute all loops at their boundaries and within their operational bounds, and (4) exercise internal data structures to ensure their validity. It is not possible to exhaustively test every program path, because the number of paths is simply too large, and white-box tests can be designed only after a component-level design (or source code) exists, since the logical details of the program must be available. A reasonable question might therefore be posed at this juncture: "Why spend time and energy worrying about (and testing) logical minutiae when we might better expend effort ensuring that program requirements have been met?" Stated another way, why don't we spend all of our energy on black-box tests? The answer lies in the nature of software defects. • Logic errors and incorrect assumptions are inversely proportional to the probability that a program path will be executed. Errors tend to creep into our work when we design and implement functions, conditions, or control that are out of the mainstream. Everyday processing tends to be well understood (and well scrutinized), while "special case" processing tends to fall into the cracks. • We often believe that a logical path is not likely to be executed when, in fact, it may be executed on a regular basis. The logical flow of a program is sometimes counterintuitive, meaning that our unconscious assumptions about flow of control and data may lead us to make design errors that are uncovered only once path testing commences. • Typographical errors are random. When a program is translated into programming-language source code, it is likely that some typing errors will occur. Many will be uncovered by syntax and type-checking mechanisms, but others may go undetected until testing begins. It is as likely that a typo will exist on an obscure logical path as on a mainstream path. Black-Box Testing: 1) It is also called behavioural testing. It focuses on the functional requirements of the software.
  • 59. 51 2) It is a complementary approach that is likely to uncover a different class of errors than white-box methods. 3) Black-box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program, and it can be applied to virtually every level of software testing. Accuracy and precision are defined in terms of systematic and random errors; the more common definition associates accuracy with systematic errors and precision with random errors. In terms of retrieval counts, precision is measured by the formula Precision = TP / (TP + FP), and accuracy is measured by the formula Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives respectively. ALPHA TESTING:- The alpha test is conducted at the developer's site by a customer. The software is used in a natural setting, with the developer "looking over the shoulder" of the user and recording errors and usage problems. Alpha tests are conducted in a controlled environment. BETA TESTING:- The beta test is conducted at one or more customer sites by the end-user of the software. Unlike alpha testing, the developer is generally not present. Therefore, the beta test is a "live" application of the software in an environment that cannot be controlled by the developer. The customer records all problems (real or imagined) that are encountered during beta testing and reports these to the developer at regular intervals. As a result of problems reported during beta tests, software engineers
  • 60. 52 make modifications and then prepare for release of the software product to the entire customer base. SYSTEM TESTING:- System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that system elements have been properly integrated and perform their allocated functions. In the sections that follow, we discuss the types of system tests that are worthwhile for software-based systems. UNIT TESTING:- Unit testing focuses verification effort on the smallest unit of software design, the software component or module. Using the component-level design description as a guide, important control paths are tested to uncover errors within the boundary of the module. The relative complexity of tests and uncovered errors is limited by the constrained scope established for unit testing. The unit test is white-box oriented, and the step can be conducted in parallel for multiple components. 8.3 Test Cases / Sample Cases 1) Input image: epic.jpeg; output from the module: frame-1.jpeg, match found; Precision = 1/(1+0) * 100 = 100% 2) Input image: epic1.jpeg; output from the module: frame-57.jpeg, match found; Precision = 1/(1+0) * 100 = 100% 3) Input image: 802.png; output from the module: match not found; Precision = 0/(0+1) * 100 = 0% 4) Input image: 1710.png; output from the module: frame-131.jpeg, match found; Precision = 1/(1+0) * 100 = 100% Accuracy = (3 + 1) / (3 + 0 + 0 + 1) = 4/4 = 100%
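The same figures can be reproduced with a few lines of MatLab. The counts below simply restate the four sample cases above, treating the unmatched query as a correct rejection in the same way as the accuracy computation does; they are not additional experiments.

TP = 3;  FP = 0;  TN = 1;  FN = 0;                        % counts from the four test cases
precision = TP / (TP + FP) * 100;                         % 100 %
accuracy  = (TP + TN) / (TP + TN + FP + FN) * 100;        % 100 %
fprintf('Precision = %.0f%%, Accuracy = %.0f%%\n', precision, accuracy);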
  • 61. 53 9. CONCLUSION In this project, an efficient method for VIDEO SHOT BOUNDARY DETECTION, applicable to any video format, has been implemented based on the Histogram of Oriented Gradients. The results show that this scheme is efficient compared with the existing feature extractors considered, and the project has clearly demonstrated that the necessary information can be retrieved efficiently from video content in different formats. The solution can therefore be treated as a new candidate for a retrieval system. As part of future work, the approach can be extended to explore and design more effective retrieval, for example by using mutation techniques, and to evaluate the performance and efficiency of SBD across all kinds of image and video formats and under large-dataset conditions.
  • 62. 54 10. REFERENCES • Nikita Sao and Ravi Mishra, "A Survey Based on Video Shot Boundary Detection Techniques", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 4, April 2014. • Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and Stephen Maybank, "A Survey on Visual Content-Based Video Indexing and Retrieval", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 41, No. 6, November 2011. • Rahul Kumar Garg and Gaurav Saxena, "Shot Boundary Detection Using Shifting of Image Frame", International Journal of Scientific Engineering and Technology (ISSN: 2277-1581), Vol. 3, Issue 6, pp. 785-788, June 2014. • Mohini Deokar and Ruhi Kabra, "Video Shot Detection Techniques: Brief Overview", International Journal of Engineering Research and General Science, Vol. 2, Issue 6, October-November 2014, ISSN 2091-2730. • http://en.wikipedia.org/wiki/Shot_transition_detection
  • 63. 55 APPENDIX-I In order to retrieve the histogram of oriented gradient features we need a feature extractor code to exhibit the functionality. This is done with the help of C Language, where we use MEX libraries to define the functionality in the C code and use the functions in Matlab HOG.c /** @file hog.c ** @brief Histogram of Oriented Gradients (HOG) - Definition **/ /* Copyright (C) 2014 Anvesh. All rights reserved. This file is part of the VLFeat library and is made available under the terms of the BSD license (see the COPYING file). */ #include "hog.h" #include "mathop.h" #include <string.h> /** <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> @page hog Histogram of Oriented Gradients (HOG) features @author Anvesh <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> @ref hog.h implements the Histogram of Oriented Gradients (HOG) features in the variants of Dalal Triggs @cite{dalal05histograms} and of UOCTTI @cite{felzenszwalb09object}. Applications include object detection and deformable object detection. - @ref hog-overview - @ref hog-tech <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> @section hog-overview Overview <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  • 64. 56 ~~~~~~~ --> HOG is a standard image feature used, among others, in object detection and deformable object detection. It decomposes the image into square cells of a given size (typically eight pixels), compute a histogram of oriented gradient in each cell (similar to @ref sift), and then renormalizes the cells by looking into adjacent blocks. VLFeat implements two HOG variants: the original one of Dalal-Triggs @cite{dalal05histograms} and the one proposed in Felzenszwalb et al. @cite{felzenszwalb09object}. In order to use HOG, start by creating a new HOG object, set the desired parameters, pass a (color or grayscale) image, and read off the results. @code VlHog * hog = vl_hog_new(VlHogVariantDalalTriggs, numOrientations, VL_FALSE) ; vl_hog_put_image(hog, image, height, width, numChannels, cellSize) ; hogWidth = vl_hog_get_width(hog) ; hogHeight = vl_hog_get_height(hog) ; hogDimenison = vl_hog_get_dimension(hog) ; hogArray = vl_malloc(hogWidth*hogHeight*hogDimension*sizeof(floa t)) ; vl_hog_extract(hog, hogArray) ; vl_hog_delete(hog) ; @endcode HOG is a feature array of the dimension returned by ::vl_hog_get_width, ::vl_hog_get_height, with each feature (histogram) having dimension ::vl_hog_get_dimension. The array is stored in row major order, with the slowest varying dimension beying the dimension indexing the histogram elements. The number of entreis in the histogram as well as their meaning depends on the HOG variant and is detailed later. However, it is usually
  • 65. 57 unnecessary to know such details. @ref hog.h provides support for creating an inconic representation of a HOG feature array: @code glyphSize = vl_hog_get_glyph_size(hog) ; imageHeight = glyphSize * hogArrayHeight ; imageWidth = glyphSize * hogArrayWidth ; image = vl_malloc(sizeof(float)*imageWidth*imageHeight) ; vl_hog_render(hog, image, hogArray) ; @endcode It is often convenient to mirror HOG features from left to right. This can be obtained by mirroring an array of HOG cells, but the content of each cell must also be rearranged. This can be done by the permutation obtaiend by ::vl_hog_get_permutation. Furthermore, @ref hog.h suppots computing HOG features not from images but from vector fields. <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> @section hog-tech Technical details <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> HOG divdes the input image into square cells of size @c cellSize, fitting as many cells as possible, filling the image domain from the upper-left corner down to the right one. For each row and column, the last cell is at least half contained in the image. More precisely, the number of cells obtained in this manner is: @code hogWidth = (width + cellSize/2) / cellSize ; hogHeight = (height + cellSize/2) / cellSize ; @endcode Then the image gradient @f$ nabla ell(x,y) @f$ is computed by using central difference (for colour image
  • 66. 58 the channel with the largest gradient at that pixel is used). The gradient @f$ nabla ell(x,y) @f$ is assigned to one of @c 2*numOrientations orientation in the range @f$ [0,2pi) @f$ (see @ref hog-conventions for details). Contributions are then accumulated by using bilinear interpolation to four neigbhour cells, as in @ref sift. This results in an histogram @f$h_d@f$ of dimension 2*numOrientations, called of @e directed orientations since it accounts for the direction as well as the orientation of the gradient. A second histogram @f$h_u@f$ of undirected orientations of half the size is obtained by folding @f$ h_d @f$ into two. Let a block of cell be a @f$ 2times 2 @f$ sub-array of cells. Let the norm of a block be the @f$ l^2 @f$ norm of the stacking of the respective unoriented histogram. Given a HOG cell, four normalisation factors are then obtained as the inverse of the norm of the four blocks that contain the cell. For the Dalal-Triggs variant, each histogram @f$ h_d @f$ is copied four times, normalised using the four different normalisation factors, the four vectors are stacked, saturated at 0.2, and finally stored as the descriptor of the cell. This results in a @c numOrientations * 4 dimensional cell descriptor. Blocks are visited from left to right and top to bottom when forming the final descriptor. For the UOCCTI descriptor, the same is done for both the undirected as well as the directed orientation histograms. This would yield a dimension of @c 4*(2+1)*numOrientations elements, but the resulting vector is projected down to @c (2+1)*numOrientations elements by averaging corresponding histogram dimensions. This
  • 67. 59 was shown to be an algebraic approximation of PCA for descriptors computed on natural images. In addition, for the UOCTTI variant the l1 norm of each of the four l2 normalised undirected histograms is computed and stored as additional four dimensions, for a total of @c 4+3*numOrientations dimensions. <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> @subsection hog-conventions Conventions <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~ --> The orientation of a gradient is expressed as the angle it forms with the horizontal axis of the image. Angles are measured clock-wise (as the vertical image axis points downards), and the null angle corresponds to an horizontal vector pointing right. The quantized directed orientations are @f$ mathrm{k} pi / mathrm{numOrientations} @f$, where @c k is an index that varies in the ingeger range @f$ {0, dots, 2mathrm{numOrientations} - 1} @f$. Note that the orientations capture the orientation of the gradeint; image edges would be oriented at 90 degrees from these. **/ /* ----------------------------------------------------- ----------- */ /** @brief Create a new HOG object ** @param variant HOG descriptor variant. ** @param numOrientations number of distinguished orientations. ** @param transposed wether images are transposed (column major). ** @return the new HOG object. **
  • 68. 60 ** The function creates a new HOG object to extract descriptors of ** the prescribed @c variant. The angular resolution is set by ** @a numOrientations, which specifies the number of <em>undirected</em> ** orientations. The object can work with column major images ** by setting @a transposed to true. **/ VlHog * vl_hog_new (VlHogVariant variant, vl_size numOrientations, vl_bool transposed) { vl_index o, k ; VlHog * self = vl_calloc(1, sizeof(VlHog)) ; assert(numOrientations >= 1) ; self->variant = variant ; self->numOrientations = numOrientations ; self->glyphSize = 21 ; self->transposed = transposed ; self->useBilinearOrientationAssigment = VL_FALSE ; self->orientationX = vl_malloc(sizeof(float) * self->numOrientations) ; self->orientationY = vl_malloc(sizeof(float) * self->numOrientations) ; /* Create a vector along the center of each orientation bin. These are used to map gradients to bins. If the image is transposed, then this can be adjusted here by swapping X and Y in these vectors. */ for(o = 0 ; o < (signed)self->numOrientations ; + +o) { double angle = o * VL_PI / self- >numOrientations ; if (!self->transposed) { self->orientationX[o] = (float) cos(angle) ; self->orientationY[o] = (float) sin(angle) ; } else { self->orientationX[o] = (float) sin(angle) ; self->orientationY[o] = (float) cos(angle) ; }
  • 69. 61 } /* If the number of orientation is equal to 9, one gets: Uoccti:: 18 directed orientations + 9 undirected orientations + 4 texture DalalTriggs:: 9 undirected orientations x 4 blocks. */ switch (self->variant) { case VlHogVariantUoctti: self->dimension = 3*self->numOrientations + 4 ; break ; case VlHogVariantDalalTriggs: self->dimension = 4*self->numOrientations ; break ; default: assert(0) ; } /* A permutation specifies how to permute elements in a HOG descriptor to flip it horizontally. Since the first orientation of index 0 points to the right, this must be swapped with orientation self->numOrientation that points to the left (for the directed case, and to itself for the undirected one). */ self->permutation = vl_malloc(self->dimension * sizeof(vl_index)) ; switch (self->variant) { case VlHogVariantUoctti: for(o = 0 ; o < (signed)self->numOrientations ; ++o) { vl_index op = self->numOrientations - o ; self->permutation[o] = op ; self->permutation[o + self->numOrientations] = (op + self->numOrientations) % (2*self- >numOrientations) ; self->permutation[o + 2*self- >numOrientations] = (op % self->numOrientations) + 2*self->numOrientations ; } for (k = 0 ; k < 4 ; ++k) {
  • 70. 62 /* The texture features correspond to four displaced block around a cell. These permute with a lr flip as for DalalTriggs. */ vl_index blockx = k % 2 ; vl_index blocky = k / 2 ; vl_index q = (1 - blockx) + blocky * 2 ; self->permutation[k + self->numOrientations * 3] = q + self->numOrientations * 3 ; } break ; case VlHogVariantDalalTriggs: for(k = 0 ; k < 4 ; ++k) { /* Find the corresponding block. Blocks are listed in order 1,2,3,4,... from left to right and top to bottom */ vl_index blockx = k % 2 ; vl_index blocky = k / 2 ; vl_index q = (1 - blockx) + blocky * 2 ; for(o = 0 ; o < (signed)self->numOrientations ; ++o) { vl_index op = self->numOrientations - o ; self->permutation[o + k*self- >numOrientations] = (op % self->numOrientations) + q*self->numOrientations ; } } break ; default: assert(0) ; } /* Create glyphs for representing the HOG features/ filters. The glyphs are simple bars, oriented orthogonally to the gradients to represent image edges. If the object is configured to work on transposed image, the glyphs images are also stored in column-major. */ self->glyphs = vl_calloc(self->glyphSize * self- >glyphSize * self->numOrientations, sizeof(float)) ; _61 #define atglyph(x,y,k) self->glyphs[(x) + self- >glyphSize * (y) + self->glyphSize * self->glyphSize * (k)]
  • 71. 63 for (o = 0 ; o < (signed)self->numOrientations ; + +o) { double angle = fmod(o * VL_PI / self- >numOrientations + VL_PI/2, VL_PI) ; double x2 = self->glyphSize * cos(angle) / 2 ; double y2 = self->glyphSize * sin(angle) / 2 ; if (angle <= VL_PI / 4 || angle >= VL_PI * 3 / 4) { /* along horizontal direction */ double slope = y2 / x2 ; double offset = (1 - slope) * (self->glyphSize - 1) / 2 ; vl_index skip = (1 - fabs(cos(angle))) / 2 * self->glyphSize ; vl_index i, j ; for (i = skip ; i < (signed)self->glyphSize - skip ; ++i) { j = vl_round_d(slope * i + offset) ; if (! self->transposed) { atglyph(i,j,o) = 1 ; } else { atglyph(j,i,o) = 1 ; } } } else { /* along vertical direction */ double slope = x2 / y2 ; double offset = (1 - slope) * (self->glyphSize - 1) / 2 ; vl_index skip = (1 - sin(angle)) / 2 * self- >glyphSize ; vl_index i, j ; for (j = skip ; j < (signed)self->glyphSize - skip; ++j) { i = vl_round_d(slope * j + offset) ; if (! self->transposed) { atglyph(i,j,o) = 1 ; } else { atglyph(j,i,o) = 1 ; } } } } return self ; } /*
  • 72. 64 ----------------------------------------------------- ----------- */ /** @brief Delete a HOG object ** @param self HOG object to delete. **/ void vl_hog_delete (VlHog * self) { if (self->orientationX) { vl_free(self->orientationX) ; self->orientationX = NULL ; } if (self->orientationY) { vl_free(self->orientationY) ; self->orientationY = NULL ; } if (self->glyphs) { vl_free(self->glyphs) ; self->glyphs = NULL ; } if (self->permutation) { vl_free(self->permutation) ; self->permutation = NULL ; } if (self->hog) { vl_free(self->hog) ; self->hog = NULL ; } if (self->hogNorm) { vl_free(self->hogNorm) ; self->hogNorm = NULL ; } vl_free(self) ; } /* ----------------------------------------------------- ----------- */ /** @brief Get HOG glyph size ** @param self HOG object. ** @return size (height and width) of a glyph. **/ vl_size vl_hog_get_glyph_size (VlHog const * self) { return self->glyphSize ; }
  • 73. 65 /* ----------------------------------------------------- ----------- */ /** @brief Get HOG left-right flip permutation ** @param self HOG object. ** @return left-right permutation. ** ** The function returns a pointer to an array @c permutation of ::vl_hog_get_dimension ** elements. Given a HOG descriptor (for a cell) @c hog, which is also ** a vector of ::vl_hog_get_dimension elements, the ** descriptor obtained for the same image flipped horizotnally is ** given by <code>flippedHog[i] = hog[permutation[i]]</code>. **/ vl_index const * vl_hog_get_permutation (VlHog const * self) { return self->permutation ; } /* ----------------------------------------------------- ----------- */ /** @brief Turn bilinear interpolation of assignments on or off ** @param self HOG object. ** @param x @c true if orientations should be assigned with bilinear interpolation. **/ void vl_hog_set_use_bilinear_orientation_assignments (VlHog * self, vl_bool x) { self->useBilinearOrientationAssigment = x ; } /** @brief Tell whether assignments use bilinear interpolation or not ** @param self HOG object. ** @return @c true if orientations are be assigned with bilinear interpolation. **/ vl_bool vl_hog_get_use_bilinear_orientation_assignments (VlHog const * self) { return self->useBilinearOrientationAssigment ;
  • 74. 66 } /* ----------------------------------------------------- ----------- */ /** @brief Render a HOG descriptor to a glyph image ** @param self HOG object. ** @param image glyph image (output). ** @param descriptor HOG descriptor. ** @param width HOG descriptor width. ** @param height HOG descriptor height. ** ** The function renders the HOG descriptor or filter ** @a descriptor as an image (for visualization) and stores the result in ** the buffer @a image. This buffer ** must be an array of dimensions @c width*glyphSize ** by @c height*glyphSize elements, where @c glyphSize is ** obtained from ::vl_hog_get_glyph_size and is the size in pixels ** of the image element used to represent the descriptor of one ** HOG cell. **/ void vl_hog_render (VlHog const * self, float * image, float const * descriptor, vl_size width, vl_size height) { vl_index x, y, k, cx, cy ; vl_size hogStride = width * height ; assert(self) ; assert(image) ; assert(descriptor) ; assert(width > 0) ; assert(height > 0) ; for (y = 0 ; y < (signed)height ; ++y) { for (x = 0 ; x < (signed)width ; ++x) { float minWeight = 0 ; float maxWeight = 0 ; for (k = 0 ; k < (signed)self- >numOrientations ; ++k) { float weight ; float const * glyph = self->glyphs + k *
  • 75. 67 (self->glyphSize*self->glyphSize) ; float * glyphImage = image + self->glyphSize * x + y * width * (self->glyphSize*self->glyphSize) ; switch (self->variant) { case VlHogVariantUoctti: weight = descriptor[k * hogStride] + descriptor[(k + self->numOrientations) * hogStride] + descriptor[(k + 2 * self- >numOrientations) * hogStride] ; break ; case VlHogVariantDalalTriggs: weight = descriptor[k * hogStride] + descriptor[(k + self->numOrientations) * hogStride] + descriptor[(k + 2 * self- >numOrientations) * hogStride] + descriptor[(k + 3 * self- >numOrientations) * hogStride] ; break ; default: abort() ; } maxWeight = VL_MAX(weight, maxWeight) ; minWeight = VL_MIN(weight, minWeight); for (cy = 0 ; cy < (signed)self->glyphSize ; ++cy) { for (cx = 0 ; cx < (signed)self- >glyphSize ; ++cx) { *glyphImage++ += weight * (*glyph++) ; } glyphImage += (width - 1) * self->glyphSize ; } } /* next orientation */ { float * glyphImage = image + self->glyphSize * x + y * width * (self->glyphSize*self->glyphSize) ; for (cy = 0 ; cy < (signed)self->glyphSize ; ++cy) { for (cx = 0 ; cx < (signed)self- >glyphSize ; ++cx) { float value = *glyphImage ; *glyphImage++ = VL_MAX(minWeight,