Proceedings of PSG NCIICT: the Sixth National Conference on Innovations in Information & Communication Technology (NCIICT - 2013)
VEHICLE TRACKING AND RECOGNITION BY OPTICAL FLOW ESTIMATION,
DCT AND DWT USING SVM

Dr. M. BALASUBRAMANIAN #1          Mr. S. RAKESH #2
Assistant Professor                M.E. Student
Department of CSE                  Department of CSE
Annamalai University,              Annamalai University,
Chidambaram, India                 Chidambaram, India
E-mail: balu_june1@yahoo.co.in #1, rakesh608@gmail.com #2
ABSTRACT: This paper proposes a method for tracking and recognizing vehicles using optical flow estimation and the discrete cosine transform with an SVM classifier. The objective of this work is to track vehicles on the road and to recognize them. Tracking and recognition of motion can be informative and valuable, and motion information is very useful for traffic surveillance systems. Many proposed vehicle tracking and recognition techniques involve template matching, blob tracking, and contour tracking. A well-known motion tracking and estimation technique, optical flow, however, has not been widely used and tested for practicability in traffic surveillance systems. We show that this framework yields a robust and efficient on-board vehicle tracking system with high precision, high recall, and good localization. The tracking results obtained with Lucas-Kanade (LK) optical flow show that optical flow is an effective technique for tracking the motion of moving objects and has great potential for use in traffic surveillance systems. From the tracking output, DCT features are extracted to recognize the vehicles on the road. The proposed method gives a tracking rate of 96.0%.
Keywords: Optical flow estimation, motion vector, thresholding, morphology, DCT, SVM.
1. INTRODUCTION
The aim is to recognize vehicles such as cars, vans, jeeps, and buses in the course of vehicle tracking and recognition. This method is intended for real-time surveillance, tracking, and recognition of vehicles, and is also useful in helping to prevent automotive accidents worldwide. Worldwide, automotive accidents injure between 20 and 50 million people each year, and at least 1.2 million people die as a result. Between 1% and 3% of the world's gross domestic product is spent on medical care, property damage, and other costs associated with auto accidents. As a result, over the years, there has been great interest in the development of active safety systems among vehicle manufacturers, safety experts, and academics. The vehicle-recognition system has been trained in two iterations using the active learning technique of selective sampling to query informative examples for retraining. Using active learning yields a significant drop in false positives per frame and in false-detection rates, while maintaining a high vehicle-recognition rate. The robust on-road vehicle-recognition system is then integrated with a condensation particle filter, extended to multiple-vehicle tracking, to build a complete vehicle recognition and tracking system. Vehicle tracking and recognition is one of the popular topics in the computer vision field. It is also considered one of the field's challenges, as motion tracking and recognition can be quite tricky. Unlike humans, computers do not possess an innate self-learning ability, which makes them unable to track motion as "smartly" as humans can. To overcome this, much research has been done and solutions have been proposed. These solutions may not yet be fully practical, but they are sufficient to prove that it is possible to tell a computer how to track motion.
2. RELATED WORK
Increasing congestion on freeways and problems associated with existing detectors have spawned an interest in new vehicle detection technologies such as video image processing. Existing commercial image processing systems work well in free-flowing traffic, but they have difficulties with congestion, shadows, and lighting transitions. These problems stem from vehicles partially occluding one another and from the fact that vehicles appear differently under various conditions. A feature-based tracking system was therefore developed for detecting vehicles under these challenging conditions. Instead of tracking entire vehicles, vehicle features are tracked, making the system robust to partial occlusion. The system is fully functional under changing lighting conditions because the most salient features at any given moment are tracked. After the features exit the tracking region, they are grouped into discrete vehicles using a common motion constraint. The groups represent individual vehicle trajectories, which can be used to measure traditional traffic parameters as well as metrics suitable for improved automated surveillance. That paper describes the issues associated with feature-based tracking, presents a real-time implementation of a prototype system, and reports the performance of the system on a large data set [8].
Developing on-board automotive driver assistance systems that aim to alert drivers to the driving environment and possible collisions with other vehicles has attracted a lot of attention lately. In
these systems, robust and reliable vehicle detection
is a critical step. This paper presents a review of
recent vision-based on-road vehicle detection
systems. Our focus is on systems where the camera
is mounted on the vehicle rather than being fixed
such as in traffic/driveway monitoring systems.
First, we discuss the problem of on-road vehicle
detection using optical sensors followed by a brief
review of intelligent vehicle research worldwide.
Then, we discuss active and passive sensors to set
the stage for vision-based vehicle detection.
Methods aiming to quickly hypothesize the location
of vehicles in an image as well as to verify the
hypothesized locations are reviewed next.
Integrating detection with tracking is also reviewed
to illustrate the benefits of exploiting temporal
continuity for vehicle detection. Finally, we present
a critical overview of the methods discussed, we
assess their potential for future deployment, and we
present directions for future research.
3. FEATURE EXTRACTION OF VEHICLE
TRACKING
3.1 OPTICAL FLOW ESTIMATION [7]
The optical flow method detects motion by analyzing the motion of pixels, either in bulk or individually. It reflects the image changes due to motion during a time interval.
The idea of optical flow can be said to derive from how humans observe motion and objects. Humans are able to determine motion in a captured scene by analyzing gradients, brightness, reflected light, and so on [Lee and Kalmus, 1980] [8]. However, optical flow computations do not provide the functionality to automatically classify and recognize moving objects, though this can be done after the optical flow computation, to observe which objects are moving and where they will probably move (estimation).
Optical flow has two main tracking variants: dense optical flow and sparse optical flow. Optical flow, or optic flow, is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Optical flow techniques such as motion detection, object segmentation, time-to-collision and focus-of-expansion calculations, motion-compensated encoding, and stereo disparity measurement utilize this motion of the objects' surfaces and edges. In optical flow estimation, sequences of ordered images allow the estimation of motion as either instantaneous image velocities or discrete image displacements. Fleet and Weiss provide a tutorial introduction to gradient-based optical flow that emphasizes the accuracy and density of measurements. Optical flow methods try to calculate the motion between two image frames, taken at times t and t + Δt, at every pixel position. These methods are called differential since they are based on local Taylor series approximations of the image signal; that is, they use partial derivatives with respect to the spatial and temporal coordinates.
The software structure of a video surveillance system is not always consistent; still, there are some major components or functions that are commonly seen and used. Many proposed solutions combine background and foreground subtraction with the template matching technique to keep track of vehicles. Background subtraction is used to extract the foreground object, and template matching is used to find the object, which is the vehicle in the vehicle tracking case, and to track it in the next frame. The drawback of such a combined method is that template matching requires frequent template updates, and changing the viewing angle of the camera also requires adjustment, so the adaptability can vary. To deal with this limitation, this paper focuses on the assessment of an alternative motion tracking method, the optical flow method, which is believed to have better adaptability than template matching and tracking.
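The local Taylor-series formulation above leads directly to the Lucas-Kanade least-squares step. The following is a minimal NumPy sketch of that step (the function name, window size, and synthetic test pattern are our own illustrative choices, not the paper's code):

```python
import numpy as np

def lucas_kanade(I1, I2, x, y, w=7):
    """Estimate the flow (u, v) at pixel (x, y) by solving the
    least-squares system A [u, v]^T = -It over a (2w+1) x (2w+1)
    window, i.e. the differential (Taylor-series) formulation above."""
    Ix = np.gradient(I1, axis=1)   # spatial derivative, x direction
    Iy = np.gradient(I1, axis=0)   # spatial derivative, y direction
    It = I2 - I1                   # temporal derivative
    ys, xs = slice(y - w, y + w + 1), slice(x - w, x + w + 1)
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

On a smooth synthetic pattern translated by a fraction of a pixel, this window-aggregated solve recovers the displacement well, which is why LK works on textured vehicle regions but not on uniform ones (the aperture problem).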
Fig 3.1:Block Diagram of Vehicle Tracking
System Component and Process flow.
The motion detection and tracking process also consists of many subcomponents and functions, and different approaches to motion detection and tracking may have different component designs. The most frequently used approach to detecting motion in image sequences is background subtraction. This paper does not fix the motion detection technique in advance; instead, any solution available in the OpenCV library that is capable of achieving the main objective, which is to track the vehicle, is tested and evaluated.
3.2 ADVANCED BACKGROUND
SUBTRACTION [7]
The drawback of absolute frame differencing is that the background must remain static over time. Because it simply compares the current frame with the previous frame, the system does not actually maintain any background model. In reality, many background scenes contain complicated moving elements, such as waving trees, which also cause pixel changes and greatly affect the accuracy of absolute frame differencing (Bradski and Kaehler, 2008). Providing a background model is very helpful for producing better background subtraction results.
Background Subtraction with a Background Model
By providing a good background model, even when an object is not moving it can still be detected by subtracting (comparing) every pixel of the current frame against the background model.
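As a minimal sketch of such a background model, here is an exponential running average, one common choice; the learning rate `alpha` and the difference threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

class RunningAverageBackground:
    """Keeps an exponential running average of frames as the background
    model; a pixel is foreground when it differs from the model by more
    than `thresh`. Unlike frame differencing, a stationary object keeps
    being detected until the average slowly absorbs it."""
    def __init__(self, alpha=0.05, thresh=25.0):
        self.alpha, self.thresh, self.bg = alpha, thresh, None

    def apply(self, frame):
        frame = frame.astype(np.float64)
        if self.bg is None:
            self.bg = frame.copy()          # first frame seeds the model
        mask = np.abs(frame - self.bg) > self.thresh   # foreground mask
        self.bg = (1 - self.alpha) * self.bg + self.alpha * frame
        return mask
```

A small `alpha` makes the model robust to momentary changes (passing vehicles) while still adapting to slow lighting transitions.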
3.3 CONVERTING RGB TO INTENSITY
To track vehicles using optical flow estimation, this process uses system objects to illustrate how to detect vehicles in a video sequence. The paper uses the optical flow estimation technique to estimate the motion vector in each frame of the video sequence. By thresholding the motion vectors and applying morphological closing, a binary feature image is produced. A rectangle is then drawn around each car that passes beneath a reference line, and a counter tracks the number of cars in the region of interest.
System objects are created to read the video from the video file and to downsample the chrominance components of the video. Dense optical flow suggests that every pixel in an image should be associated with a velocity value; computing these values gives the flow of motion. Dense optical flow can provide a visualization of the motion field: the appearance of the object is detected and calculated, showing the motion field of the scene. However, this can be very resource-intensive, and it makes it harder to extract the motion of a particular object (if the whole image is actually moving). Sparse optical flow tracks only points of interest (a subset of points) and is thus able to produce robust, reliable, and directly informative results.
(Fig 3.1 process flow: Video Capture → RGB to Intensity → Motion Vector → Thresholding Image → Vehicle Tracking → Vehicle Recognition)
Fig 3.3 RGB to Intensity Image
In this way the optical flow is estimated. RGB is converted to intensity using
Y = 0.299R + 0.587G + 0.114B
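A direct NumPy sketch of this conversion (the weights are the BT.601 luma coefficients quoted above; the function name is our own):

```python
import numpy as np

def rgb_to_intensity(rgb):
    """Map an H x W x 3 RGB array to an H x W intensity image using
    Y = 0.299 R + 0.587 G + 0.114 B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Since the weights sum to 1, a pure white pixel keeps its value, while a pure red pixel maps to about 30% of it.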
3.4 EXTRACTING THRESHOLDED IMAGES FROM THE ORIGINAL IMAGE
The original video is converted from RGB to intensity, and from that the optical flow motion vector sequence is computed. The motion vector sequence is then thresholded and region-filtered using the velocity and its mean, and the result is displayed in the morphology sector. Moving vehicle points are displayed as white, while the rest of the image is black.
Fig 3.4 Thresholding Image
Background subtraction rarely gives clean output. Most of the time it makes some identification errors, leaving small areas of pixels wrongly identified as foreground objects.
3.4.1 IMAGE MORPHOLOGY [8]
Such "tiny objects" are, in most situations, merely mistakes (errors) of background subtraction. To eliminate them, cvErode can be used to perform erosion on the image passed in. Erosion is an image morphology technique that reduces the object region (the size of the object). The parameter src acts as the input to the function, and dst is the output binary image after the erosion processing is complete. The structuring element B will be a 3-by-3 kernel with its anchor at the center if a NULL value is passed into the function. The parameter iterations determines how many times the operation is repeated in a single call; the larger the value of iterations, the more the regions are reduced.
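The erosion behaviour just described can be sketched in plain NumPy (a pixel survives only if every pixel under the k x k all-ones kernel is foreground, matching the cvErode description with a centred kernel; the function name is our own):

```python
import numpy as np

def erode(mask, k=3, iterations=1):
    """Binary erosion with a k x k all-ones kernel anchored at its
    centre; `iterations` repeats the operation, shrinking regions
    further, like the cvErode `iterations` parameter."""
    pad = k // 2
    for _ in range(iterations):
        padded = np.pad(mask, pad, mode='constant', constant_values=False)
        out = np.ones_like(mask, dtype=bool)
        # AND together every shifted copy of the mask under the kernel
        for dy in range(-pad, pad + 1):
            for dx in range(-pad, pad + 1):
                out &= padded[pad + dy : pad + dy + mask.shape[0],
                              pad + dx : pad + dx + mask.shape[1]]
        mask = out
    return mask
```

Isolated "tiny object" pixels have no fully-foreground neighbourhood, so a single pass removes them while only shrinking real vehicle blobs by one pixel per side.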
3.5 EXTRACTING TRACKING IMAGE
FROM ORIGINAL IMAGE
In the original video, the camera is fixed at a tracking point; as vehicles pass through that particular region they are automatically tracked, and each tracked vehicle is bounded by a rectangular box.
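Drawing the bounding box requires grouping the white foreground pixels into regions. One hedged sketch uses SciPy's connected-component labelling; the paper does not name its grouping method, and the `min_area` noise filter is our own assumption:

```python
import numpy as np
from scipy import ndimage

def bounding_boxes(mask, min_area=20):
    """Label connected foreground regions and return a bounding box
    (row0, col0, row1, col1) for each region with at least `min_area`
    pixels; smaller blobs are treated as noise and skipped."""
    labels, n = ndimage.label(mask)
    boxes = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        if sl is not None and (labels[sl] == i).sum() >= min_area:
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes
```

Each returned box can then be drawn as the rectangle around a tracked vehicle.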
4. FEATURE EXTRACTION OF VEHICLE
RECOGNITION
4.1 DISCRETE COSINE TRANSFORM
(DCT) [11]
A discrete cosine transform (DCT)
expresses a sequence of finite data points in terms
of a sum of cosine functions oscillating at different
frequencies. The DCT is a mathematical operation
that transforms a set of data, which is sampled at a
given sampling rate, to its frequency components.
The number of samples should be finite, and power
of two for optimal computation time. The DCT is
closely related to the discrete Fourier transform.
The DCT, however, has better energy compaction
properties, with just a few of the transform
coefficients representing the majority of the energy
in the sequence. The energy compaction properties
of the DCT make it useful in applications requiring
data reduction. Each of these samples of the
original image is mapped to the frequency domain.
A one-dimensional DCT converts an array of numbers, which represent signal amplitudes at various points in time, into another array of numbers, each of which represents the amplitude of a certain frequency component of the original array. The resultant array contains the same number of values as the original array. The first element in the result array is a simple average of all the samples in the input array and is referred to as the DC coefficient. The remaining elements in the resulting array indicate the amplitude of a specific frequency component of the input array, and are known as AC coefficients. The frequency content of the sample set at each frequency is calculated by taking a weighted average of the entire set. These weight coefficients follow a cosine wave whose frequency is proportional to the resultant array index.
FORMULA OF THE 2-D DCT:

F(u, v) = a(u) a(v) * SUM_{x=0..N-1} SUM_{y=0..N-1} f(x, y)
          * cos[(2x + 1) u pi / (2N)] * cos[(2y + 1) v pi / (2N)],

where a(0) = sqrt(1/N) and a(k) = sqrt(2/N) for k = 1, ..., N-1.
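The 2-D DCT can be sketched in NumPy term-by-term from the standard orthonormal DCT-II formula (an illustrative implementation; production code would use a fast transform):

```python
import numpy as np

def dct2(f):
    """2-D DCT-II of an N x N block:
    F(u,v) = a(u) a(v) sum_x sum_y f(x,y)
             cos((2x+1)u pi / 2N) cos((2y+1)v pi / 2N),
    with a(0) = sqrt(1/N) and a(k) = sqrt(2/N) for k > 0."""
    N = f.shape[0]
    n = np.arange(N)
    # C[u, x] = cos((2x+1) u pi / 2N); applying C on both sides gives
    # the double sum over x and y at once: F = A * (C f C^T)
    C = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    a = np.full(N, np.sqrt(2.0 / N))
    a[0] = np.sqrt(1.0 / N)
    return (a[:, None] * a[None, :]) * (C @ f @ C.T)
```

For a constant block the only nonzero output is the DC coefficient, the scaled average of all samples, matching the description above; this energy compaction is what makes the DCT useful for data reduction.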
4.1.1 DISCRETE WAVELET
TRANSFORM (DWT) [11]
For many signals, the low-frequency
content is the most important part. It is what gives
the signal its identity. The high-frequency content,
on the other hand, imparts flavour or nuance.
Consider the human voice. If you remove the high-
frequency components, the voice sounds different,
but you can still tell what’s being said. However, if
you remove enough of the low-frequency
components, you hear gibberish. In wavelet
analysis, we often speak of approximations and
details. The approximations are the high scale, low-
frequency components of the signal. The details are
the low-scale, high-frequency components. The
filtering or decomposition process is shown in Fig.
4.1.1.
Lo_D and Hi_D are the low-pass and high-pass decomposition filters, respectively; ↓2 represents downsampling by 2. cA and cD are the approximation and detail coefficients.
Fig. 4.1.1: Two-dimensional wavelet
decomposition
The discrete wavelet transform (DWT) is
a linear transformation that operates on a data
vector whose length is an integer power of two,
transforming it into a numerically different vector
of the same length. It is a tool that separates data
into different frequency components, and then
studies each component with resolution matched to
its scale.
The discrete wavelet transform divides the image into four bands at each transform level. The first band represents the input image filtered with a low-pass filter and compressed to half size; this band is also called the 'approximation'. The other three bands are called 'details', where a high-pass filter is applied; these bands contain directional characteristics, and the size of each band is likewise compressed to half. Specifically, the second band contains vertical characteristics, the third band shows characteristics in the horizontal direction, and the last band represents diagonal characteristics of the input image. Conceptually, the discrete wavelet is very simple because it is constructed from a square wave. Moreover, discrete wavelet computation is fast since the filter contains only two coefficients and does not need a temporary array for multi-level transformation. Thus, each pixel of an image that goes through the wavelet transform computation is used only once, with no pixel overlap during the computation.
The DWT provides high time resolution
and low frequency resolution for high frequencies
and high frequency resolution and low time
resolution for low frequencies. In that respect it is
similar to the human ear which exhibits similar
time-frequency resolution characteristics. The
Discrete Wavelet Transform (DWT) is a special
case of the WT that provides a compact
representation of a signal in time and frequency
that can be computed efficiently.
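One level of the square-wave (Haar) decomposition described above can be sketched as follows; the band labels cA/cH/cV/cD follow a common convention, which is an assumption since the text does not fix one:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar DWT. Rows are filtered first (low/high
    pass plus downsampling by 2), then columns, yielding the four
    half-size bands: cA (approximation) and the three detail bands."""
    s = np.sqrt(2.0)
    lo = (img[0::2, :] + img[1::2, :]) / s   # low pass along rows
    hi = (img[0::2, :] - img[1::2, :]) / s   # high pass along rows
    cA = (lo[:, 0::2] + lo[:, 1::2]) / s     # LL: approximation
    cH = (lo[:, 0::2] - lo[:, 1::2]) / s     # LH detail band
    cV = (hi[:, 0::2] + hi[:, 1::2]) / s     # HL detail band
    cD = (hi[:, 0::2] - hi[:, 1::2]) / s     # HH: diagonal detail
    return cA, cH, cV, cD
```

On a constant image all detail bands are zero and the approximation carries the whole signal, which is exactly the energy split the text describes.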
4.2 SUPPORT VECTOR MACHINE (SVM)
[12]
The support vector machine is based on the principle of structural risk minimization (SRM). Support vector machines can be used for pattern classification and nonlinear regression. An SVM constructs a linear model for the decision function using nonlinear class boundaries based on support vectors. The SVM trains linear machines for an optimal hyperplane that separates the data without error and maximizes the distance between the hyperplane and the closest training points. The training points that are closest to the optimal separating hyperplane are called support vectors. The support vector machine (SVM) can be used for classifying the obtained data (Burges, 1998). SVMs are a set of related supervised learning methods used for classification and regression; they belong to a family of generalized linear classifiers. Let us denote a feature vector (termed a pattern) by (x1, x2, ..., xn) and its class label by y such that y = {+1, -1}. We therefore consider the problem of separating a set of n training patterns belonging to two classes. The architecture of the SVM is shown in Fig 4.2.
Fig 4.2 The architecture of the SVM (NS is the
number of support vectors).
The support vectors are the (transformed) training patterns closest to the hyperplane. They are the training samples that define the optimal separating hyperplane and are the most difficult patterns to classify. Informally speaking, they are the patterns most informative for the classification task.
(a) Nonlinear problem. (b) Linear problem.
An example SVM kernel function maps a two-dimensional input space to a higher, three-dimensional feature space.
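The decision function behind Fig 4.2 can be sketched directly: f(x) = sum_i alpha_i y_i K(s_i, x) + b over the NS support vectors, classified by sign. The support vectors, weights, and bias below are hand-picked toy values (training, e.g. by SMO, is outside this sketch):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def svm_decision(x, support_vectors, alpha_y, b=0.0, gamma=1.0):
    """Kernel SVM decision value f(x); the predicted class is its sign."""
    return sum(ay * rbf_kernel(s, x, gamma)
               for s, ay in zip(support_vectors, alpha_y)) + b

# Toy model: one positive and one negative support vector,
# with alpha_i * y_i folded into a single weight per vector.
sv = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ay = [1.0, -1.0]
```

Only the support vectors enter the sum, which is why the classifier's complexity depends on NS rather than on the full training set.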
4.3 VEHICLE RECOGNITION FROM THE TRACKED IMAGE
For training, the 20 highest DCT coefficients are extracted from each tracked vehicle and given as input to the SVM model. For testing, the 20 highest DCT coefficients are extracted from the tracked test vehicle and given as input to the SVM model, which recognizes the vehicle name. The final tracked and recognized vehicle is shown in Figs 4.3.1 and 4.3.2.
Fig.4.3.1:Tracking point Fig 4.3.2:Vehicle
recognition
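The 20-coefficient feature step might look as follows; selecting coefficients by magnitude is our assumption (the text says only "20 highest DCT coefficients"), and the patch is assumed square. A compact DCT is included so the sketch is self-contained:

```python
import numpy as np

def dct2(f):
    """Orthonormal 2-D DCT-II of a square block (compact helper)."""
    N = f.shape[0]
    n = np.arange(N)
    C = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    a = np.full(N, np.sqrt(2.0 / N))
    a[0] = np.sqrt(1.0 / N)
    return (a[:, None] * a[None, :]) * (C @ f @ C.T)

def top_dct_features(patch, k=20):
    """Feature vector for the SVM: the k largest-magnitude 2-D DCT
    coefficients of a tracked-vehicle patch, largest first."""
    coeffs = dct2(patch.astype(np.float64)).ravel()
    idx = np.argsort(np.abs(coeffs))[::-1][:k]   # sort by |coefficient|
    return coeffs[idx]
```

Because of the DCT's energy compaction, these few coefficients carry most of the patch's energy, giving the SVM a short, discriminative feature vector.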
5. EXPERIMENTAL RESULTS
The video was captured on a highway, where the traffic footage was shot. From this traffic video, optical flow estimation is implemented to estimate the motion vectors, followed by threshold and region filtering, the final tracking result, and recognition using DCT.
Fig.5.1: Original Video
The captured VIP traffic video is tracked using optical flow estimation; the motion vectors, the threshold and region filtering, and the final result are shown below.
Fig.5.2: Motion vector
From the traffic video, the motion vector is computed for each car; the vehicle is bounded by the region in which its motion vectors are placed.
Fig.5.3: Threshold video
Thresholding and region filtering are performed by calculating the intensity values. Where vehicles are present, white pixels are scattered over the object, while the remaining area is black (gray). The threshold video is shown in Fig 5.3.
Fig.5.4 Tracking point of the vehicle
This is the tracking-point sector where the vehicles are positioned. Once a vehicle is tracked, a rectangular box is drawn around it. A tracked vehicle is shown in Fig 5.4.
Fig.5.5: recognition from tracking result
From the tracking point of the video, recognition is carried out by DCT feature extraction with SVM classification. Finally, the tracked vehicle is extracted and recognized as a BMW VIP car. The recognized vehicle is shown in Fig 5.5.
Fig.5.6: Final Result
This is the final result of car tracking using optical flow estimation. The line drawn across the top of the road is the tracking point; only when a car crosses this point is it tracked and bounded by a rectangular box. After the rectangular box is drawn around the vehicle, DCT feature extraction recognizes it as a BMW VIP car. All final results of vehicle tracking and recognition are shown in the figures below.
Fig 5.6.1: original video Fig 5.6.2: Motion
vector
Fig.5.6.3 Thresholding and filtering region
Fig.5.6.4:Tracking point Fig 5.6.5:Vehicle
recognition
5.7 The performance of vehicle recognition using SVM is given in Table 5.7.

Features | SVM kernel function | Recognition rate
-------- | ------------------- | ----------------
DCT      | Gaussian            | 90%
DCT      | Polynomial          | 80%
DCT      | Sigmoidal           | 82%
DWT      | Gaussian            | 94%
DWT      | Polynomial          | 88%
DWT      | Sigmoidal           | 84%

Table 5.7: Performance of vehicle recognition.
6. CONCLUSION
From the results presented in Section 5, vehicle tracking and recognition are performed by optical flow estimation and DCT with an SVM classifier. In this paper, an image processing technique is presented, designed for detecting, tracking, and recognizing vehicles on the road. It detected all the vehicles present in the video. Feature extraction methods, namely optical flow estimation and DCT, are used in processing the traffic video.
REFERENCES
1. D. Alonso, L. Salgado, "A real-time computer vision system for vehicle tracking on road surveillance," in IEEE Conf. on Computer Vision and Vehicle Tracking, 8/10/2011.
2. Zehang Sun, George Bebis, and Ronald Miller, "On-Road Vehicle Detection: A Review," vol. 28, no. 5, May 2010.
3. Matthew J. Leotta and Joseph L. Mundy, "Vehicle Surveillance with a Generic, Adaptive, 3D Vehicle Model," vol. 33, no. 7, July 2011.
4. Hossein Tehrani Niknejad, Akihiro Takeuchi, and Seiichi Mita, "On-Road Multivehicle Tracking Using Deformable Object Model and Particle Filter," IEEE, vol. 13, no. 2, June 2012.
5. Fatemeh Karimi Nejadasl, Ben G. H. Gorte, and Serge P. Hoogendoorn, "Optical flow based vehicle tracking strengthened by statistical decisions," Elsevier, vol. 12, no. 2, May 2010.
6. M. J. Leotta, "Generic, Deformable Models for 3-D Vehicle Surveillance," PhD dissertation, Brown Univ., May 2010.
7. David J. Fleet and Yair Weiss, "Optical Flow Estimation," 7(2): 95-117, January 2008.
8. Ong Hen Ching, "Optical Flow-Based Vehicle Detection," Faculty of Information and Communication Technology (Perak Campus), May 2010.
9. Collins, R. T., Lipton, A. J., Kanade, T., Fujiyoshi, H., Duggins, D., Wixson, L., et al. (2000). A System for Video Surveillance and Monitoring, Final Report, Robotics Institute, Carnegie Mellon University.
10. Delores M. Etter, Engineering Problem Solving with MATLAB, Pearson Education, India, 2005.
11. Dr. S. Jothilakshmi, Basics of Speech Processing, "National Workshop on Pattern Classification Techniques for Audio and Video Processing," Annamalai University, India, 2009.
12. Dr. M. Balasubramanian, Support Vector Machines, "National Workshop on Pattern Classification Techniques for Audio and Video Processing," Annamalai University, India, 2009.