ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1801
www.ijarcet.org

Abstract— Video genre classification, video content retrieval and semantic analysis attract a large number of researchers in the video processing and analysis domain. Many researchers propose structures or frameworks that classify video genre by integrating many algorithms using low- and high-level features. Features generally include both useful and useless information, which are difficult to separate. In this paper, video genre classification is performed using only the audio channel. A decomposition model based on multivariate adaptive regression splines separates the useful and useless components, and genre identification is performed on low-level acoustic features such as MFCC and timbral texture features. Experiments are conducted on a corpus composed of cartoons, sports, news, dahmas and music, on which an overall classification rate of 91.83% is obtained.
Index Terms— Multivariate Adaptive Regression Splines, Mel Frequency Cepstral Coefficients, Factor Analysis
I. INTRODUCTION
Today, users need efficient tools to crawl large collections, because the amount of video available on the Internet has grown significantly. While most research on video classification aims to classify an entire video, some authors have focused on classifying segments of video, such as identifying violent [1] or scary [2] scenes in a movie, or distinguishing between different news segments within an entire news broadcast [3]. For this reason, many works have focused on structuring audiovisual databases by content analysis, based on text-based categorization [3]. For the purpose of video classification, features are drawn from three modalities: text, audio, and visual. Most of the proposed approaches rely on image analysis. As surveyed in [4], many works in recent years have been motivated by the critical need for efficient tools for structuring audiovisual databases. In [2], the authors investigate higher-level analysis such as the tracking of audiovisual events. Audio-based approaches have been explored through automatic transcription of speech content, or through low-level audio stream analysis. However, these systems generally perform poorly on unexpected linguistic domains and in adverse acoustic conditions.

Manuscript received May, 2013.
Hnin Ei Latt is with the Faculty of Information and Communication Technology, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar.
Dr. Nu War was with the Faculty of Information and Communication Technology, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar.
Acoustic-space characterization has been presented using statistical classifiers such as Gaussian mixture models (GMM), neural networks or support vector machines (SVM) on cepstral-domain features [5, 6, 7]. Various kinds of acoustic features have been evaluated in the field of video genre identification. In [8, 7, 5], time-domain audio features such as zero crossing rates or energy distributions are proposed. Such low-level approaches present better robustness to the highly variable and unexpected conditions that may be encountered in videos. In the cepstral domain, one of the main difficulties in genre identification is the diversity of the acoustic patterns that may be produced by each video genre. In this paper, this problem is addressed in the field of identifying video genres by applying multivariate adaptive regression splines, within an audio-only video genre classification framework.
In the next section, an overview of the presented system is provided: the architecture of the system and the basic underlying concepts are explained. Secondly, the multivariate adaptive regression splines algorithm is described. Finally, the experimental results are shown.
Audio-based Classification of Video Genre Using Multivariate Adaptive Regression Splines
Hnin Ei Latt, Nu War

II. SYSTEM ARCHITECTURE

(Fig. 1 pipeline: video files → audio extraction → feature extraction (MFCC, zero crossing rate, short time energy, spectral flux, spectral centroid, spectral rolloff, noise frame ratio, silence ratio) → one MARS model per genre: news, sports, dahmas, musics, cartoons.)
Fig. 1. Overview of the system architecture

The overall procedure for extracting the audio features from a video clip is shown in Fig. 1. First, the base audio features are extracted from the audio signal: the MFCC, zero crossing rate, short time energy, spectral flux, spectral centroid, spectral rolloff, noise frame ratio and silence ratio are used as the base audio features in this paper. Multivariate adaptive regression splines then develop a model for each genre type using the base audio feature set. The next step is an efficient mechanism for classifying the genres in the database and measuring their performance. The details of the proposed features are explained in the following sections.
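The per-genre decision at the end of this pipeline can be sketched as follows. The genre names and the callable-model interface below are assumptions for illustration, not the paper's implementation:

```python
def classify(feature_vector, genre_models):
    """Score a clip with each genre's one-vs-rest model and return the
    genre whose model responds most strongly (cf. Fig. 1, one MARS
    model per genre)."""
    scores = {genre: model(feature_vector) for genre, model in genre_models.items()}
    return max(scores, key=scores.get)

# Hypothetical stand-in models; the real ones would be trained MARS responses.
models = {
    "news":  lambda f: f[0],
    "music": lambda f: f[1],
}
print(classify([0.2, 0.9], models))  # music
```

The design choice here mirrors the figure: each genre gets its own binary model, and the final label is the genre with the highest response.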
III. FEATURE EXTRACTION
Many of the audio-based features are chosen to approximate the human perception of sound. This framework uses low-level acoustic features from both the time domain and the frequency domain. Timbral texture features are calculated from the given audio signal; they are used to differentiate mixtures of sounds based on their instrumental composition when the melody and the pitch components are similar. The use of timbral texture features originates from speech recognition. Extracting timbral features requires preprocessing of the sound signals: the signals are divided into statistically stationary frames, usually by applying a window function at fixed intervals. The application of a window function removes the so-called "edge effects"; a popular choice is the Hamming window. Short-Term Fourier Transform features are a set of features related to timbral textures, also captured using MFCC. They consist of Spectral Centroid, Spectral Rolloff, Spectral Flux, Low Energy and Zero Crossings; the mean is computed for all five and the variance for all but Zero Crossings, giving a total of nine features.
In the time domain, the zero crossing rate (ZCR) is the number of signal amplitude sign changes in the current frame. Higher frequencies result in higher zero crossing rates, and speech normally has a higher variability of the ZCR than music. If the loudness and the ZCR are both below thresholds, a frame may represent silence. The silence ratio is the proportion of frames with amplitude values below some threshold; speech normally has a higher silence ratio than music, and news has a higher silence ratio than commercials. In the frequency domain, the energy distribution (short time energy) is the distribution of the signal across frequency components. The frequency centroid, which approximates brightness, is the midpoint of the spectral energy distribution and provides a measure of where the frequency components are concentrated. Brightness is normally higher in music than in speech, whose frequency content is normally below 7 kHz. Bandwidth is a measure of the frequency range of a signal: some types of sounds have narrower frequency ranges than others, and speech typically has a lower bandwidth than music. The fundamental frequency is the lowest frequency in a sample and approximates pitch, which is a subjective measure. Mel-frequency cepstral coefficients (MFCC) are produced by taking the logarithm of the spectral components and then placing them into bins based upon the Mel frequency scale, which is perception-based.
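As a concrete illustration of the time-domain features above, a minimal sketch of the zero crossing rate, short-time energy and silence ratio might look as follows; the threshold values are illustrative assumptions, not the paper's settings:

```python
import math

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose amplitude changes sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def silence_ratio(frames, energy_threshold=1e-4, zcr_threshold=0.1):
    """Proportion of frames whose energy and ZCR both fall below thresholds,
    following the rule that low loudness plus low ZCR suggests silence."""
    silent = sum(1 for f in frames
                 if short_time_energy(f) < energy_threshold
                 and zero_crossing_rate(f) < zcr_threshold)
    return silent / len(frames)

# A 100 Hz tone sampled at 1 kHz versus an all-zero (silent) frame.
tone = [math.sin(2 * math.pi * 100 * n / 1000) for n in range(200)]
silent = [0.0] * 200
print(round(short_time_energy(tone), 3))  # 0.5 for a unit-amplitude sine
print(silence_ratio([tone, silent]))      # 0.5
```

The spectral features (centroid, rolloff, flux) would additionally require an FFT of each frame, which is omitted here.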
IV. MULTIVARIATE ADAPTIVE REGRESSION SPLINES
Analyses were performed using multivariate adaptive
regression splines, a technique that uses piece-wise linear
segments to describe non-linear relationships between audio
features and video genre. The theory of multivariate adaptive
regression splines (MARS) was developed by Jerome
Friedman [9] in 1991. Let z be the dependent response, which
can be continuous or binary, and let

Y = (Y_1, ..., Y_n)   (1)

be the set of potential predictive covariates. The system then assumes that the data are generated from an unknown "true" model. In the case of a continuous response this would be

z = f(Y_1, Y_2, ..., Y_n) + e   (2)
The distribution of the error e is a member of the exponential family [1]. f is approximated by functions which include interactions of at most second order; that is, the model

f(Y) = g_0 + Σ_{j1} g_{j1}(Y_{j1}) + Σ_{j1,j2} g_{j1,j2}(Y_{j1}, Y_{j2}) + e   (3)

is used, with error variance σ². Linear splines and their tensor products are used to model the functions g(·). A one-dimensional spline can be written as

g(y) = b_0 + b_1 y + Σ_{k=1}^{K} b_k (y − t_k)_+   (4)

with the knots t_k in the range of the observed values of Y. For this reason the function g lies in a linear space with the K + 2 basis functions. Thus the following model results:
g_0 = β_0,

g_{j1}(Y_{j1}) = Σ_{i=1}^{M_{j1}} β_i^{j1} B_i^{j1}(Y_{j1})   (5)

g_{j1,j2}(Y_{j1}, Y_{j2}) = Σ_{i=1}^{M_{j1,j2}} β_i^{j1,j2} B_i^{j1,j2}(Y_{j1}, Y_{j2})   (6)

because the interaction g_{j1,j2} is modeled by means of tensor product splines as

g_{j1,j2}(y_1, y_2) = g_{j1}(y_1) g_{j2}(y_2)   (7)
Here the M's represent the numbers of basis functions in the model, the B's represent spline basis functions as described above, and the β's are coefficients. In this approach the coefficients are estimated by the least squares method, so the coefficient vector can be written as

β̂ = (Y*ᵀ Y*)⁻¹ Y*ᵀ Z   (8)

where Y* is the design matrix of the selected basis functions and Z is the response vector. Instead of Y_j, MARS uses a collection of new predictors in the form of piecewise linear basis functions:

{(Y_j − t)_+, (t − Y_j)_+},  j = 1, ..., n,  t ∈ {y_{1j}, ..., y_{Nj}}
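The reflected pair of piecewise linear basis functions can be sketched directly; the knot value below is an arbitrary example:

```python
def hinge_pair(t):
    """Return the reflected pair (y - t)_+ and (t - y)_+ that MARS uses
    as new predictors in place of the raw variable Y_j."""
    return (lambda y: max(y - t, 0.0),
            lambda y: max(t - y, 0.0))

plus, minus = hinge_pair(2.0)
print([plus(y) for y in (0.0, 2.0, 5.0)])   # [0.0, 0.0, 3.0]
print([minus(y) for y in (0.0, 2.0, 5.0)])  # [2.0, 0.0, 0.0]
```

Each function is zero on one side of the knot t and linear on the other, which is what lets sums of such terms describe piecewise linear relationships.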
After that, the generalized cross-validation (GCV) criterion is used to measure the degree of fit, or lack of accuracy, of the model:

GCV(M) = (1/N) Σ_{i=1}^{N} [z_i − f̂_M(y_i)]² / [1 − d·M/N]²   (9)
where f̂_M(y_i) denotes the fitted values of the current MARS model and d denotes the penalizing parameter. The numerator is the common residual sum of squares, which is penalized by the denominator, which accounts for the increasing variance as model complexity increases. A smaller d generates a larger model with more basis functions; a larger d creates a smaller model with fewer basis functions.
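Equation (9) translates directly into code. This sketch follows the paper's form of the penalty, with d and the number of basis functions M as inputs:

```python
def gcv(z, fitted, n_basis, d=3.0):
    """Generalized cross-validation score of Eq. (9): the mean residual
    sum of squares divided by the complexity penalty [1 - d*M/N]^2."""
    n = len(z)
    rss = sum((zi - fi) ** 2 for zi, fi in zip(z, fitted)) / n
    return rss / (1.0 - d * n_basis / n) ** 2

# Identical residuals are penalized more heavily as the model grows.
z, fit = [0.0] * 20, [0.1] * 20
print(gcv(z, fit, n_basis=1) < gcv(z, fit, n_basis=3))  # True
```

Note that the penalty is only meaningful while d·M/N stays well below 1, i.e. while the model is small relative to the sample size.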
According to Table 1, in the forward process stage the stepwise addition process adds basis functions until the maximal allowed model size is reached; the largest model generally overfits the data. In the backward pruning stage, the stepwise deletion process, all 'unnecessary' basis functions are removed again until a final model is obtained which is best in terms of the GCV, that is, the one with the minimum GCV. In the first step of the addition process a constant model is fitted. Subsequently, the number of candidate basis functions depends on the number of possible knots per predictor variable. To keep the procedure fast and the results robust, the number of possible knots per predictor and the number of candidates per step are limited. To determine the number of potential knots of a specific covariate, an order statistic is computed and a subset of it is then chosen as the potential knots; commonly these are about 20 knots per predictor, with at most every third value chosen. In the first iteration, after the fit of the constant model, a linear basis function on one of the predictor variables is fitted. The second iteration considers both linear basis functions on another covariate and basis functions with knots of the covariate already in the model.
The model chosen in every step of the forward process is the one, out of all possible models, which minimizes the GCV. In the backward process one basis function is deleted per step and the GCV is computed for the reduced model. The model which yields the smallest increase of the GCV becomes the new one.
TABLE 1. ALGORITHM
Forward Process Stage:
 Keep the coefficients of the variables already in the current model.
 Update the basis functions with the updated side knots.
 Add the new basis functions to the model, together with their reflected partners.
 Select the new basis function pair that produces the largest decrease in training error.
 Repeat the whole process until a termination condition is met: the error is small enough, or the number of model coefficients in the next iteration is expected to be less than the number of input variables.
Backward Pruning Stage:
 Find the subset which gives the lowest cross-validation error (GCV).
 Delete one basis function per step and reduce the model.
 The subset which yields the smallest increase of the GCV becomes the new model.
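The two stages of Table 1 can be sketched for a single predictor. This toy version (greedy forward addition of reflected hinge pairs, then backward pruning by GCV) is an illustration of the idea under simplifying assumptions, not the paper's full multivariate procedure:

```python
import numpy as np

def toy_mars_1d(y, z, max_terms=7, d=3.0):
    """Forward addition of reflected hinge pairs, then backward pruning
    to the minimum-GCV subset (1-D illustration of Table 1)."""
    y, z = np.asarray(y, float), np.asarray(z, float)
    n = len(y)

    def fit(cols):
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        return coef, float(np.mean((z - X @ coef) ** 2))

    def gcv(rss, m):
        return rss / (1.0 - d * m / n) ** 2

    # Forward stage: greedily add the hinge pair with the largest error drop.
    cols, used = [np.ones(n)], set()
    while len(cols) + 2 <= max_terms:
        best = None
        for t in np.unique(y)[1:-1]:          # candidate knots at observed values
            if t in used:
                continue
            _, rss = fit(cols + [np.maximum(y - t, 0), np.maximum(t - y, 0)])
            if best is None or rss < best[0]:
                best = (rss, t)
        if best is None:
            break
        used.add(best[1])
        cols += [np.maximum(y - best[1], 0), np.maximum(best[1] - y, 0)]

    # Backward stage: delete one basis function per step, keeping the
    # subset with the smallest GCV seen so far (the constant term stays).
    keep = list(range(len(cols)))
    _, rss = fit([cols[i] for i in keep])
    best_keep, best_gcv = list(keep), gcv(rss, len(keep))
    while len(keep) > 1:
        trials = []
        for i in keep[1:]:
            subset = [j for j in keep if j != i]
            _, rss = fit([cols[j] for j in subset])
            trials.append((gcv(rss, len(subset)), subset))
        g, keep = min(trials, key=lambda s: s[0])
        if g < best_gcv:
            best_gcv, best_keep = g, list(keep)

    return best_keep, best_gcv
```

For a target such as z = |y − 0.5|, which is exactly a single reflected hinge pair, the forward pass finds the knot at 0.5 and the pruned model's GCV is essentially zero.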
V. EXPERIMENTAL RESULTS
The training data set includes information about news, cartoon, sport, music and dahma videos, comprising 860 observations. Among them, 140, 235, 176, 113, and 196 clips are 'News', 'Dahma', 'Cartoon', 'Sport' and 'Music' respectively. The data of the dependent variable are binary, investigating, for example, "whether sport or not"; here a 1 is interpreted as "tested positive". The data sample contains 20 explanatory variables, given as: Y1 to Y13: MFCC, Y14: Zero Crossing Rate, Y15: Short Time Energy, Y16: Spectral Flux, Y17: Spectral Centroid, Y18: Spectral Rolloff, Y19: Noise Frame Ratio, and Y20: Silence Ratio.
Using this data set, the proposed framework builds a model for each of the five genre types, as shown in Figure 2. In the model building stage the parameters are set as follows: the maximum number of basis functions is 21, the maximum number of interactions is 2, the interpolation type is piecewise-linear, and the penalty per knot is 3; the best penalty value can also be found using 5-fold cross-validation. Larger values lead to fewer knots being placed.
(a) Model for "Dahmas"  (b) Model for "Music"  (c) Model for "Sport"  (d) Model for "Cartoon"  (e) Model for "News"
Fig. 2. MARS Models for each genre
The variable selection results for each MARS model are summarized in Table 2. It is observed that MFCC3 and MFCC12 play important roles in the MARS Dahma models. For the MARS Music models, MFCC3 and the Noise Frame Ratio play major roles. For the MARS Sports models, the Noise Frame Ratio, Silence Ratio and Spectral Flux are more important. Short Time Energy and MFCC3 are the important variables for the Cartoon model. The Noise Frame Ratio, Short Time Energy and MFCC3 are more important in the News models. From these results, MFCC3, MFCC12, Short Time Energy, Noise Frame Ratio, Silence Ratio and Spectral Flux are more useful than the other features.
TABLE 2. DECOMPOSITION ANALYSIS OF EACH MODEL

Genre   | Function | STD   | GCV    | basis | params | variables
Dahma   | 1        | 0.261 | 0.144  | 1     | 2.5    | 3
        | 2        | 0.137 | 0.080  | 2     | 5.0    | 12
        | 3        | 0.060 | 0.056  | 2     | 5.0    | 20
        | 4        | 0.050 | 0.043  | 1     | 2.5    | 2,20
        | 5        | 0.113 | 0.058  | 2     | 5.0    | 3,12
Music   | 1        | 3.953 | 20.21  | 3     | 7.5    | 3
        | 2        | 0.212 | 0.074  | 1     | 2.5    | 19
        | 3        | 0.038 | 0.016  | 1     | 2.5    | 3,5
        | 4        | 0.085 | 0.023  | 1     | 2.5    | 3,15
        | 5        | 0.050 | 0.017  | 2     | 5.0    | 3,17
        | 6        | 3.840 | 22.086 | 3     | 7.5    | 3,19
Sports  | 1        | 0.087 | 0.033  | 1     | 2.5    | 2
        | 2        | 0.325 | 0.235  | 1     | 2.5    | 20
        | 3        | 0.058 | 0.027  | 2     | 5.0    | 2,20
        | 4        | 0.043 | 0.025  | 2     | 5.0    | 3,19
        | 5        | 0.100 | 0.038  | 1     | 2.5    | 3,20
        | 6        | 0.061 | 0.028  | 1     | 2.5    | 14,20
        | 7        | 0.040 | 0.025  | 1     | 2.5    | 15,20
        | 8        | 0.146 | 0.051  | 1     | 2.5    | 16,20
        | 9        | 0.186 | 0.082  | 3     | 7.5    | 19,20
Cartoon | 1        | 0.347 | 0.315  | 2     | 5.0    | 3
        | 2        | 0.081 | 0.060  | 1     | 2.5    | 14
        | 3        | 0.351 | 0.542  | 2     | 5.0    | 15
        | 4        | 0.104 | 0.061  | 1     | 2.5    | 17
News    | 5        | 0.358 | 0.467  | 2     | 5.0    | 19
        | 6        | 0.077 | 0.059  | 2     | 5.0    | 2,15
        | 7        | 0.378 | 0.414  | 2     | 5.0    | 3,15
        | 8        | 0.091 | 0.057  | 1     | 2.5    | 3,19
        | 9        | 0.100 | 0.060  | 1     | 2.5    | 5,17
        | 10       | 0.047 | 0.050  | 1     | 2.5    | 8,14
To evaluate the proposed system, a test set of 240 clips was created. The classification accuracy is 97 percent for the 'Music' genre, which is the best performance among the genres. The system was implemented as a Matlab program with our own training database of 860 videos, and the algorithm was evaluated in terms of true positive rate, true negative rate, false positive rate, and false negative rate on the 240-video test database. According to Table 3, the true positive rates are 48 percent and 56 percent for 'News' and 'Sport' respectively, while the false positive rates are 54 percent and 43 percent for 'News' and 'Sport' respectively. The true negative rates are above 93 percent for every genre. From this table, the proposed approach is not optimized for the 'Sport' and 'News' types.
TABLE 3. CLASSIFICATION RESULTS

Genre   | Clips | True Positive | True Negative | False Positive | False Negative | Accuracy
Dahma   | 49    | 0.8776 (43)   | 0.9424 (180)  | 0.1633 (8)     | 0.0471 (9)     | 0.9292 (240)
Music   | 52    | 0.9615 (50)   | 0.9734 (183)  | 0.0385 (2)     | 0.0266 (5)     | 0.9708 (240)
Sport   | 30    | 0.5667 (17)   | 0.9762 (205)  | 0.4333 (13)    | 0.0238 (5)     | 0.9250 (240)
Cartoon | 47    | 0.9333 (42)   | 0.9333 (182)  | 0.1111 (5)     | 0.0564 (11)    | 0.9333 (240)
News    |       | 0.4844 (31)   | 0.9602 (169)  | 0.5469 (35)    | 0.0284 (5)     | 0.8333 (240)
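The per-genre rates in the classification table can be reproduced from the raw counts in parentheses. The helper below assumes a one-vs-rest convention, in which the positive-class rates are normalized by the genre's clip count and the negative-class rates by the remaining clips; it is checked here against the self-consistent 'Sport' row (17 of 30 sport clips and 205 of 210 non-sport clips classified correctly):

```python
def genre_rates(correct_pos, n_pos, correct_neg, n_neg):
    """One-vs-rest rates for a single genre: positive-class rates over the
    genre's clips, negative-class rates over all remaining clips."""
    return {
        "true_positive": correct_pos / n_pos,
        "false_positive": (n_pos - correct_pos) / n_pos,
        "true_negative": correct_neg / n_neg,
        "false_negative": (n_neg - correct_neg) / n_neg,
        "accuracy": (correct_pos + correct_neg) / (n_pos + n_neg),
    }

sport = genre_rates(17, 30, 205, 210)
print(round(sport["true_positive"], 4))  # 0.5667
print(round(sport["accuracy"], 4))       # 0.925
```

The accuracy column is (correctly accepted + correctly rejected) / 240, e.g. (17 + 205) / 240 = 0.925 for 'Sport'.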
VI. CONCLUSION
In this paper, multivariate adaptive regression splines are applied to automatic video genre classification using only audio features. The experiments indicate an overall classification rate of 91.83%. As this study uses only low-level acoustic variables as independent variables, future studies may aim at collecting more informative variables to improve the classification accuracy. From the experimental results, MFCC3, MFCC12, Short Time Energy, Noise Frame Ratio, Silence Ratio and Spectral Flux are more useful than the other features. The experimental evaluation confirms the good performance of the video classification system except for the sport and news genres, for which the results are still reasonable. In further experiments, larger volumes of audio and audio-visual features will be tested in this framework. Integrating genetic algorithms and/or grey theory with neural networks and/or support vector machines is a possible research direction for further improving the classification accuracy.
ACKNOWLEDGMENT
My sincere thanks to my supervisor, Dr. Nu War, for providing me the opportunity to do this research work. I express my thanks to my institution, the University of Technology (Yatanarpon Cyber City), for providing a good environment and facilities such as Internet access, books and computers to complete this research. My heartfelt thanks to my family, friends and colleagues who have helped me complete this work.
REFERENCES
[1] J. Nam, M. Alghoniemy, and A. H. Tewfik, "Audio-visual content-based violent scene characterization," in International Conference on Image Processing (ICIP '98), vol. 1, 1998, pp. 353–357.
[2] S. Moncriefi, S. Venkatesh, and C. Dorai, "Horror film genre typing and scene labeling via audio analysis," in Multimedia and Expo (ICME), 2003.
[3] W. Zhu, C. Toklu, and S.-P. Liou, "Automatic news video segmentation and categorization based on closed-captioned text," in Multimedia and Expo (ICME), 2001, pp. 829–832.
[4] D. Brezeale and D. J. Cook, "Automatic video classification: A survey of the literature," in Systems, Man, and Cybernetics, 2008.
[5] M. Roach, L.-Q. Xu, and J. Mason, "Classification of non-edited broadcast video using holistic low-level features," in IWDC 2002, 2002.
[6] R. Jasinschi and J. Louie, "Automatic TV program genre classification based on audio patterns," in Euromicro Conference, 2001.
[7] L.-Q. Xu and Y. Li, "Video classification using spatial-temporal features and PCA," in Multimedia and Expo (ICME '03), 2003.
[8] M. Roach and J. Mason, "Classification of video genre using audio," in European Conference on Speech Communication and Technology, 2001.
[9] J. H. Friedman, "Multivariate adaptive regression splines," Ann. Stat., vol. 19, pp. 1–141 (with discussion), 1991.
First Author: Hnin Ei Latt completed her Master of Engineering (Information Technology) (M.E.-IT) at West Yangon Technology University (WYTU). Currently she is a PhD candidate at the University of Technology (Yatanarpon Cyber City). She is working as an assistant lecturer at Technology University (Myeik), and her research area is the classification of videos from large video collections.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationIRJET Journal
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...ijtsrd
 
Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...eSAT Journals
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based featuresijsc
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...journalBEEI
 
Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...eSAT Journals
 
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfBala Murugan
 
SPEECH RECOGNITION USING SONOGRAM AND AANN
SPEECH RECOGNITION USING SONOGRAM AND AANNSPEECH RECOGNITION USING SONOGRAM AND AANN
SPEECH RECOGNITION USING SONOGRAM AND AANNAM Publications
 

Semelhante a 1801 1805 (20)

Audio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV FileAudio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV File
 
Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
Cochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filtersCochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filters
 
Recognition of music genres using deep learning.
Recognition of music genres using deep learning.Recognition of music genres using deep learning.
Recognition of music genres using deep learning.
 
D111823
D111823D111823
D111823
 
General Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker IdentificationGeneral Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker Identification
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition application
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
 
Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
 
H0814247
H0814247H0814247
H0814247
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...
 
Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...
 
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
 
SPEECH RECOGNITION USING SONOGRAM AND AANN
SPEECH RECOGNITION USING SONOGRAM AND AANNSPEECH RECOGNITION USING SONOGRAM AND AANN
SPEECH RECOGNITION USING SONOGRAM AND AANN
 

Mais de Editor IJARCET

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationEditor IJARCET
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Editor IJARCET
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Editor IJARCET
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Editor IJARCET
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Editor IJARCET
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Editor IJARCET
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Editor IJARCET
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Editor IJARCET
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Editor IJARCET
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Editor IJARCET
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Editor IJARCET
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Editor IJARCET
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Editor IJARCET
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Editor IJARCET
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Editor IJARCET
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Editor IJARCET
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Editor IJARCET
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Editor IJARCET
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Editor IJARCET
 

Mais de Editor IJARCET (20)

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturization
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107
 

Último

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

1801 1805

In recent years, many works have been motivated by the critical need for efficient tools to structure audiovisual databases [4]. In [2], the authors investigate higher-level analysis such as the tracking of audiovisual events. Audio-based approaches have been explored through automatic transcription of speech content or through low-level audio stream analysis. However, these systems generally perform poorly on unexpected linguistic domains and in adverse acoustic conditions. Acoustic-space characterization has been carried out with statistical classifiers such as Gaussian mixture models (GMM), neural networks, or support vector machines (SVM) applied to cepstral-domain features [5, 6, 7]. Various kinds of acoustic features have been evaluated in the field of video genre identification; in [8, 7, 5], time-domain audio features such as zero crossing rates or energy distributions are proposed. Low-level approaches therefore show better robustness to the highly variable and unexpected conditions that may be encountered in videos. In the cepstral domain, one of the main difficulties in genre identification is the diversity of the acoustic patterns that each video genre may produce. In this paper, this problem is addressed by applying multivariate adaptive regression splines within an audio-only video genre classification framework. The next section provides an overview of the presented system and explains its architecture and the basic underlying concepts. The multivariate adaptive regression splines algorithm is then described.

Manuscript received May, 2013. Hnin Ei Latt is with the Faculty of Information and Communication Technology, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar. Dr. Nu War is with the Faculty of Information and Communication Technology, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar.
Finally, the experimental results are shown.

Audio-based Classification of Video Genre Using Multivariate Adaptive Regression Splines
Hnin Ei Latt, Nu War

II. SYSTEM ARCHITECTURE

[Fig. 1. Overview of the system architecture: audio is extracted from the video files; the base features (MFCC, zero crossing rate, short-time energy, spectral flux, spectral centroid, spectral rolloff, noise frame ratio, silence ratio) are computed; and one MARS model per genre (cartoons, news, sports, dramas, music) performs the classification.]

The overall procedure for extracting the audio file from a video clip is shown in Fig. 1. First, the base audio features are extracted from the audio signal: the MFCC, zero crossing rate, short-time energy, spectral flux, spectral centroid, spectral rolloff, noise frame ratio, and silence ratio are used as the base audio features in this paper. Multivariate adaptive regression splines then develop a model for each genre type from the base audio feature set. The next step is an efficient mechanism for classifying the genres in the database and measuring their performance. The details of the proposed audio fingerprint are explained below.

III. FEATURE EXTRACTION

Many of the audio-based features are chosen to approximate the human perception of sound. This framework uses low-level acoustic features from both the time domain and the frequency domain. The timbral textural features are calculated from the given audio signal; they are used to differentiate mixtures of sounds by their instrumental composition when the melody and pitch components are similar. The use of timbral textural features originates from speech recognition. Extracting timbral features requires preprocessing of the sound signals: the signals are divided into statistically stationary frames, usually by applying a window function at fixed intervals. The window function removes the so-called "edge effects"; a popular choice is the Hamming window.

Short-Term Fourier Transform Features: this is a set of features related to timbral textures and is also captured using MFCC. It consists of spectral centroid, spectral rolloff, spectral flux, low energy, and zero crossings; the mean is computed for all five and the variance for all but zero crossings, so there are a total of nine features.
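The framing-and-windowing preprocessing described above can be sketched as follows. This is a minimal NumPy illustration; the frame length, hop size, and the 16 kHz test signal are assumptions for the example, not values specified in the paper.

```python
import numpy as np

def frame_signal(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames and apply a Hamming
    window to each frame to suppress the so-called edge effects."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)

# Example: one second of a 440 Hz tone sampled at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frames = frame_signal(sig)
print(frames.shape)  # (30, 1024) with the defaults above
```

Each row of `frames` is one statistically quasi-stationary windowed frame, from which the per-frame features of the next subsections are computed.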
In the time domain, the zero crossing rate (ZCR) is the number of signal amplitude sign changes in the current frame. Higher frequencies result in higher zero crossing rates, and speech normally has a higher variability of the ZCR than music. If the loudness and ZCR are both below thresholds, then the frame may represent silence. The silence ratio is the proportion of frames with amplitude values below some threshold; speech normally has a higher silence ratio than music, and news has a higher silence ratio than commercials. In the frequency domain, the energy distribution (short time energy) is the signal distribution across frequency components. The frequency centroid, which approximates brightness, is the midpoint of the spectral energy distribution and provides a measure of where the frequency components are concentrated. Brightness is normally higher in music than in speech, whose frequency content normally lies below 7 kHz. Bandwidth is a measure of the frequency range of a signal; some types of sound have narrower frequency ranges than others, and speech typically has a lower bandwidth than music. The fundamental frequency is the lowest frequency in a sample and approximates pitch, which is a subjective measure. Mel-frequency cepstral coefficients (MFCC) are produced by taking the logarithm of the spectral components and then placing them into bins based upon the Mel frequency scale, which is perception-based.

IV. MULTIVARIATE ADAPTIVE REGRESSION SPLINES

Analyses were performed using multivariate adaptive regression splines (MARS), a technique that uses piecewise-linear segments to describe non-linear relationships between audio features and video genre. The theory of MARS was developed by Jerome Friedman [9] in 1991. Let $z$ be the dependent response, which can be continuous or binary, and let

$Y = (Y_1, \ldots, Y_n) \in D$    (1)

be the set of potential predictive covariates.
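The time-domain quantities described above can be computed directly from the windowed frames. A minimal sketch, with the silence threshold chosen arbitrarily for illustration (not a value from the paper):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose amplitude changes sign."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return np.mean(frame ** 2)

def silence_ratio(frames, energy_thresh=1e-4):
    """Proportion of frames whose energy falls below a threshold,
    treated here as a proxy for 'amplitude below some threshold'."""
    energies = np.array([short_time_energy(f) for f in frames])
    return np.mean(energies < energy_thresh)

# Toy usage: a maximally oscillating frame and a silent/non-silent pair
frame = np.array([1.0, -1.0, 1.0, -1.0])
two_frames = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
```

On the toy frame above, every adjacent pair changes sign, so the ZCR is 1.0, in line with the observation that higher frequencies yield higher zero crossing rates.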
Then the system assumes that the data are generated from an unknown "true" model. In the case of a continuous response this would be

$z = f(Y_1, Y_2, \ldots, Y_n) + e$    (2)

where the distribution of the error $e$ is a member of the exponential family [1]. The function $f$ is approximated by applying functions that include interactions of at most second order, i.e. the model

$f(Y) = g_0 + \sum_{j} g_j(Y_j) + \sum_{j_1 < j_2} g_{j_1,j_2}(Y_{j_1}, Y_{j_2})$    (3)

where the error has variance $\sigma^2$. Linear splines and their tensor products are used to model the functions $g(\cdot)$. A one-dimensional spline can be written as

$g(y) = b_0 + b_1 y + \sum_{k=1}^{K} b_k (y - t_k)_+$    (4)

with the knots $t_k$ in the range of the observed values of $Y$. For this reason the function $g$ lies in a linear space with $K + 2$ basis functions. Thus, with $g_0$ a constant, the following model results:

$g_j(Y_j) = \sum_{i=1}^{M_j} \beta_i^{(j)} B_i^{(j)}(Y_j)$    (5)

$g_{j_1,j_2}(Y_{j_1}, Y_{j_2}) = \sum_{i=1}^{M_{j_1,j_2}} \beta_i^{(j_1,j_2)} B_i^{(j_1,j_2)}(Y_{j_1}, Y_{j_2})$    (6)

because the interaction $g_{j_1,j_2}$ is modeled by means of tensor-product splines as

$g_{12}(y_1, y_2) = g_1(y_1)\, g_2(y_2)$    (7)

Here the $M$ terms denote the numbers of basis functions in the model, the $B$s are the spline basis functions described above, and the $\beta$s are coefficients. In this approach the coefficients are estimated by the least-squares method, so the coefficient vector can be written as

$\hat{\beta} = (Y^{*T} Y^{*})^{-1} Y^{*T} Z$    (8)

where $Y^{*}$ is the design matrix of the selected basis functions and $Z$ is the response vector. Instead of $Y_j$ itself, MARS uses a collection of new predictors in the form of the piecewise-linear basis functions

$(Y_j - t)_+$ and $(t - Y_j)_+$, for $j = 1, \ldots, n$ and $t \in \{y_{1j}, \ldots, y_{Nj}\}$.

Afterwards, the generalized cross-validation criterion is used to measure the degree of fit, or lack of accuracy, of the model:
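The reflected basis-function pair and the least-squares estimate of Eq. (8) can be sketched as follows. This is a toy illustration, not the authors' implementation; the target function and knot location are invented for the example:

```python
import numpy as np

def hinge_pair(y, t):
    """The reflected pair of piecewise-linear MARS basis functions
    (y - t)_+ and (t - y)_+ with knot t."""
    return np.maximum(y - t, 0.0), np.maximum(t - y, 0.0)

def fit_coefficients(design, z):
    """Least-squares estimate beta = (Y*^T Y*)^-1 Y*^T Z of Eq. (8),
    computed via a numerically stable solver instead of an explicit inverse."""
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    return beta

# Recover a known piecewise-linear target z = 2 * (y - 0.5)_+
y = np.linspace(0.0, 1.0, 50)
z = 2.0 * np.maximum(y - 0.5, 0.0)
pos, neg = hinge_pair(y, 0.5)
design = np.column_stack([np.ones_like(y), pos, neg])  # intercept + pair
beta = fit_coefficients(design, z)
```

Because the target lies exactly in the span of the intercept and the hinge pair, the fitted coefficients recover the true values (0, 2, 0).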
$\mathrm{GCV}(M) = \dfrac{\frac{1}{N} \sum_{i=1}^{N} [z_i - \hat{f}_M(y_i)]^2}{\left[1 - \frac{d \cdot M}{N}\right]^2}$    (9)

where $\hat{f}_M(y_i)$ denotes the fitted values of the current MARS model and $d$ denotes the penalizing parameter. The numerator is the common residual sum of squares, which is penalized by the denominator, which accounts for the increasing variance in the case of increasing model complexity. A smaller $d$ generates a larger model with more basis functions; a larger $d$ creates a smaller model with fewer basis functions. According to Table 1, in the forward process stage basis functions are added stepwise until the maximal allowed model size is reached; the largest model generally overfits the data. In the backward pruning stage, the stepwise deletion process, all 'unnecessary' basis functions are removed until a final model is obtained that is best with respect to the GCV, i.e. the one with the minimum GCV. In the first step of the addition process a constant model is fitted. Subsequently, the number of candidate basis functions depends on the number of possible knots per predictor variable. To keep the procedure fast and the results robust, the number of possible knots per predictor and the number of candidates per step are limited. To determine the potential knots of a specific covariate, an order statistic is computed and a subset of it is chosen as potential knots; commonly about 20 knots per predictor are used, with at most every third value chosen. In the first iteration, after the fit of the constant model, a linear basis function on one of the predictor variables is fitted. The second iteration considers both linear basis functions on another covariate and basis functions with knots of the covariate already in the model.
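Eq. (9) translates directly into code. In this sketch, the choice $d = 3$ mirrors the penalty per knot reported in Section V; the sample numbers below are purely illustrative:

```python
import numpy as np

def gcv(z, z_hat, M, N, d=3.0):
    """Generalized cross-validation criterion of Eq. (9): the mean squared
    residual penalized for model complexity via the (1 - d*M/N)^2 term."""
    rss = np.mean((z - z_hat) ** 2)
    return rss / (1.0 - d * M / N) ** 2

# Toy example: 4 observations, one basis function, penalty d = 1
example = gcv(np.array([1.0, 2.0, 3.0, 4.0]),
              np.array([1.0, 2.0, 3.0, 3.0]), M=1, N=4, d=1.0)
```

Note how the denominator shrinks as $M$ grows, so two models with equal training error are ranked by size, which is exactly what drives the backward pruning stage.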
The model chosen in every step during the forward process is the one, out of all possible models, that minimizes the GCV. In the backward process one basis function is deleted per step and the GCV is computed for the reduced model; the model yielding the smallest increase of GCV becomes the new one.

TABLE 1. ALGORITHM

Forward Process Stage:
- Keep the coefficients the same for variables already in the current model.
- Update basis functions with the updated side knots.
- Add the new basis functions to the model, together with their reflected partners.
- Select the new basis-function pair that produces the largest decrease in training error.
- Repeat the whole process until a termination condition is met: the error is small enough, or the number of the model's coefficients in the next iteration is expected to be less than the number of input variables.

Backward Pruning Stage:
- Find the subset that gives the lowest cross-validation error (GCV).
- Delete one basis function per step to reduce the model.
- The model yielding the smallest increase of GCV becomes the new one.

V. EXPERIMENTAL RESULTS

The training data set includes information about tests on news, cartoon, sport, music, and dahma videos, comprising 860 observations. Among them, 140, 235, 176, 113, and 196 clips are 'News', 'Dahma', 'Cartoon', 'Sport' and 'Music' respectively. The dependent variable is binary, indicating for example "whether sport or not"; here a 1 is interpreted as "tested positive". The data sample contains 20 explanatory variables, given as Y1 to Y13: MFCC, Y14: zero crossing rate, Y15: short time energy, Y16: spectral flux, Y17: spectral centroid, Y18: spectral rolloff, Y19: noise frame ratio, and Y20: silence ratio. Using this data set, the proposed framework built a model for each of the five genre types, as shown in Fig. 2.
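The forward stage of Table 1 can be sketched as a greedy loop over candidate knots. The simplified version below handles additive terms only, with no interactions and no backward pruning, and all names and parameter values are hypothetical; it illustrates the procedure rather than reproducing the authors' MATLAB implementation:

```python
import numpy as np

def forward_pass(Y, z, max_terms=21, n_knots=20):
    """Greedy MARS forward stage: repeatedly add the reflected hinge pair
    (y - t)_+ / (t - y)_+ that gives the largest drop in training error,
    until max_terms basis functions would be exceeded. Candidate knots are
    about n_knots order statistics per predictor, as described in the text."""
    N, p = Y.shape
    B = [np.ones(N)]                      # start from the constant model
    chosen = []                           # (variable index, knot) per step
    while len(B) + 2 <= max_terms:
        best = None
        for j in range(p):
            for t in np.quantile(Y[:, j], np.linspace(0.05, 0.95, n_knots)):
                trial = np.column_stack(B + [np.maximum(Y[:, j] - t, 0.0),
                                             np.maximum(t - Y[:, j], 0.0)])
                coef, *_ = np.linalg.lstsq(trial, z, rcond=None)
                rss = np.sum((z - trial @ coef) ** 2)
                if best is None or rss < best[0]:
                    best = (rss, j, t)
        _, j, t = best
        B += [np.maximum(Y[:, j] - t, 0.0), np.maximum(t - Y[:, j], 0.0)]
        chosen.append((j, t))
    design = np.column_stack(B)
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    return beta, chosen

# Synthetic check: the target depends only on the first of three predictors,
# so the greedy search should pick variable 0 first.
rng = np.random.default_rng(0)
Y = rng.uniform(0.0, 1.0, (200, 3))
z = 3.0 * np.maximum(Y[:, 0] - 0.5, 0.0)
beta, chosen = forward_pass(Y, z, max_terms=5)
```

With `max_terms=5`, two hinge pairs are added on top of the constant term, matching the "largest decrease in training error" selection rule in the table.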
In the model-building stage, the parameters are set as follows: the maximum number of basis functions is 21, the maximum number of interactions is 2, the interpolation type is piecewise-linear, and the penalty per knot is 3; the best penalty value can also be found using 5-fold cross-validation, with larger values leading to fewer knots being placed.

[Fig. 2. MARS models for each genre: (a) Dahmas, (b) Music, (c) Sport, (d) Cartoon, (e) News]

The variable-selection results of each MARS model are summarized in Table 2. It is observed that MFCC3 and MFCC12 play important roles in the MARS Dahma model. For the MARS Music model, MFCC3 and the noise frame ratio play major roles. For the MARS Sports model, the noise frame ratio, silence ratio and spectral flux
are more important. Short time energy and MFCC3 are the important variables in the Cartoon model. Noise frame ratio, short time energy and MFCC3 are more important in the News model. From these results, MFCC3, MFCC12, short time energy, noise frame ratio, silence ratio and spectral flux are more useful than the other features.

TABLE 2. DECOMPOSITION ANALYSIS OF EACH MODEL

Genre    Function  STD    GCV     Basis  Params  Variables
Dahma    1         0.261  0.144   1      2.5     3
         2         0.137  0.080   2      5.0     12
         3         0.060  0.056   2      5.0     20
         4         0.050  0.043   1      2.5     2,20
         5         0.113  0.058   2      5.0     3,12
Music    1         3.953  20.21   3      7.5     3
         2         0.212  0.074   1      2.5     19
         3         0.038  0.016   1      2.5     3,5
         4         0.085  0.023   1      2.5     3,15
         5         0.050  0.017   2      5.0     3,17
         6         3.840  22.086  3      7.5     3,19
Sports   1         0.087  0.033   1      2.5     2
         2         0.325  0.235   1      2.5     20
         3         0.058  0.027   2      5.0     2,20
         4         0.043  0.025   2      5.0     3,19
         5         0.100  0.038   1      2.5     3,20
         6         0.061  0.028   1      2.5     14,20
         7         0.040  0.025   1      2.5     15,20
         8         0.146  0.051   1      2.5     16,20
         9         0.186  0.082   3      7.5     19,20
Cartoon  1         0.347  0.315   2      5.0     3
         2         0.081  0.060   1      2.5     14
         3         0.351  0.542   2      5.0     15
         4         0.104  0.061   1      2.5     17
News     5         0.358  0.467   2      5.0     19
         6         0.077  0.059   2      5.0     2,15
         7         0.378  0.414   2      5.0     3,15
         8         0.091  0.057   1      2.5     3,19
         9         0.100  0.060   1      2.5     5,17
         10        0.047  0.050   1      2.5     8,14

For the proposed system, a test set of 240 clips was created to assess accuracy. The classification accuracy rate is 97 percent for the 'Music' genre, the best performance among the genres. The evaluation was carried out with a MATLAB program on our own training database of 860 videos. Regarding accuracy, the algorithm was evaluated with the true positive rate, true negative rate, false positive rate, and false negative rate on our own 240-video test database in MATLAB. According to Table 3, the true positive rates are 48 percent and 57 percent for 'News' and 'Sport' respectively.
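The classification rates reported for each genre can be reproduced from the raw clip counts. The computation below uses the 'Music' row as an example, under the interpretation (an assumption, since the paper does not define it) that the parenthesized numbers are clip counts, that 52 of the 240 test clips are music, and that the true positive and false positive counts are normalized by the number of music clips while the true negative count is normalized by the remaining 188 clips:

```python
# Hypothetical reconstruction of the 'Music' row of the classification table.
tp, fp = 50, 2        # of the 52 Music clips in the test set
tn, fn = 183, 5       # of the 188 non-Music clips
music_clips, total = 52, 240

tpr = tp / music_clips             # true positive rate
fpr = fp / music_clips             # 'false positive' rate as tabulated
tnr = tn / (total - music_clips)   # true negative rate
accuracy = (tp + tn) / total       # overall accuracy over all 240 clips
```

Under this reading the computed rates match the tabulated 0.9615, 0.0385, 0.9734 and 0.9708 to four decimal places, which supports the interpretation of the counts.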
The false positive rates are 55 percent and 43 percent for 'News' and 'Sport' respectively. The true negative rates are over 93 percent for every genre. From this table, the proposed approach is not optimized for the 'Sport' and 'News' types.

TABLE 3. CLASSIFICATION RESULTS

Genre    Clips  True Positive  True Negative  False Positive  False Negative  Accuracy
Dahma    49     0.8776 (43)    0.9424 (180)   0.1633 (8)      0.0471 (9)      0.9292 (240)
Music    52     0.9615 (50)    0.9734 (183)   0.0385 (2)      0.0266 (5)      0.9708 (240)
Sport    30     0.5667 (17)    0.9762 (205)   0.4333 (13)     0.0238 (5)      0.9250 (240)
Cartoon  47     0.9333 (42)    0.9333 (182)   0.1111 (5)      0.0564 (11)     0.9333 (240)
News            0.4844 (31)    0.9602 (169)   0.5469 (35)     0.0284 (5)      0.8333 (240)

VI. CONCLUSION

In this paper, multivariate adaptive regression splines are applied to automatic video genre classification using only audio features. The experiments indicate an overall classification rate of 91.83%. As this study mainly uses low-level acoustic features as independent variables, future studies may aim at collecting more informative variables to improve classification accuracy. From the experimental results, MFCC3, MFCC12, short time energy, noise frame ratio, silence ratio and spectral flux are more useful than the other features. The experimental evaluation confirms the good performance of the proposed video classification system except for the sport and news genres, for which the results are still reasonable. Further experiments on larger volumes of audio and on audio-visual features will be tested in this framework. Integrating genetic algorithms and/or grey theory with neural networks and/or support vector machines is a possible research direction for further improving classification accuracy.

ACKNOWLEDGMENT

My sincere thanks to my supervisor, Dr. Nu War, for providing me the opportunity to do my research work.
I express my thanks to my institution, University of Technology (Yatanarpon Cyber City), for providing a good environment and facilities such as Internet access, books and computers that served as my resources to complete this research work. My heartfelt thanks to my family, friends and colleagues who have helped me complete this work.

REFERENCES
[1] J. Nam, M. Alghoniemy, and A. H. Tewfik, "Audio-visual content-based violent scene characterization," in International Conference on Image Processing (ICIP '98), vol. 1, 1998, pp. 353-357.
[2] S. Moncriefi, S. Venkatesh, and C. Dorai, "Horror film genre typing and scene labeling via audio analysis," in Multimedia and Expo (ICME), 2003.
[3] W. Zhu, C. Toklu, and S.-P. Liou, "Automatic news video segmentation and categorization based on closed-captioned text," in Multimedia and Expo (ICME), 2001, pp. 829-832.
[4] D. Brezeale and D. J. Cook, "Automatic video classification: A survey of the literature," in Systems, Man, and Cybernetics, 2008.
[5] M. Roach, L.-Q. Xu, and J. Mason, "Classification of non-edited broadcast video using holistic low-level features," in IWDC 2002, 2002.
[6] R. Jasinschi and J. Louie, "Automatic TV program genre classification based on audio patterns," in Euromicro Conference, 2001.
[7] L.-Q. Xu and Y. Li, "Video classification using spatial-temporal features and PCA," in Multimedia and Expo (ICME '03), 2003.
[8] M. Roach and J. Mason, "Classification of video genre using audio," in European Conference on Speech Communication and Technology, 2001.
[9] J. H. Friedman, "Multivariate adaptive regression splines," Ann. Stat. 19, 1-141 (with discussion), 1991.

Hnin Ei Latt completed her Master of Engineering (Information Technology) (M.E-IT) at West Yangon Technology University (WYTU). She is currently a PhD candidate at the University of Technology (Yatanarpon Cyber City) and works as an assistant lecturer at Technology University (Myeik). Her research area is the classification of videos from large video collections.