TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
1. Content System architecture Experimental Results Conclusion
MediaEval Benchmarking Initiative for Multimedia Evaluation
Jozef Vavrek, Matúš Pleva, Jozef Juhár
Department of Electronics and Multimedia Communications
Technical University of Košice, Slovak Republic
e-mail:{jozef.vavrek; matus.pleva; jozef.juhar}@tuke.sk
04 October, 2012
1 System architecture
Segmentation
Feature Extraction
Support Vector Machine Method
Searching Algorithm
2 Experimental Results
3 Conclusion
Segmentation and pre-processing
segmentation: each utterance is cut into segments of variable length, set by the query: l_segment = l_query (rectangular window)
use: input to the subsequent pre-processing and feature extraction phases
pre-processing: pre-emphasis filtering, then a Hamming window of length l_window = l_query/100 with 50% overlap
use: to emphasize higher-frequency components, to reduce abrupt changes within the spectrum of the signal, and to increase the classification performance of the SVM classifier
[Figure: an utterance divided into segments 1 to 4 with l_segment = l_query, shown against the query, and framing with l_window = l_query/100.]
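The segmentation and framing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the pre-emphasis coefficient `alpha=0.97`, the non-overlapping placement of segments, and the dropping of a trailing remainder shorter than the query are all assumptions that the slides do not specify.

```python
import math

def split_into_segments(utterance, query_len):
    """Cut the utterance into consecutive segments of length query_len
    (rectangular window). Assumption: segments do not overlap and a
    trailing remainder shorter than the query is dropped."""
    return [utterance[i:i + query_len]
            for i in range(0, len(utterance) - query_len + 1, query_len)]

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].
    The value of alpha is an assumed, commonly used default."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(n):
    """Hamming window with n points."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
            for k in range(n)]

def frame_signal(signal, query_len):
    """Frame a query or segment with l_window = l_query / 100
    and 50% overlap, applying a Hamming window to each frame."""
    win_len = max(2, query_len // 100)
    hop = max(1, win_len // 2)          # 50% overlap
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        chunk = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

With a query of 400 samples, each window holds 4 samples and consecutive frames are shifted by 2 samples, so the query and every segment yield the same number of frames.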
Support Vector Machine classifier
linear SVM with soft and hard margin defined by decision hyperplane
$$d(\mathbf{w}, \mathbf{x}, b) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{l} w_i x_i + b \tag{1}$$
[Figure: two panels in the (x1, x2) plane showing the decision hyperplane between Class 1 (y = +1) and Class 2 (y = -1), one with a hard margin and one with a soft margin.]
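As a small illustration of Eq. (1), the decision function and the resulting class label can be written directly. The `hinge_loss` helper is the standard textbook soft-margin penalty and is not taken from the slides; all names and the example values are hypothetical.

```python
def decision(w, x, b):
    """d(w, x, b) = w . x + b, as in Eq. (1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Label +1 on the non-negative side of the hyperplane, -1 otherwise."""
    return 1 if decision(w, x, b) >= 0 else -1

def hinge_loss(w, x, b, y):
    """Standard soft-margin penalty max(0, 1 - y * d(w, x, b)).
    It is zero only when the sample lies on the correct side of the
    hyperplane and outside the margin (assumed formulation)."""
    return max(0.0, 1.0 - y * decision(w, x, b))

# Hypothetical two-dimensional example.
w, b = [1.0, -1.0], 0.5
outside_margin = [2.0, 1.0]   # d = 1.5, no penalty
inside_margin = [0.2, 0.0]    # d = 0.7, penalised under a soft margin
```

A hard margin requires the penalty to be zero for every training sample, while a soft margin tolerates non-zero penalties weighted by the parameter C.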
Nonlinear SVM classifier
mapping into the high-dimensional feature space by kernel functions
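$$d(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i y_i \, \mathbf{z}(\mathbf{x}) \cdot \mathbf{z}(\mathbf{x}_i) + b \tag{2}$$
$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{z}_i \cdot \mathbf{z}_j = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \tag{3}$$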
[Figure: samples in the (x1, x2) input space mapped by Φ(·) into the feature space.]
used kernel functions
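Mathematical expression                                                  Type
$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$                            Linear
$K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma \, \mathbf{x}_i \cdot \mathbf{x}_j + 1)^d$              Polynomial of degree d
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)$      Gaussian Radial Basis Function (RBF)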
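The three kernels in the table, and the kernel form of the decision function in Eq. (2), can be sketched as follows. This is an illustrative sketch only: the function names, the default values of `gamma` and `d`, and the toy support vectors are assumptions, not values from the presentation.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(xi, xj):
    """K(xi, xj) = xi . xj"""
    return dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, d=2):
    """K(xi, xj) = (gamma * xi . xj + 1)^d"""
    return (gamma * dot(xi, xj) + 1.0) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def kernel_decision(x, support_vectors, alphas, labels, b, kernel):
    """Eq. (2) with z(x) . z(x_i) replaced by K(x, x_i), as in Eq. (3)."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b
```

Because the kernel is evaluated on the original vectors, the mapping Φ never has to be computed explicitly.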
SVM based searching (classification) algorithm
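[Figure: flowchart of the searching algorithm. The utterance is split into Segment 1 ... Segment N, each of length l_query. The query (query001) and each segment are divided into frames with l_window = l_query/100, and each frame is described by 13 MFCCs (indices 0 to 12). Query frames are labelled +1 and segment frames -1. Flowchart blocks: compute MCA of DTW; "< threshold"; train SVM with linear kernel and C = 1; SVM model; compute miss(+1) and miss(-1); number of iterations; "> threshold"; query detected.]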
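The DTW stage of the flowchart can be illustrated with a plain dynamic-programming sketch. Several points are assumptions rather than facts from the slides: frames are taken to be 13-dimensional MFCC vectors compared with the Euclidean distance, the minimum cost alignment (MCA) is normalised by the summed sequence lengths, and the reading that segments with MCA below the threshold are passed on to SVM training is only one interpretation of the diagram. All function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_min_cost(query, segment):
    """Minimum cost alignment between two sequences of feature vectors,
    using the classic DTW recursion with steps (i-1, j), (i, j-1), (i-1, j-1).
    The result is divided by len(query) + len(segment) (assumed normalisation)."""
    n, m = len(query), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(query[i - 1], segment[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def select_candidates(query, segments, threshold):
    """Indices of segments whose MCA falls below the threshold
    (assumed to be the segments forwarded to the SVM stage)."""
    return [k for k, seg in enumerate(segments)
            if dtw_min_cost(query, seg) < threshold]
```

The recursion costs O(n·m) distance evaluations per query-segment pair, which is consistent with the high searching time reported in the conclusions.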
Experimental results
Score parameter: $\frac{\text{number of iterations}}{100} = 2.82$
Error rate: $1 - \frac{\text{correctly predicted frames}}{\text{all tested frames}} = 0.18$
Misclassification rate: $\frac{\text{miss}(+) + \text{miss}(-)}{\text{all predicted data}} = 0.12$
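The three quantities above reduce to simple counts over frame labels. The sketch below is illustrative only: the slides do not say which frames count as "tested" versus "predicted", so both rates are computed here over whatever label lists are passed in, and on identical inputs the two functions return the same value. Function names are hypothetical.

```python
def score_parameter(num_iterations):
    """Score parameter = number of iterations / 100."""
    return num_iterations / 100

def error_rate(true_labels, predicted_labels):
    """1 - (correctly predicted frames / all tested frames)."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 1.0 - correct / len(true_labels)

def misclassification_rate(true_labels, predicted_labels):
    """(miss(+) + miss(-)) / all predicted data, where miss(+) counts
    +1 frames predicted as -1 and miss(-) counts -1 frames predicted as +1."""
    miss_pos = sum(1 for t, p in zip(true_labels, predicted_labels)
                   if t == 1 and p != 1)
    miss_neg = sum(1 for t, p in zip(true_labels, predicted_labels)
                   if t == -1 and p != -1)
    return (miss_pos + miss_neg) / len(predicted_labels)

# Hypothetical labels: +1 for query frames, -1 for segment frames.
true_labels = [1, 1, -1, -1, 1]
predicted_labels = [1, -1, -1, 1, 1]
```

For example, 282 iterations would give a score parameter of 2.82.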
Evaluation results of the tested algorithm
database set    P(FA)      P(Miss)    ATWV
evalQ-devC      1.54617    0.960      -0.052
devQ-evalC      1.62595    0.948      -0.233
evalQ-evalC     1.68694    0.974      -0.164
devQ-devC       1.78786    0.943      -0.194
Conclusions and Future Work
Proposed: a query-by-example search system based on the minimum-cost alignment of the DTW algorithm and the misclassification error rate of an unsupervised SVM.
No other resources were used during the development.
Poor detection performance, with a high number of false alarms and missed detections, caused by the variable length of queries and by detected terms with similar spectral characteristics within each utterance.
Relatively high computation time (searching time) of the proposed algorithm.
Future work: design an effective query-by-example search system with lower computation time and fewer missed detections.
Thank You For Your Attention