TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM

1. Content: System architecture, Experimental Results, Conclusion
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
MediaEval Benchmarking Initiative for Multimedia Evaluation
Jozef Vavrek, Matúš Pleva, Jozef Juhár
Department of Electronics and Multimedia Communications, Technical University of Košice, Slovak Republic
e-mail: {jozef.vavrek; matus.pleva; jozef.juhar}@tuke.sk
04 October, 2012

2. Content
1 System architecture
    Segmentation
    Feature Extraction
    Support Vector Machine Method
    Searching Algorithm
2 Experimental Results
3 Conclusion

3. Proposed query-by-example searching architecture
[Figure: block diagram with the blocks Audio documents (utterances), Audio documents (queries), Segmentation, Feature extraction, DTW (MCA), and Support Vector Machine.]

4. Segmentation and pre-processing
Segmentation:
    into segments of variable length, $l_{segment} = l_{query}$ (rectangular window);
    use: for the further phases of pre-processing and feature extraction.
Pre-processing:
    pre-emphasis filtering and a Hamming window, $l_{window} = l_{query}/100$, with 50% overlap;
    use: to emphasize higher-frequency components, to reduce abrupt changes within the spectrum of the signal, and to increase the classification performance of the SVM classifier.
[Figure: an utterance divided into segments 1 to 4, each with $l_{segment} = l_{query}$; the query is framed with $l_{window} = l_{query}/100$.]
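The segmentation and framing rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the segments are assumed to be consecutive and non-overlapping with any trailing remainder dropped, and the pre-emphasis coefficient 0.97 is an assumed, commonly used value that the slide does not state.

```python
import math

def split_into_segments(utterance, query_len):
    """Cut the utterance into consecutive query-length segments (rectangular window).
    Assumption: segments do not overlap and a trailing remainder is dropped."""
    last_start = len(utterance) - query_len
    return [utterance[i:i + query_len] for i in range(0, last_start + 1, query_len)]

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed value."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(n_points):
    """Standard Hamming window coefficients."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def frame_signal(signal, query_len, overlap=0.5):
    """Frame a signal with l_window = l_query / 100 and 50% overlap."""
    win_len = max(1, query_len // 100)
    hop = max(1, int(win_len * (1 - overlap)))
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        chunk = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Usage: a 1000-sample utterance and a 400-sample query.
utterance = [float(i % 7) for i in range(1000)]
segments = split_into_segments(utterance, 400)
frames = frame_signal(pre_emphasis(segments[0]), 400)
```

Tying the window length to the query length means every query and every segment yields roughly the same number of frames, whatever the query duration.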

5. Feature Extraction
[Figure: MFCC extraction chain with the stages transformation (DFT, FFT), spectrum, Mel filtering (Mel filter bank), log of amplitude, and IDFT (DCT). The output is a feature vector matrix of coefficients (features, 0 to 12) by frames (instances).]
[Figure: avgMCA for the utterance segment and the query.]
Feature sets: MFCCs; MFCCs+ZCR; MFCCs+ZCR+MPEG-7 (ASS, ASC, ASF, ASE).
Similarity matrix (cost matrix) dimension: 13x13.
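The minimum cost alignment (MCA) named on this slide is the final accumulated cost of a dynamic time warping (DTW) pass over a cost matrix. The sketch below shows the textbook recurrence on two sequences of feature vectors. It is an assumption-laden illustration: the Euclidean local distance, the unconstrained step pattern, and the function names are not taken from the slides.

```python
import math

def euclidean(a, b):
    """Local distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_min_cost(seq_a, seq_b):
    """Return the minimum accumulated alignment cost between two sequences.

    acc[i][j] holds the cheapest cost of aligning the first i vectors of
    seq_a with the first j vectors of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Allowed predecessors: insertion, deletion, match.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

# Usage: a time-stretched copy of a sequence aligns at zero cost.
query = [[0.0], [1.0], [2.0]]
segment = [[0.0], [1.0], [1.0], [2.0]]
mca = dtw_min_cost(query, segment)
```

A low cost indicates that the two sequences follow a similar trajectory in feature space even when their lengths differ.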

6. Support Vector Machine classifier
Linear SVM with soft and hard margin, defined by the decision hyperplane
$$d(\mathbf{w}, \mathbf{x}, b) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{l} w_i x_i + b \qquad (1)$$
[Figure: two panels in the $(x_1, x_2)$ plane, hard margin and soft margin, each with Class 1 ($y = +1$), Class 2 ($y = -1$), and the decision hyperplane.]
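Equation (1) can be evaluated directly. The sketch below computes the decision value and maps its sign to a class label; the function names and the numeric weights are hypothetical, chosen only to illustrate the formula.

```python
def decision(w, x, b):
    """d(w, x, b) = w . x + b, as in equation (1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Map the sign of the decision value to a class label (+1 or -1)."""
    return 1 if decision(w, x, b) >= 0 else -1

# Usage with illustrative (hypothetical) parameters.
w, b = [1.0, -2.0], 0.5
value = decision(w, [3.0, 1.0], b)
label = classify(w, [0.0, 1.0], b)
```

Points with a positive decision value fall on the Class 1 side of the hyperplane, and points with a negative value on the Class 2 side.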

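7. Nonlinear SVM classifier
Mapping into the high-dimensional feature space by kernel functions:
$$d(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i y_i \, \mathbf{z}(\mathbf{x}) \cdot \mathbf{z}(\mathbf{x}_i) + b \qquad (2)$$
$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{z}_i \cdot \mathbf{z}_j = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \qquad (3)$$
[Figure: points in the $(x_1, x_2)$ input space and their images under the mapping $\Phi$ in the feature space.]
Used kernel functions:
Mathematical expression                                                           Type
$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$                 Linear
$K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma\, \mathbf{x}_i \cdot \mathbf{x}_j + 1)^d$   Polynomial of degree $d$
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)$   Gaussian Radial Basis Function (RBF)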
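The three kernels in the table, and the kernelized decision function of equation (2) with $K$ substituted for the dot product of mapped vectors, can be written out as follows. This is a sketch: the function names, the default values of gamma and the degree, and the example support vectors are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(xi, xj):
    """K(xi, xj) = xi . xj"""
    return dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, degree=2):
    """K(xi, xj) = (gamma * xi . xj + 1)^d"""
    return (gamma * dot(xi, xj) + 1.0) ** degree

def rbf_kernel(xi, xj, gamma=1.0):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def kernel_decision(x, support_vectors, alphas, labels, b, kernel):
    """d(x) = sum_i alpha_i * y_i * K(x, x_i) + b, equation (2) with the kernel trick."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Usage with hypothetical support vectors and multipliers.
svs = [[1.0, 0.0], [0.0, 1.0]]
d_value = kernel_decision([1.0, 0.0], svs, [0.5, 0.5], [1, -1], 0.0, linear_kernel)
```

Because the kernel is passed in as a function, the same decision routine covers the linear, polynomial, and RBF cases.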

8. SVM based searching (classification) algorithm
[Figure: flowchart of the searching algorithm.] Elements recoverable from the flowchart:
    the utterance is divided into Segment 1, Segment 2, Segment 3, ..., Segment N, each of length $l_{query}$;
    query001 is paired in turn with segment 1, segment 2, ..., segment N, with class labels +1 and -1;
    frames use $l_{window} = l_{query}/100$ and are described by MFCCs;
    Compute MCA of DTW, with a "< threshold" branch;
    Train SVM with linear kernel and C=1, giving the SVM model;
    Compute miss(+1) and miss(-1);
    Num. of iterations, with a "> threshold" branch leading to "Query detected".

9. Experimental results
Score parameter: $\frac{\text{number of iterations}}{100} = 2.82$
Error rate: $1 - \frac{\text{correctly predicted frames}}{\text{all tested frames}} = 0.18$
Misclassification rate: $\frac{\text{miss}(+) + \text{miss}(-)}{\text{all predicted data}} = 0.12$
Evaluation results of the tested algorithm:
database set    P(FA)      P(Miss)    ATWV
evalQ-devC      1.54617    0.960      -0.052
devQ-evalC      1.62595    0.948      -0.233
evalQ-evalC     1.68694    0.974      -0.164
devQ-devC       1.78786    0.943      -0.194
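The three quantities defined on this slide are simple ratios. The sketch below restates them as functions; the function names are hypothetical, and the frame and iteration counts in the usage lines are invented inputs chosen only so that the ratios reproduce the values quoted on the slide. They are not measurements from the experiments.

```python
def score_parameter(num_iterations):
    """Score parameter = number of iterations / 100."""
    return num_iterations / 100.0

def error_rate(correct_frames, tested_frames):
    """Error rate = 1 - correctly predicted frames / all tested frames."""
    return 1.0 - correct_frames / tested_frames

def misclassification_rate(miss_pos, miss_neg, predicted_total):
    """Misclassification rate = (miss(+) + miss(-)) / all predicted data."""
    return (miss_pos + miss_neg) / predicted_total

# Illustrative (hypothetical) counts that yield the quoted values.
score = score_parameter(282)                     # 282 iterations
err = error_rate(82, 100)                        # 82 of 100 frames correct
miss = misclassification_rate(7, 5, 100)         # 12 misses out of 100 predictions
```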

10. Conclusions and Future Work
Proposed a query-by-example searching system based on the minimum cost alignment of the DTW algorithm and the misclassification error rate of an unsupervised SVM.
No other resources were used during development.
Poor detection performance, with a high number of false alarms and missed detections, caused by the variable length of queries and by detected terms with similar spectral characteristics within each utterance.
Relatively high computational (searching) time of the proposed algorithm.
Future work: design an effective query-by-example searching system with lower computational time and fewer missed detections.

11. Thank You For Your Attention
