Telefonica Research at MediaEval 2012 Spoken Web Search Task
Xavier Anguera
  
Outline
• System description
   – Speech activity detection
• Proposed systems
   – Segmental-DTW
   – IR-DTW
• Results
  
Proposed overall system
[Figure: overall system diagram; labels S-DTW and IR-DTW]
  
Frontend
• MFCC-39 features (12 cepstra + energy) + delta + delta-delta
• Mean & variance normalization at sentence level
• Posterior probabilities from a GMM background model
• L2-normalization
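The frontend steps above can be illustrated with a small pure-Python sketch. It assumes a diagonal-covariance GMM whose weights, means and variances come from the background model. It applies the operations in the order listed on the slide: sentence-level mean and variance normalization, then per-frame Gaussian posteriors, then L2-normalization. All function names are hypothetical. MFCC extraction itself is not shown, and a real system would use a numerical library rather than Python lists.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_variance_normalize(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Zero-mean, unit-variance normalization per dimension over one sentence."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    # A constant dimension has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n) or 1.0
            for k in range(dim)]
    return [[(f[k] - means[k]) / stds[k] for k in range(dim)] for f in frames]


def gmm_posteriors(frame: Sequence[float], weights: Sequence[float],
                   means: Sequence[Sequence[float]],
                   variances: Sequence[Sequence[float]]) -> Vector:
    """Posterior probability of each diagonal-covariance Gaussian given one frame."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        log_probs.append(ll)
    top = max(log_probs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(lp - top) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def posteriorgram(frames, weights, means, variances) -> List[Vector]:
    """Normalize the sentence, then map each frame to an L2-normalized posterior vector."""
    normed = mean_variance_normalize(frames)
    return [l2_normalize(gmm_posteriors(f, weights, means, variances)) for f in normed]
```

Each output frame has one entry per Gaussian (128 with the background model described next) and unit L2 norm.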
  
   	
  
Background model training
[Figure: iterative Gaussian splitting up to 128 Gaussians, with K-means assignment and EM-ML GMM training stages]
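Below is a minimal sketch of the Gaussian-splitting loop. It assumes the common scheme of perturbing each mean by a fraction of its standard deviation. The slide does not give the perturbation size, so `epsilon` is an assumption. The `refine` callback is a hypothetical placeholder for the K-means assignment and EM-ML training stages, which are not implemented here.

```python
import math
from typing import Callable, List, Tuple

# (weight, mean vector, diagonal variance vector)
Gaussian = Tuple[float, List[float], List[float]]


def split_gaussians(gmm: List[Gaussian], epsilon: float = 0.2) -> List[Gaussian]:
    """Split each Gaussian in two by moving its mean +/- epsilon standard deviations."""
    out: List[Gaussian] = []
    for w, mu, var in gmm:
        offset = [epsilon * math.sqrt(v) for v in var]
        # Each child keeps half the weight and a copy of the parent's variances.
        out.append((w / 2, [m + o for m, o in zip(mu, offset)], list(var)))
        out.append((w / 2, [m - o for m, o in zip(mu, offset)], list(var)))
    return out


def grow_gmm(initial: Gaussian, target: int,
             refine: Callable[[List[Gaussian]], List[Gaussian]]) -> List[Gaussian]:
    """Double the model until it has at least `target` Gaussians, refining after each split.

    `refine` stands in for the K-means assignment and EM-ML training stages (not shown).
    """
    gmm = [initial]
    while len(gmm) < target:
        gmm = refine(split_gaussians(gmm))
    return gmm
```

With `target=128`, the model passes through 2, 4, ..., 128 Gaussians, which takes seven split-and-refine rounds.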
  


[1] “Speaker Independent discriminant feature extraction for acoustic pattern matching”, Xavier Anguera, ICASSP 2012
  
Silence modeling
• 10% lowest energy frames → silence/speech GMM with 1 Gaussian for noise and 4 Gaussians for speech
• Run GMM training for 10 iterations, or while the % variation is high
• Decode the data:
   2234444343322444444444443222222234444444444444444444444443210000011222443
• Threshold set to values < 2 (i.e. silence + lowest speech Gaussian)
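The sketch below covers only the parts of the silence modeling that need no trained model. It picks the 10% lowest-energy frames, and it turns a decoded label string like the one on the slide into speech segments using the `< 2` threshold. It assumes each digit is the index of the decoded Gaussian, with 0 for noise and higher indices for speech Gaussians of increasing energy. That reading is inferred from "silence + lowest speech" and is not stated explicitly. GMM training and decoding are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def lowest_energy_frames(energies: Sequence[float], fraction: float = 0.10) -> List[int]:
    """Indices of the `fraction` lowest-energy frames (at least one), lowest first."""
    count = max(1, int(len(energies) * fraction))
    return sorted(range(len(energies)), key=lambda i: energies[i])[:count]


def speech_mask(labels: str, threshold: int = 2) -> List[bool]:
    """Frames whose decoded Gaussian index is below `threshold` are treated as non-speech."""
    return [int(c) >= threshold for c in labels]


def speech_segments(mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """Convert a per-frame speech mask into half-open [start, end) speech segments."""
    segments, start = [], None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments
```

Applied to the decoded string on the slide, this yields two speech segments separated by the `10000011` run.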
  
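Overlap postprocessing
• We compute the overlap ratio between all pairs of matching paths:

$$\mathrm{Ovl} = \frac{\min(\mathrm{End}_1, \mathrm{End}_2) - \max(\mathrm{Start}_1, \mathrm{Start}_2)}{\min(\mathrm{End}_1 - \mathrm{Start}_1,\; \mathrm{End}_2 - \mathrm{Start}_2)}$$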

• For pairs with > 0.5 overlap
   – Select the match with the highest score
  
[Figure: two overlapping matches, Match1 (from Start1 to End1) and Match2 (from Start2 to End2). In the example, Ovl = (min(ends) − max(starts)) / min(size1, size2) = 0.8]
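The overlap rule can be sketched as follows. The `Match` type and both functions are hypothetical. The pruning is a greedy reading of "select the match with highest score". It processes matches from best to worst score and discards any match that overlaps an already-kept match by more than 0.5. The slide does not say how chains of overlapping matches are resolved, so this is one plausible choice.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start: float
    end: float
    score: float  # assumed: higher is better


def overlap(a: Match, b: Match) -> float:
    """Ovl = (min(End1, End2) - max(Start1, Start2)) / min(End1 - Start1, End2 - Start2)."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    shortest = min(a.end - a.start, b.end - b.start)
    if shortest <= 0:
        return 0.0
    return max(0.0, shared) / shortest  # disjoint matches give 0, not a negative value


def prune_overlaps(matches: List[Match], max_ovl: float = 0.5) -> List[Match]:
    """Greedily keep the best-scoring match among any pair overlapping by more than max_ovl."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if all(overlap(m, k) <= max_ovl for k in kept):
            kept.append(m)
    return kept
```

For example, `Match(0, 10, ...)` and `Match(2, 12, ...)` give an overlap of 0.8, the same value as in the slide's figure, so only the higher-scoring of the two survives.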
  
S-DTW submission
• Based on last year's submission, but with the system improvements above
  
DTW local constraints
• No global constraints are applied, so that any segment of one sequence can be matched against any segment of the other
  
• Local constraints are set to allow warping up to 2X

$$D(m,n) = \min \begin{cases} \dfrac{D(m-2,n) + d(x_m,y_n)}{\mathrm{jumps}(m-2,n) + 3} \\[6pt] \dfrac{D(m,n-2) + d(x_m,y_n)}{\mathrm{jumps}(m,n-2) + 3} \\[6pt] \dfrac{D(m-2,n-2) + d(x_m,y_n)}{\mathrm{jumps}(m-2,n-2) + 4} \end{cases}$$

[Figure: local path constraints at cell (m, n), with predecessor cells (m−2, n−1), (m−1, n−2) and (m−1, n−1)]
  



• Posteriorgram features distance:

$$d(x_m, y_n) = -\log\left(\sum_{i=0}^{N-1} x_m[i]\, y_n[i]\right)$$
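The posteriorgram distance translates directly into code. This sketch adds a small floor so that orthogonal frames yield a large finite distance instead of `log(0)`. The floor is an assumption and does not appear on the slide. If the posteriorgrams are L2-normalized as in the frontend, their entries are non-negative and the dot product lies in [0, 1]. In that case d(x_m, y_n) ≥ 0, with 0 for identical frames. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def posteriorgram_distance(x: Sequence[float], y: Sequence[float],
                           floor: float = 1e-10) -> float:
    """d(x, y) = -log(sum_i x[i] * y[i]); `floor` avoids log(0) for orthogonal frames."""
    dot = sum(a * b for a, b in zip(x, y))
    return -math.log(max(dot, floor))


def distance_matrix(query: Sequence[Sequence[float]],
                    reference: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise d(x_m, y_n): rows are query frames, columns are reference frames."""
    return [[posteriorgram_distance(x, y) for y in reference] for x in query]
```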
S-DTW algorithm
[Figure: S-DTW alignment; axes: query term and reference term]
  
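For illustration, here is a minimal subsequence-DTW sketch over a precomputed distance matrix, built for example with the −log dot-product distance. It applies no global constraint: the whole query is matched, but it may start and end anywhere in the reference. It uses the three predecessor cells labeled in the local-constraints figure, (m−2, n−1), (m−1, n−2) and (m−1, n−1), which limit warping to 2X. Path-length normalization is simplified to an average cost per path cell. This is an assumption and not necessarily the exact jumps() normalization in the S-DTW recursion. Function and variable names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Predecessor offsets (dm, dn) from the figure: (m-2, n-1), (m-1, n-2), (m-1, n-1).
STEPS = ((2, 1), (1, 2), (1, 1))


def subsequence_dtw(dist: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Find the reference segment that best matches the whole query.

    dist[m][n] is d(x_m, y_n) for query frame m and reference frame n.
    Returns (average cost per path cell, start frame, end frame) in the reference.
    """
    M, N = len(dist), len(dist[0])
    INF = math.inf
    acc = [[INF] * N for _ in range(M)]   # accumulated cost of the chosen path
    length = [[0] * N for _ in range(M)]  # number of cells on that path
    start = [[-1] * N for _ in range(M)]  # reference frame where the path began

    # No global constraint: the query may begin at any reference frame.
    for n in range(N):
        acc[0][n], length[0][n], start[0][n] = dist[0][n], 1, n

    for m in range(1, M):
        for n in range(N):
            best = None
            for dm, dn in STEPS:
                pm, pn = m - dm, n - dn
                if pm < 0 or pn < 0 or acc[pm][pn] == INF:
                    continue
                # Compare candidate predecessors by their average cost per cell.
                cand = (acc[pm][pn] + dist[m][n]) / (length[pm][pn] + 1)
                if best is None or cand < best[0]:
                    best = (cand, pm, pn)
            if best is not None:
                _, pm, pn = best
                acc[m][n] = acc[pm][pn] + dist[m][n]
                length[m][n] = length[pm][pn] + 1
                start[m][n] = start[pm][pn]

    # The match must cover the whole query, so only the last query row is a valid end.
    candidates = [(acc[M - 1][n] / length[M - 1][n], start[M - 1][n], n)
                  for n in range(N) if acc[M - 1][n] < INF]
    if not candidates:
        raise ValueError("query cannot be aligned under the 2X warping constraint")
    return min(candidates)
```

On a toy 0/1 distance matrix between the query "ABA" and the reference "BBABAB", the sketch returns `(0.0, 2, 4)`, i.e. the exact match at reference frames 2 to 4.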
  
IR-DTW
• Total rework of last year's system
• Aims at keeping the same accuracy, but with:
   – Much less memory usage
   – Faster retrieval
• IR (Information Retrieval) because we index the reference features for fast nearest-neighbor retrieval
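The slides only say that IR-DTW indexes the reference features for fast nearest-neighbor retrieval; they do not describe the index structure. The hypothetical class below is a toy illustration of the idea, not the structure IR-DTW actually uses. It files each reference frame under its most likely Gaussians. For a query frame, it then scores only the frames that share one of those Gaussians, instead of scanning the whole reference.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


class PosteriorgramIndex:
    """Toy inverted index: each reference frame is filed under its `top_k` most likely Gaussians."""

    def __init__(self, reference: Sequence[Sequence[float]], top_k: int = 2):
        self.reference = reference
        self.top_k = top_k
        self.buckets: Dict[int, List[int]] = defaultdict(list)
        for n, frame in enumerate(reference):
            for g in self._top_components(frame):
                self.buckets[g].append(n)

    def _top_components(self, frame: Sequence[float]) -> List[int]:
        """Indices of the `top_k` largest posterior entries."""
        return sorted(range(len(frame)), key=lambda i: frame[i], reverse=True)[: self.top_k]

    def nearest(self, query_frame: Sequence[float], k: int = 3) -> List[Tuple[float, int]]:
        """Up to k (dot-product similarity, reference frame) pairs, best first.

        Only frames sharing a top Gaussian with the query are scored.
        """
        candidates: Set[int] = set()
        for g in self._top_components(query_frame):
            candidates.update(self.buckets.get(g, []))
        scored = [(sum(a * b for a, b in zip(query_frame, self.reference[n])), n)
                  for n in candidates]
        return sorted(scored, reverse=True)[:k]
```

The retrieved (query frame, reference frame) hits would then feed a matching stage. That stage, and how IR-DTW builds paths from the hits, is not shown.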
  
Official results

MTWV      Dev-dev   Dev-eval   Eval-dev   Eval-eval
IR-DTW    0.3903    0.3139     0.4983     0.3416
S-DTW     0.3745    0.3001     0.4716     0.3113

ATWV      Dev-dev   Dev-eval   Eval-dev   Eval-eval
IR-DTW    0.3866    0.3042     0.4219     0.3301
S-DTW     0.3644    0.292      0.3988     0.2942
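To make the MTWV and ATWV columns concrete, here is a sketch of the term-weighted value. It assumes the NIST 2006 STD definition, TWV = 1 − mean over terms of (P_miss + β·P_FA), with β = 999.9. The slides do not state which β the MediaEval 2012 evaluation used. ATWV is the TWV at the system's actual decision threshold, and MTWV is the best TWV over all thresholds. This is why MTWV ≥ ATWV in every column of the tables above. Function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: beta from the NIST 2006 STD evaluation; not stated on the slides.
BETA = 999.9

TermStats = Dict[str, Tuple[float, float]]  # term -> (P_miss, P_FA) at one threshold


def term_weighted_value(per_term: TermStats, beta: float = BETA) -> float:
    """TWV = 1 - mean over terms of (P_miss + beta * P_FA) at a fixed decision threshold."""
    if not per_term:
        raise ValueError("need at least one term")
    loss = sum(p_miss + beta * p_fa for p_miss, p_fa in per_term.values())
    return 1.0 - loss / len(per_term)


def maximum_twv(per_threshold: Dict[float, TermStats], beta: float = BETA) -> float:
    """MTWV: the best TWV over all candidate decision thresholds."""
    return max(term_weighted_value(stats, beta) for stats in per_threshold.values())
```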
  
[Figure: DEV-DEV results. DET curves, miss probability (%) vs. false alarm probability (%), with random performance; IR-DTW MTWV=0.390 Scr=0.387, S-DTW MTWV=0.375 Scr=0.695]
[Figure: EVAL-EVAL results. DET curves, miss probability (%) vs. false alarm probability (%), with random performance; IR-DTW MTWV=0.342, S-DTW MTWV=0.311]
[Figure: DEV-EVAL results. DET curves, miss probability (%) vs. false alarm probability (%), with random performance; IR-DTW MTWV=0.314, S-DTW MTWV=0.300]
[Figure: EVAL-DEV results. DET curves, miss probability (%) vs. false alarm probability (%), with random performance; IR-DTW MTWV=0.498, S-DTW MTWV=0.472]
Summary
• We propose 2 systems, both sharing the same framework
• Some improvements in the framework were incorporated: speech/silence classification, new overlap detection, modified background model
• IR-DTW is a total reimplementation of S-DTW, using information retrieval concepts

Xavier Anguera (xanguera@tid.es)

KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging Task
 

Telefonica Research System for the Spoken Web Search task at Mediaeval 2012

  • 1. Telefonica Research at Mediaeval 2012 Spoken Web Search Task (Xavier Anguera)
  • 2. Outline
    • System description
      – Speech activity detection
    • Proposed systems
      – Segmental-DTW
      – IR-DTW
    • Results
  • 3. Proposed overall system (diagram with two branches: S-DTW and IR-DTW)
  • 4. Frontend
    • MFCC-39 features: (12 cepstra + energy) + delta + delta-delta
    • Mean and variance normalization at sentence level
    • Posterior probabilities from a GMM background model
    • L2 normalization
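The frontend steps above can be sketched as follows. This is a minimal illustration, not the author's implementation: MFCC-39 extraction itself is not shown (frames are assumed to be already computed), the GMM is assumed to use diagonal covariances, and the helper names (`mvn`, `gmm_posteriors`, `l2_normalize`, `posteriorgram`) are hypothetical.

```python
import math


def mvn(frames):
    """Per-utterance mean and variance normalization of every feature dimension."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n) or 1.0
            for i in range(dim)]  # constant dimensions are left unscaled
    return [[(f[i] - means[i]) / stds[i] for i in range(dim)] for f in frames]


def gmm_posteriors(frame, weights, means, variances):
    """P(k | frame) for a diagonal-covariance GMM, computed in the log domain for stability."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def posteriorgram(frames, weights, means, variances):
    """MVN, then GMM posteriors, then L2 normalization: one output vector per frame."""
    return [l2_normalize(gmm_posteriors(f, weights, means, variances)) for f in mvn(frames)]
```

After L2 normalization each posterior vector has unit Euclidean length rather than summing to one; the inner-product distance used later in the DTW is still well defined.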
  • 5. Background model training
    • Iterative Gaussian splitting (128 Gaussians)
    • EM-ML GMM training
    • K-means assignment
    [1] "Speaker independent discriminant feature extraction for acoustic pattern matching", Xavier Anguera, ICASSP 2012
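A sketch of the iterative splitting idea, assuming LBG-style binary splitting in which each centroid is replaced by two slightly perturbed copies and then refined with K-means assignment. The EM-ML GMM training step on the slide is not reproduced here (K-means refinement stands in for it), the perturbation size `eps` is an arbitrary assumption, and all function names are hypothetical. Because every split doubles the count, a target of 128 is reached exactly after seven splits.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_refine(data, centroids, iters=10):
    """Plain K-means: assign each vector to its nearest centroid, then recompute centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for x in data:
            j = min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            groups[j].append(x)
        # An empty cluster keeps its previous centroid
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def split_until(data, target=128, eps=0.01):
    """Start from the global mean and double the number of centroids until `target` is reached."""
    centroids = [[sum(col) / len(data) for col in zip(*data)]]
    while len(centroids) < target:
        # Replace every centroid by two copies shifted by +eps and -eps in every dimension
        centroids = [[v + s * eps for v in c] for c in centroids for s in (1.0, -1.0)]
        centroids = kmeans_refine(data, centroids)
    return centroids
```

In a full system the resulting centroids would seed the EM-ML training of the background GMM, whose per-frame posteriors then form the features.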
  • 6. Silence modeling
    (Diagram: 10% lowest-energy frames → silence/speech GMM training → decode the data)
    • 1 Gaussian for noise and 4 Gaussians for speech
    • Perform 10 iterations, or iterate while the % variation is high
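The silence/speech model can be sketched as a 1-D GMM on frame energies. Several details are assumptions not stated on the slide: the model sees frame energies only, the noise Gaussian is seeded from the 10% lowest-energy frames while the four speech Gaussians are seeded from quantiles of the remaining frames, and "% variation" is read as the relative change in log-likelihood between iterations. Decoding assigns each frame to its most likely Gaussian and renumbers the Gaussians by ascending mean, so label 0 is noise and labels 1 to 4 are increasingly energetic speech classes. Function names are hypothetical.

```python
import math


def train_silence_speech_gmm(energies, n_speech=4, max_iter=10, rel_tol=1e-3):
    """EM for a 1-D GMM on frame energies: 1 noise Gaussian + n_speech speech Gaussians."""
    n, k_total = len(energies), n_speech + 1
    ordered = sorted(energies)
    n_low = max(1, n // 10)                      # the 10% lowest-energy frames
    low, rest = ordered[:n_low], ordered[n_low:]
    means = [sum(low) / n_low]                   # noise Gaussian seeded from quiet frames
    means += [rest[(2 * j + 1) * len(rest) // (2 * n_speech)] for j in range(n_speech)]
    mu = sum(energies) / n
    var0 = sum((e - mu) ** 2 for e in energies) / n / k_total ** 2 + 1e-6
    variances = [var0] * k_total
    weights = [1.0 / k_total] * k_total
    prev_ll = None
    for _ in range(max_iter):
        # E-step: responsibility of each Gaussian for each frame
        resp, ll = [], 0.0
        for e in energies:
            p = [w * math.exp(-(e - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            ll += math.log(s)
            resp.append([x / s for x in p])
        # M-step: re-estimate weights, means and floored variances
        for k in range(k_total):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = sum(r[k] * e for r, e in zip(resp, energies)) / nk
            variances[k] = sum(r[k] * (e - means[k]) ** 2
                               for r, e in zip(resp, energies)) / nk + 1e-6
        # Stop early once the relative log-likelihood change is small
        if prev_ll is not None and abs(ll - prev_ll) <= rel_tol * abs(prev_ll):
            break
        prev_ll = ll
    return weights, means, variances


def decode(energies, weights, means, variances):
    """Label each frame with its most likely Gaussian, renumbered by ascending mean (0 = noise)."""
    rank = {k: r for r, k in enumerate(sorted(range(len(means)), key=means.__getitem__))}
    labels = []
    for e in energies:
        scores = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (e - m) ** 2 / (2 * v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(rank[max(range(len(scores)), key=scores.__getitem__)])
    return labels
```

At least two frames are needed so that both the noise seed and the speech seeds are non-empty.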
  • 7. Example decoded label sequence (one digit per frame):
    2234444343322444444444443222222234444444444444444444444443210000011222443
    Threshold set to values < 2 (i.e., silence + lowest speech class)
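Applying the threshold to a decoded label sequence such as the one above amounts to marking frames with label < 2 as non-speech and collecting the remaining runs as speech segments. A minimal sketch; the function name and the half-open `[start, end)` frame convention are assumptions:

```python
def speech_segments(labels, threshold=2):
    """Return [start, end) frame ranges whose labels are >= threshold (i.e. speech)."""
    segments, start = [], None
    for i, lab in enumerate(labels):
        if lab >= threshold and start is None:
            start = i                      # a speech run begins
        elif lab < threshold and start is not None:
            segments.append((start, i))    # a speech run ends before frame i
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


decoded = "2234444343322444444444443222222234444444444444444444444443210000011222443"
labels = [int(ch) for ch in decoded]
print(speech_segments(labels))
```

The slide's sequence contains a single run of labels below 2 ("10000011"), so it yields two speech segments on either side of that pause.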
  • 8. Overlap postprocessing
    • We compute the overlap ratio between all pairs of matching paths:
      $$\mathrm{Ovl} = \frac{\min(\mathrm{End}_1, \mathrm{End}_2) - \max(\mathrm{Start}_1, \mathrm{Start}_2)}{\min(\mathrm{End}_1 - \mathrm{Start}_1,\ \mathrm{End}_2 - \mathrm{Start}_2)}$$
    • For pairs with > 0.5 overlap
      – Select the match with the highest score
  • 9. (Diagram: Match1 spans Start1 to End1 and Match2 spans Start2 to End2; example Ovl = (min(ends) - max(starts)) / min(size1, size2) = 0.8)
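The overlap rule can be sketched as below, with each match represented as a hypothetical `(start, end, score)` tuple and a higher score assumed to mean a better match. The slide does not say how conflicts chain when more than two matches overlap; this sketch resolves them greedily, visiting matches from best to worst score and keeping a match only if its overlap with every match already kept is at most 0.5.

```python
def overlap(a, b):
    """Ovl = (min(End1, End2) - max(Start1, Start2)) / min(End1 - Start1, End2 - Start2)."""
    (s1, e1), (s2, e2) = a, b
    shortest = min(e1 - s1, e2 - s2)
    if shortest <= 0:
        return 0.0
    return (min(e1, e2) - max(s1, s2)) / shortest   # negative when the spans are disjoint


def prune_overlaps(matches, max_ovl=0.5):
    """Keep the best-scoring match among any pair whose overlap exceeds max_ovl."""
    kept = []
    for start, end, score in sorted(matches, key=lambda m: m[2], reverse=True):
        if all(overlap((start, end), (s, e)) <= max_ovl for s, e, _ in kept):
            kept.append((start, end, score))
    return kept
```

For example, `overlap((0, 10), (2, 12))` is (10 - 2) / min(10, 10) = 0.8, so the lower-scoring of those two matches would be discarded.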
  • 10. S-DTW submission
    • Based on last year's submission, but with the system improvements above
  • 11. DTW local constraints
    • No global constraints are applied, in order to allow matching of any segment in either sequence
    • Local constraints are set to allow warping of up to 2X:
      $$D(m,n) = \min\left\{ \frac{D(m-2,n) + d(x_m,y_n)}{\mathrm{jumps}(m-2,n) + 3},\; \frac{D(m,n-2) + d(x_m,y_n)}{\mathrm{jumps}(m,n-2) + 3},\; \frac{D(m-2,n-2) + d(x_m,y_n)}{\mathrm{jumps}(m-2,n-2) + 4} \right\}$$
      (Diagram: cell (m, n) and its predecessor cells (m-2, n-1), (m-1, n-2) and (m-1, n-1).)
    • Posteriorgram features distance:
      $$d(x_m, y_n) = -\log\left(\sum_{i=0}^{N-1} x_m[i]\, y_n[i]\right)$$
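A minimal subsequence-DTW sketch using the posteriorgram distance above. It is not the author's exact recursion: as an assumption, `m` indexes the query, the predecessor cells are the three shown in the slide's diagram, (m-2, n-1), (m-1, n-2) and (m-1, n-1), and predecessors are compared by path-length-normalized cost (accumulated distance divided by the number of cells on the path) rather than by the slide's exact denominators. The query may start and end at any reference frame; function names are hypothetical.

```python
import math

INF = float("inf")
# Predecessor offsets (dm, dn) from the slide's diagram: (m-2, n-1), (m-1, n-2), (m-1, n-1)
STEPS = [(2, 1), (1, 2), (1, 1)]


def posteriorgram_distance(x, y):
    """d(x, y) = -log(sum_i x[i] * y[i]), guarded against log(0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return -math.log(max(dot, 1e-300))


def subsequence_dtw(query, ref):
    """Best normalized alignment of the whole query against any part of ref.

    Returns (normalized_cost, end_frame_in_ref).
    """
    M, N = len(query), len(ref)
    acc = [[INF] * N for _ in range(M)]   # accumulated distance along the best path
    length = [[0] * N for _ in range(M)]  # number of cells on that path
    for m in range(M):
        for n in range(N):
            d = posteriorgram_distance(query[m], ref[n])
            if m == 0:  # the query may start at any reference frame
                acc[m][n], length[m][n] = d, 1
                continue
            best = None
            for dm, dn in STEPS:
                pm, pn = m - dm, n - dn
                if pm < 0 or pn < 0 or acc[pm][pn] == INF:
                    continue
                # Average distance the extended path would have
                cand = (acc[pm][pn] + d) / (length[pm][pn] + 1)
                if best is None or cand < best[0]:
                    best = (cand, pm, pn)
            if best is not None:
                _, pm, pn = best
                acc[m][n] = acc[pm][pn] + d
                length[m][n] = length[pm][pn] + 1
    # The path must end on the last query frame, at any reference frame
    scores = [acc[M - 1][n] / length[M - 1][n] if length[M - 1][n] else INF for n in range(N)]
    end = min(range(N), key=scores.__getitem__)
    return scores[end], end
```

With `a = [0.9, 0.1]` and `b = [0.1, 0.9]`, aligning the query `[a, b, a]` against `[b, b, a, b, a, b]` ends at reference frame 4 with cost -log(0.82), the distance of every matched pair on that path.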
  • 12-13. S-DTW algorithm (two diagram slides: a query term aligned against a reference term)
  • 14. IR-DTW
    • Total rework of last year's system
    • Aims to keep the same accuracy, but with:
      – much less memory usage
      – faster retrieval
    • Called IR (Information Retrieval) because we index the reference features for fast nearest-neighbor retrieval
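The slide does not say what kind of index IR-DTW uses, so the following is only a hypothetical illustration of nearest-neighbor retrieval over reference posteriorgram frames: a simple inverted index that files each reference frame under its dominant Gaussian, probes the buckets of a query frame's most likely Gaussians, and ranks the candidates by the -log(dot product) distance. The class and method names are assumptions; chaining the retrieved (query frame, reference frame) pairs into matching paths is not shown.

```python
import math
from collections import defaultdict


def top_components(vec, g):
    """Indices of the g largest posterior values in vec."""
    return sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)[:g]


class PosteriorIndex:
    """Hypothetical inverted index: each reference frame is filed under its dominant Gaussian."""

    def __init__(self, ref_frames):
        self.frames = ref_frames
        self.buckets = defaultdict(list)
        for t, vec in enumerate(ref_frames):
            self.buckets[top_components(vec, 1)[0]].append(t)

    def knn(self, query_vec, k=3, probe=2):
        """Approximate k nearest reference frames, searching only the buckets of the
        query's `probe` most likely Gaussians; returns (distance, frame) pairs."""
        candidates = set()
        for comp in top_components(query_vec, probe):
            candidates.update(self.buckets.get(comp, []))
        scored = []
        for t in candidates:
            dot = sum(a * b for a, b in zip(query_vec, self.frames[t]))
            scored.append((-math.log(max(dot, 1e-300)), t))
        scored.sort()
        return scored[:k]
```

Only the probed buckets are scanned, which is where the speed-up over an exhaustive frame-by-frame comparison would come from; frames filed elsewhere can be missed, so the search is approximate.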
  • 15. Official results

    MTWV      Dev-dev   Dev-eval  Eval-dev  Eval-eval
    IR-DTW    0.3903    0.3139    0.4983    0.3416
    S-DTW     0.3745    0.3001    0.4716    0.3113

    ATWV      Dev-dev   Dev-eval  Eval-dev  Eval-eval
    IR-DTW    0.3866    0.3042    0.4219    0.3301
    S-DTW     0.3644    0.292     0.3988    0.2942
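MTWV and ATWV are the term-weighted value metrics from NIST spoken term detection: TWV(θ) = 1 - (1/|T|) Σ_t [P_miss(t, θ) + β · P_FA(t, θ)], where ATWV is TWV at the system's actual decision threshold and MTWV is the maximum TWV over all thresholds. The sketch below uses β = 999.9 (the NIST STD 2006 value) and one non-target trial per second of speech as defaults; whether the 2012 task used exactly these settings is an assumption, and the function names are hypothetical.

```python
def twv(per_term, speech_seconds, beta=999.9):
    """TWV = 1 - mean over terms of (P_miss + beta * P_FA).

    per_term: list of (n_correct, n_false_alarm, n_true) tuples, one per query term.
    Terms with no true occurrences are skipped, since P_miss is undefined for them.
    """
    cost, used = 0.0, 0
    for n_corr, n_fa, n_true in per_term:
        if n_true == 0:
            continue
        p_miss = 1.0 - n_corr / n_true
        p_fa = n_fa / (speech_seconds - n_true)   # one non-target trial per second of speech
        cost += p_miss + beta * p_fa
        used += 1
    return 1.0 - cost / used if used else 0.0


def mtwv(detections, n_true, speech_seconds, beta=999.9):
    """Maximum TWV over all decision thresholds.

    detections: list of (term, score, is_hit); n_true: dict term -> number of true occurrences.
    """
    thresholds = sorted({score for _, score, _ in detections}, reverse=True)
    best = -float("inf")
    for th in thresholds + [float("inf")]:        # +inf means "accept nothing"
        counts = {t: [0, 0] for t in n_true}      # [hits, false alarms] per term
        for term, score, hit in detections:
            if score >= th:
                counts[term][0 if hit else 1] += 1
        per_term = [(c[0], c[1], n_true[t]) for t, c in counts.items()]
        best = max(best, twv(per_term, speech_seconds, beta))
    return best
```

Because β is large, a single false alarm per thousand seconds of speech costs as much as missing every occurrence of a term, which is why the TWV-optimal threshold tends to be conservative.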
  • 16. DEV-DEV results (DET plot of miss probability vs. false alarm probability, in %, with a random-performance reference): IR-DTW MTWV = 0.390 (Scr = 0.387); S-DTW MTWV = 0.375 (Scr = 0.695)
  • 17. EVAL-EVAL results (DET plot): IR-DTW MTWV = 0.342; S-DTW MTWV = 0.311
  • 18. DEV-EVAL results (DET plot): IR-DTW MTWV = 0.314; S-DTW MTWV = 0.300
  • 19. EVAL-DEV results (DET plot): IR-DTW MTWV = 0.498; S-DTW MTWV = 0.472
  • 20. Summary (Xavier Anguera, xanguera@tid.es)
    • We propose two systems, both sharing the same framework
    • Several improvements were incorporated into the framework: speech/silence classification, new overlap detection and a modified background model
    • IR-DTW is a total reimplementation of S-DTW, using information retrieval concepts