New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval

Maria Eskevich (1), Walid Magdy (2,3), Gareth J.F. Jones (1,2)

                    (1) Centre for Digital Video Processing
                    (2) Centre for Next Generation Localisation
                        School of Computing
                        Dublin City University, Dublin, Ireland
                    (3) Qatar Computing Research Institute, Qatar Foundation
                        Doha, Qatar


                                April 3, 2012
Speech Retrieval   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Outline
         Speech Retrieval

         Speech Search Evaluation
           Mean Average Precision (MAP)
           Mean Average interpolated Precision (MAiP)
           mean Generalized Average Precision (mGAP)

         New Metrics
           Mean Average Segment Precision (MASP)
           Mean Average Segment Distance-Weighted Precision
           (MASDWP)

         Retrieval Collection

         Experimental Results

         Conclusions


  Speech Documents Diversity


                   Broadcast news:




                   Meetings:
  Speech Retrieval

         [Pipeline diagram]
         Speech Collection → Automatic Speech Recognition System → Transcript
                           → Segmentation → Segments → Indexing → Indexed Segments
         Queries (audio)   → Automatic Speech Recognition System → Queries (text)
                           → Information Request
         Retrieval: the Information Request is matched against the Indexed Segments
         Retrieval Results: textual segments → speech segments




  Related Work in Speech Search Evaluation

         Retrieval Units:
                 Clearly defined documents:
                         TREC SDR: Mean Average Precision (MAP)
                 Passages:
                         INEX: Mean Average interpolated Precision (MAiP)
                 Jump-in points:
                         CLEF CL-SR: mean Generalized Average Precision (mGAP)


  Mean Average interpolated Precision (MAiP)

         Task: passage text retrieval.
         Document relevance is not counted in a binary way.
         Precision at rank r: fraction of the retrieved characters
         that are relevant.

         Average interpolated Precision (AiP): average of the interpolated
         precision scores iP[x] calculated at 101 recall levels
         (0.00, 0.01, ..., 1.00):

                 AiP = (1/101) · Σ_{x = 0.00, 0.01, ..., 1.00} iP[x]

         Shortcomings: averaging over characters in the transcript is
         not suitable for speech tasks.
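The AiP computation can be sketched in a few lines of Python. The input format (per-rank character counts) and the function name are illustrative assumptions, not taken from the talk:

```python
# Sketch of Average interpolated Precision (AiP): precision and recall are
# measured over characters, and interpolated precision iP[x] is sampled at
# 101 recall levels 0.00, 0.01, ..., 1.00.

def aip(per_rank_chars, total_relevant_chars):
    """per_rank_chars: list of (relevant_chars, total_chars), one per rank."""
    points = []                       # (recall, precision) after each rank
    rel_seen = chars_seen = 0
    for rel, total in per_rank_chars:
        rel_seen += rel
        chars_seen += total
        points.append((rel_seen / total_relevant_chars, rel_seen / chars_seen))

    def interpolated_precision(level):
        # iP[x]: best precision at any point with recall >= x (0 if none)
        return max((p for r, p in points if r >= level), default=0.0)

    levels = [i / 100 for i in range(101)]   # 0.00, 0.01, ..., 1.00
    return sum(interpolated_precision(x) for x in levels) / 101
```

For example, a single retrieved passage covering half of the ten relevant characters (`aip([(5, 10)], 10)`) scores 0.5 at the 51 recall levels up to 0.5 and zero above.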


  mean Generalized Average Precision (mGAP)

         Task: retrieval of jump-in points in time for relevant content.

                 GAP = (1/n) · Σ_{r=1}^{N} P[r] · (1 − (Distance / Granularity) · 0.1)

         Shortcomings: does not take into account how much time the user
         needs to spend listening to access the relevant content.
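The GAP formula can be sketched as follows. Clamping the penalty at zero for very distant jump-in points is our assumption (the slide does not say what happens when the distance exceeds ten granularity units), and the input format is illustrative:

```python
# Sketch of Generalized Average Precision (GAP): each relevant result at
# rank r contributes its precision P[r], discounted by 0.1 per granularity
# unit of distance between the returned and the true jump-in point.

def gap(distances, granularity, n_relevant):
    """distances: per-rank distance (e.g. seconds) from the true jump-in
    point, or None when the result matched no relevant content."""
    score = 0.0
    hits = 0
    for rank, dist in enumerate(distances, start=1):
        if dist is None:                       # non-relevant result
            continue
        hits += 1
        precision = hits / rank                # P[r]
        penalty = max(0.0, 1 - (dist / granularity) * 0.1)  # clamp assumed
        score += precision * penalty
    return score / n_relevant
```

A result exactly on the jump-in point at rank 1 scores 1.0; one full granularity unit away it scores 0.9.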


  Time Precision Oriented Metrics



         Motivation:

              Create a metric that measures both the ranking quality and
              the segmentation quality with respect to relevance in a
              single score.

              Reflect how far the user has to listen into the segment at a
              certain rank until the relevant part actually begins.


  Mean Average Segment Precision (MASP)

         Segment Precision (SP[r]) at rank r: the fraction of the total
         retrieved time up to rank r that is relevant.

         Average Segment Precision:

                 ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r)

         rel(s_r) = 1 if relevant content is present in segment s_r,
         otherwise rel(s_r) = 0.

         Difference from other metrics:
              the amount of relevant content is measured over time
              instead of text;
              average segment precision (ASP) is calculated at the
              ranks of segments containing relevant content,
              rather than at fixed recall points as in MAiP.
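The ASP computation can be sketched as below, reading SP[r] as the cumulative relevant time divided by the cumulative retrieved time up to rank r, which is consistent with the worked example in the talk. The input format is illustrative:

```python
# Sketch of Average Segment Precision (ASP). SP[r] is the fraction of the
# total retrieved time up to rank r that is relevant; only ranks whose
# segments contain relevant content (rel(s_r) = 1) contribute to the sum.

def asp(segments, n_relevant):
    """segments: list of (relevant_seconds, total_seconds), one per rank."""
    score = 0.0
    rel_time = total_time = 0.0
    for rel, total in segments:
        rel_time += rel
        total_time += total
        sp = rel_time / total_time             # SP[r]
        if rel > 0:                            # rel(s_r) = 1
            score += sp
    return score / n_relevant
```

On the talk's six-segment example with four relevant segments, `asp([(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)], 4)` gives 0.557.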


  Mean Average Segment Distance-Weighted Precision
  (MASDWP)

         Penalizes ASP in the same way mGAP penalizes AP, based on the
         distance to the start of the relevant content:

                 ASDWP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r) · (1 − (Distance / Granularity) · 0.1)


  Comparative example of AP, ASP and ASDWP

         Rank   Rel Len /    AP      ASP       ASDWP
                Total Len
          1       2/3         1       2/3      2/3 * 1.0
          2       0/5        1/2      2/8      2/8 * 0.0
          3       3/4        2/3      5/12     5/12 * 0.9
          4       6/6        3/4     11/18    11/18 * 0.0
          5       0/2        3/5     11/20    11/20 * 0.0
          6       5/10       4/6     16/30    16/30 * 0.0

         MAP = 0.771        MASP = 0.557        MASDWP = 0.260
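The final scores of this example can be reproduced with a short script. The per-rank penalty factors (1.0, 0.0, 0.9, 0.0, 0.0, 0.0) are taken directly from the ASDWP column; everything else follows the formulas above:

```python
# Reproducing the comparative example: AP, ASP and ASDWP over six retrieved
# segments, four of which contain relevant content.

segments = [(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)]  # (rel, total) length
penalties = [1.0, 0.0, 0.9, 0.0, 0.0, 0.0]                    # from the slide
n_rel = sum(1 for rel, _ in segments if rel > 0)              # 4 relevant segments

ap = asp_ = asdwp = 0.0
hits = rel_len = total_len = 0
for rank, ((rel, total), pen) in enumerate(zip(segments, penalties), start=1):
    rel_len += rel
    total_len += total
    sp = rel_len / total_len                 # SP[r] (cumulative time precision)
    if rel > 0:                              # rel(s_r) = 1
        hits += 1
        ap += hits / rank                    # P[r] (binary precision)
        asp_ += sp
        asdwp += sp * pen

print(f"{ap / n_rel:.3f} {asp_ / n_rel:.3f} {asdwp / n_rel:.3f}")
# → 0.771 0.557 0.260
```

Note how the fully relevant segment at rank 4 still contributes nothing to ASDWP: its jump-in point is too far from the start of the relevant content, so its penalty factor is 0.0.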


  Test Collection


          Speech collection: AMI Corpus
                   Ca. 100 hours of data (80 hours of speech)
                   160 meetings:
                       average length: 30 minutes
                   Transcripts:
                       Manual
                       Automatic Speech Recognition (ASR), WER ≈ 30%
          Retrieval test set:
                   25 queries with text taken from PowerPoint slides provided
                   with the AMI Corpus (average length > 10 content words)
                   Manual relevance assessment


  Segmentation Methods and Retrieval Runs



                   Segmentation*:
                       Lexical cohesion based algorithms: TextTiling, C99
                       Time- and length-based algorithms:
                       time length = 60, 120, 150, 180 seconds;
                       number of words per segment = 300, 400
                       Extreme case: No segmentation
                   Retrieval system:
                       SMART extended to use language modeling

          * Manual boundaries for both types of transcript
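The time-based segmentation used in those runs can be sketched as below; the input format (start-time, word pairs) and the function name are illustrative assumptions:

```python
# Sketch of time-based segmentation: a timed transcript is cut into
# fixed-length windows (e.g. 60, 120, 150 or 180 seconds).

def time_segments(words, window):
    """words: list of (start_time_seconds, word); window: segment length."""
    segments = []
    current, boundary = [], window
    for t, w in words:
        if t >= boundary and current:          # close the current window
            segments.append(current)
            current = []
            boundary = (t // window + 1) * window
        current.append(w)
    if current:                                # flush the last window
        segments.append(current)
    return segments
```

The length-based variants (300 or 400 words per segment) would instead count words, and the lexical-cohesion algorithms (TextTiling, C99) place boundaries at topic shifts rather than fixed offsets.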


  Scores Results for 1000 retrieved documents
                           Run                                asr man
                                             MAP           MAiP MASP                    MASDWP
                          c99               0.438          0.275 0.218                   0.177
                           tt               0.421          0.275 0.221                   0.173
                        len 300             0.416          0.287 0.248                   0.181
                        len 400             0.463          0.286 0.237                   0.147
                       time 120             0.428          0.296 0.256                   0.196
                       time 150             0.448          0.283 0.243                   0.171
                       time 180             0.473          0.300 0.246                   0.163
                        time 60             0.333          0.259 0.238                   0.220
                        one doc             0.686          0.109 0.085                   0.009

                 one doc run: only MAP highest score, all other metrics
                 has the lowest score
Experimental Results   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Scores Results for 1000 retrieved documents
                           Run                                asr man
                                             MAP           MAiP MASP                    MASDWP
                          c99               0.438          0.275 0.218                   0.177
                           tt               0.421          0.275 0.221                   0.173
                        len 300             0.416          0.287 0.248                   0.181
                        len 400             0.463          0.286 0.237                   0.147
                       time 120             0.428          0.296 0.256                   0.196
                       time 150             0.448          0.283 0.243                   0.171
                       time 180             0.473          0.300 0.246                   0.163
                        time 60             0.333          0.259 0.238                   0.220
                        one doc             0.686          0.109 0.085                   0.009

                 one doc run: only MAP highest score, all other metrics
                 has the lowest score − > contradict user experience
Experimental Results   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Scores Results for 1000 retrieved documents
                           Run                                asr man
                                             MAP           MAiP MASP                    MASDWP
                          c99               0.438          0.275 0.218                   0.177
                           tt               0.421          0.275 0.221                   0.173
                        len 300             0.416          0.287 0.248                   0.181
                        len 400             0.463          0.286 0.237                   0.147
                       time 120             0.428          0.296 0.256                   0.196
                       time 150             0.448          0.283 0.243                   0.171
                       time 180             0.473          0.300 0.246                   0.163
                        time 60             0.333          0.259 0.238                   0.220
                        one doc             0.686          0.109 0.085                   0.009

                 one doc run: only MAP highest score, all other metrics
                 has the lowest score − > contradict user experience
                 time 60: the highest MASDWP rank
Experimental Results   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Scores Results for 1000 retrieved documents
                           Run                                asr man
                                             MAP           MAiP MASP                    MASDWP
                          c99               0.438          0.275 0.218                   0.177
                           tt               0.421          0.275 0.221                   0.173
                        len 300             0.416          0.287 0.248                   0.181
                        len 400             0.463          0.286 0.237                   0.147
                       time 120             0.428          0.296 0.256                   0.196
                       time 150             0.448          0.283 0.243                   0.171
                       time 180             0.473          0.300 0.246                   0.163
                        time 60             0.333          0.259 0.238                   0.220
                        one doc             0.686          0.109 0.085                   0.009

                 one doc run: only MAP highest score, all other metrics
                 has the lowest score − > contradict user experience
                 time 60: the highest MASDWP rank − > shorter average
                 length of the segments makes it easier to capture
                 the segment closer to the jump-in point
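Standard (uninterpolated) average precision treats each retrieved unit as atomically relevant or not, which is why the unsegmented one doc run can top MAP while being least useful: a whole relevant meeting counts as one hit, however long the user must listen to find the relevant minute. A minimal sketch with binary relevance and hypothetical ranked lists:

```python
def average_precision(ranked, relevant):
    """Uninterpolated AP: mean of precision at each relevant rank."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Whole meetings as documents: one relevant meeting at rank 1 -> AP = 1.0,
# even if the relevant content is an hour into the recording.
print(average_precision(["m1", "m2"], {"m1"}))  # 1.0

# The same content split into segments: relevant segments spread over ranks.
print(round(average_precision(["s3", "s7", "s1"], {"s7", "s1"}), 3))  # 0.583
```

The segment-aware metrics below are designed to close exactly this gap between the score and the user experience.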


  Capturing the Difference Between Segmentations

                        Rank   c99             time 180        time 60
                        3                      179/179 (–)     60/60 (–)
                        4      243/243 (–)     179/179 (–)     59/59 (1)
                        5                      180/180 (-69)   60/60 (–)
                        6      105/125 (20)                    59/59 (-10)
                        7      157/204 (47)    179/179 (0)     59/59 (–)
                        8      107/107 (-45)   59/179          60/60 (–)
                        9      350/429 (47)    162/180 (-4)    60/60 (21)
                        10     122/122 (-11)   143/181 (–)

                          AP:    one doc > time 180 > c99 > time 60
                          AiP:   c99 > time 180 > time 60 > one doc
                          ASP:   time 180 > c99 > time 60 > one doc
                          ASDWP: c99 > time 180 > time 60 > one doc
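Reading the x/y entries above as relevant words over segment length (an assumption about the table's notation, with the bracketed values taken to be boundary offsets), per-segment precision is simply that ratio. A sketch with values from the c99 column:

```python
def segment_precision(relevant_words, segment_length):
    """Fraction of a retrieved segment that is relevant content."""
    return relevant_words / segment_length if segment_length else 0.0

# Entries from the c99 column, read as relevant/total words:
for rel, total in [(243, 243), (105, 125), (157, 204)]:
    print(f"{rel}/{total} -> {segment_precision(rel, total):.3f}")
# 243/243 -> 1.000 ; 105/125 -> 0.840 ; 157/204 -> 0.770
```

This per-segment view is what lets ASP and ASDWP separate runs that plain AP, with its all-or-nothing relevance, ranks identically.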


  Impact of Averaging Techniques

                        AiP: man < asr man;  ASP: man > asr man
                        (relevant content moves down from higher ranks)
Conclusions       New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval




  Conclusions

         MAP and MAiP do not reflect the user experience of informally
         structured speech documents:
              MAP is appropriate only for clearly defined documents
              MAiP operates at the level of transcript characters
         Introduced MASP and MASDWP:

              MASP: captures the amount of relevant content that
              appears at different ranks

              MASDWP: rewards runs where the segmentation algorithm
              places boundaries closer to the relevant content and
              ranks these segments higher in the result list
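The MASDWP intuition — discounting a relevant segment both by its rank and by how far its boundary falls from the start of the relevant content — can be illustrated as follows. This is only a sketch of the idea, not the paper's exact formula; the `decay` parameter and the reciprocal weighting are assumptions made for illustration.

```python
def distance_weight(offset_words, decay=0.01):
    """Illustrative penalty: weight shrinks as the segment boundary
    moves away from the start of the relevant content."""
    return 1.0 / (1.0 + decay * abs(offset_words))

def distance_weighted_precision(results, decay=0.01):
    """results: list of (is_relevant, boundary_offset_in_words) per rank."""
    hits, scores = 0, []
    for rank, (relevant, offset) in enumerate(results, start=1):
        if relevant:
            hits += 1
            scores.append((hits / rank) * distance_weight(offset, decay))
    return sum(scores) / len(scores) if scores else 0.0

# A run whose boundaries sit on the relevant content scores higher than
# one whose boundaries are 50 words away, at identical ranks.
tight = [(True, 0), (False, 0), (True, 0)]
loose = [(True, 50), (False, 0), (True, 50)]
print(distance_weighted_precision(tight) > distance_weighted_precision(loose))  # True
```

Under any weighting of this shape, the time 60 run's short segments (small boundary offsets) are rewarded, matching the MASDWP ranking observed in the results table.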




         Thank you for your attention!

Mais conteúdo relacionado

Destaque

Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...
Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...
Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...Maria Eskevich
 
DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
DCU at the NTCIR-9 SpokenDoc Passage Retrieval TaskDCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
DCU at the NTCIR-9 SpokenDoc Passage Retrieval TaskMaria Eskevich
 
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...Maria Eskevich
 
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)Maria Eskevich
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Maria Eskevich
 
Anderson Oehler 360 Presentation
Anderson Oehler 360 PresentationAnderson Oehler 360 Presentation
Anderson Oehler 360 Presentationucwisdom
 
Gameplan - Management, Leadership and Cultural Dynamics Feedback
Gameplan - Management, Leadership and Cultural Dynamics Feedback Gameplan - Management, Leadership and Cultural Dynamics Feedback
Gameplan - Management, Leadership and Cultural Dynamics Feedback Dubai, UAE
 
Audio/Video Search: Why? What? How?
Audio/Video Search: Why? What? How?Audio/Video Search: Why? What? How?
Audio/Video Search: Why? What? How?Maria Eskevich
 
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...Hitesh Gossain
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Maria Eskevich
 
Photostory slideshow
Photostory slideshow Photostory slideshow
Photostory slideshow milo197
 
Levima - Company profile
Levima - Company profileLevima - Company profile
Levima - Company profileLevima
 
Edinburgh University H&S presentation
Edinburgh University H&S presentationEdinburgh University H&S presentation
Edinburgh University H&S presentationLevima
 
Focus on spoken content in multimedia retrieval
Focus on spoken content in multimedia retrievalFocus on spoken content in multimedia retrieval
Focus on spoken content in multimedia retrievalMaria Eskevich
 
Projects list for ece & eee
Projects list for ece & eeeProjects list for ece & eee
Projects list for ece & eeevarun29071
 
Software Security Testing
Software Security TestingSoftware Security Testing
Software Security Testingsrivinayak
 
болезнь паркинсона
болезнь паркинсонаболезнь паркинсона
болезнь паркинсонаgunka_gl
 
Tugas 6 laporan laba rugi
Tugas 6 laporan laba rugiTugas 6 laporan laba rugi
Tugas 6 laporan laba rugiTika Evitasuhri
 

Destaque (20)

Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...
Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...
Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods...
 
DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
DCU at the NTCIR-9 SpokenDoc Passage Retrieval TaskDCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
 
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSC...
 
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)
Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012
 
Team leaders
Team leadersTeam leaders
Team leaders
 
Anderson Oehler 360 Presentation
Anderson Oehler 360 PresentationAnderson Oehler 360 Presentation
Anderson Oehler 360 Presentation
 
Leadership Tips: Giving feedback from McCrudden Training
Leadership Tips: Giving feedback from McCrudden TrainingLeadership Tips: Giving feedback from McCrudden Training
Leadership Tips: Giving feedback from McCrudden Training
 
Gameplan - Management, Leadership and Cultural Dynamics Feedback
Gameplan - Management, Leadership and Cultural Dynamics Feedback Gameplan - Management, Leadership and Cultural Dynamics Feedback
Gameplan - Management, Leadership and Cultural Dynamics Feedback
 
Audio/Video Search: Why? What? How?
Audio/Video Search: Why? What? How?Audio/Video Search: Why? What? How?
Audio/Video Search: Why? What? How?
 
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...
Getting Sponsorship - Standard Presentation (Created by Onspon.com - India's ...
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
 
Photostory slideshow
Photostory slideshow Photostory slideshow
Photostory slideshow
 
Levima - Company profile
Levima - Company profileLevima - Company profile
Levima - Company profile
 
Edinburgh University H&S presentation
Edinburgh University H&S presentationEdinburgh University H&S presentation
Edinburgh University H&S presentation
 
Focus on spoken content in multimedia retrieval
Focus on spoken content in multimedia retrievalFocus on spoken content in multimedia retrieval
Focus on spoken content in multimedia retrieval
 
Projects list for ece & eee
Projects list for ece & eeeProjects list for ece & eee
Projects list for ece & eee
 
Software Security Testing
Software Security TestingSoftware Security Testing
Software Security Testing
 
болезнь паркинсона
болезнь паркинсонаболезнь паркинсона
болезнь паркинсона
 
Tugas 6 laporan laba rugi
Tugas 6 laporan laba rugiTugas 6 laporan laba rugi
Tugas 6 laporan laba rugi
 

Último

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Último (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Evaluating Speech Retrieval with New Segment-Based Metrics

  • 1. New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Maria Eskevich1 , Walid Magdy2,3 , Gareth J.F. Jones1,2 1Centre for Digital Video Processing 2 Centre for Next Generation Localisation School of Computing Dublin City University, Dublin, Ireland 3 Qatar Computing Research Institute - Qatar Foundation Doha, Qatar April, 3, 2012
  • 2. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Outline Speech Retrieval Speech Search Evaluation Mean Average Precision (MAP) Mean Average interpolated Precision (MAiP) mean Generalized Average Precision (mGAP) New Metrics Mean Average Segment Precision (MASP) Mean Average Segment Distance-Weighted Precision (MASDWP) Retrieval Collection Experimental Results Conclusions
  • 3. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity
  • 4. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news:
  • 5. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news:
  • 6. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news: Meetings:
  • 7. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news: Meetings:
• 8–17. Speech Retrieval: the speech retrieval pipeline, built up step by step across the slides. The speech collection (audio) is passed through an Automatic Speech Recognition (ASR) system to produce a transcript; the transcript is segmented, and the segments are indexed. A text query forms an information request run against the indexed segments; retrieval returns textual segments, which map back to the corresponding speech segments.
• 18. Outline: Speech Search Evaluation.
• 19–23. Speech Search Evaluation: Related Work in Speech Search Evaluation. Retrieval units and the metrics used to evaluate them: clearly defined documents (TREC SDR: Mean Average Precision, MAP); passages (INEX: Mean Average interpolated Precision, MAiP); jump-in points (CLEF CL-SR: mean Generalized Average Precision, mGAP).
• 24–26. Speech Search Evaluation: Mean Average interpolated Precision (MAiP). Task: passage text retrieval; document relevance is not counted in a binary way. Precision at rank r: the fraction of retrieved characters that are relevant. Average interpolated Precision (AiP): the average of interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00):

AiP = \frac{1}{101} \sum_{x=0.00,0.01,\ldots,1.00} iP[x]

Shortcoming: averaging over characters in the transcript is not suitable for speech tasks.
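As a rough sketch of the AiP computation (not the official INEX evaluation tool; the input layout is an assumption for illustration), interpolated precision iP[x] takes the maximum precision at any recall level at or above x:

```python
def aip(pr_points):
    """Average interpolated Precision for one query (sketch).

    pr_points: list of (recall, precision) pairs measured after each
    retrieved passage (character-based, as in INEX).
    iP[x] is the maximum precision at any recall >= x; AiP averages iP
    over the 101 recall levels 0.00, 0.01, ..., 1.00.
    """
    total = 0.0
    for i in range(101):
        x = i / 100
        at_or_above = [p for (r, p) in pr_points if r >= x]
        total += max(at_or_above) if at_or_above else 0.0
    return total / 101
```

For example, a run that reaches precision 1.0 at recall 0.5 and precision 0.5 at full recall scores (51 * 1.0 + 50 * 0.5) / 101, roughly 0.75.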
• 27–31. Speech Search Evaluation: mean Generalized Average Precision (mGAP). Task: retrieval of the jump-in points in time for relevant content:

GAP = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \Big(1 - \frac{Distance}{Granularity} \cdot 0.1\Big)

Shortcoming: does not take into account how much time the user needs to spend listening before reaching the relevant content.
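Reading the slide's formula as GAP = (1/n) sum_r P[r] * (1 - 0.1 * Distance / Granularity), a minimal sketch (not the official CLEF CL-SR scorer; the argument names are illustrative assumptions) is:

```python
def gap(hits, n_relevant, granularity=10.0):
    """Generalized Average Precision for one query (sketch).

    hits: (precision_at_r, distance_seconds) for each retrieved jump-in
    point matched to a relevant one; the reward decays linearly with the
    time offset and reaches zero at 10 granularity units.
    n_relevant: number of relevant jump-in points for the query.
    """
    total = 0.0
    for precision, distance in hits:
        penalty = max(0.0, 1.0 - 0.1 * abs(distance) / granularity)
        total += precision * penalty
    return total / n_relevant
```

A retrieved point exactly on the true jump-in point earns its full precision; one 50 seconds away (at granularity 10) earns only half.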
• 32. Outline: New Metrics.
• 33. New Metrics: Time-Precision-Oriented Metrics. Motivation: create a metric that measures both ranking quality and segmentation quality with respect to relevance in a single score, and that reflects how far into the segment at a given rank the user has to listen before the relevant part actually begins.
• 34–39. New Metrics: Mean Average Segment Precision (MASP). Segment Precision SP[r] at rank r: the fraction of the speech time retrieved up to rank r that is relevant. Average Segment Precision:

ASP = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)

where rel(s_r) = 1 if relevant content is present in segment s_r, otherwise rel(s_r) = 0. Differences from other metrics: the amount of relevant content is measured over time instead of text, and ASP is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP.
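A minimal sketch of ASP, using the deck's worked example as the specification (here n is taken as the number of relevant segments retrieved, which reproduces the example's MASP of 0.557; this reading is an assumption, not the paper's verbatim definition):

```python
def asp(segments):
    """Average Segment Precision for one query (sketch).

    segments: ranked (relevant_seconds, total_seconds) pairs.
    SP[r] = cumulative relevant time / cumulative retrieved time up to
    rank r; SP is accumulated only at ranks whose segment contains
    relevant content (rel(s_r) = 1) and averaged over those n ranks.
    """
    cum_rel = cum_total = 0.0
    total, n = 0.0, 0
    for rel_len, seg_len in segments:
        cum_rel += rel_len
        cum_total += seg_len
        if rel_len > 0:
            total += cum_rel / cum_total
            n += 1
    return total / n if n else 0.0
```

On the deck's six-segment example this yields roughly 0.557, matching the MASP value reported on slide 48.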
• 40. New Metrics: Mean Average Segment Distance-Weighted Precision (MASDWP). Penalizes ASP by distance, as mGAP does:

ASDWP = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \cdot \Big(1 - \frac{Distance}{Granularity} \cdot 0.1\Big)
• 41–51. New Metrics: comparative example of AP, ASP and ASDWP over six retrieved segments (Rel Len / Total Len = relevant length over total length of each segment):

Rank   Rel Len/Total Len   AP     ASP     ASDWP
1      2/3                 1      2/3     2/3 * 1.0
2      0/5                 1/2    2/8     2/8 * 0.0
3      3/4                 2/3    5/12    5/12 * 0.9
4      6/6                 3/4    11/18   11/18 * 0.0
5      0/2                 3/5    11/20   11/20 * 0.0
6      5/10                4/6    16/30   16/30 * 0.0

Resulting scores: MAP 0.771, MASP 0.557, MASDWP 0.260.
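The ASDWP column of this example can be reproduced with a small sketch; the per-rank penalty factors (1.0, 0.0, 0.9, ...) are taken directly from the slide rather than recomputed from distances:

```python
def asdwp(segments, penalties):
    """Average Segment Distance-Weighted Precision for one query (sketch).

    segments: ranked (relevant_seconds, total_seconds) pairs;
    penalties: per-rank factor 1 - 0.1 * Distance / Granularity,
    supplied directly here, as in the slide's example column.
    """
    cum_rel = cum_total = 0.0
    total, n = 0.0, 0
    for (rel_len, seg_len), pen in zip(segments, penalties):
        cum_rel += rel_len
        cum_total += seg_len
        if rel_len > 0:
            total += (cum_rel / cum_total) * pen
            n += 1
    return total / n if n else 0.0
```

With the example's segments and penalty factors it returns roughly 0.260, the MASDWP reported on the slide.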
• 52. Outline: Retrieval Collection.
• 53. Retrieval Collection: Test Collection. Speech collection: AMI Corpus, ca. 100 hours of data (80 hours of speech); 160 meetings, average length 30 minutes. Transcripts: manual, and Automatic Speech Recognition (ASR) output with WER ≈ 30%. Retrieval test set: 25 queries with text taken from the PowerPoint slides provided with the AMI Corpus (average length > 10 content words); manual relevance assessment.
• 54. Retrieval Collection: Segmentation Methods and Retrieval Runs. Segmentation (manual boundaries available for both types of transcript): lexical-cohesion-based algorithms (TextTiling, C99); time- and length-based algorithms (time length = 60, 120, 150, 180 seconds; number of words per segment = 300, 400); extreme case: no segmentation. Retrieval system: SMART, extended to use language modeling.
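The time-based baselines can be sketched as a simple fixed-window split over a time-aligned transcript (a hypothetical helper assuming ASR output as (word, start_time) pairs; not the segmenter actually used in these experiments):

```python
def time_segments(aligned_words, window=60.0):
    """Split a time-aligned transcript into fixed-duration segments.

    aligned_words: list of (word, start_time_seconds) pairs, in order.
    Each segment collects the words whose start time falls inside one
    'window'-second interval; empty intervals yield no segment.
    """
    segments, current = [], []
    boundary = window
    for word, start in aligned_words:
        while start >= boundary:          # crossed one or more window edges
            if current:
                segments.append(current)
                current = []
            boundary += window
        current.append(word)
    if current:
        segments.append(current)
    return segments
```

A word starting at 61 s with a 60-second window opens a new segment, mirroring the time 60 / time 120 / time 150 / time 180 runs.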
• 55. Outline: Experimental Results.
• 56–60. Experimental Results: scores for 1000 retrieved documents (asr man condition, as labelled in the original table):

Run        MAP    MAiP   MASP   MASDWP
c99        0.438  0.275  0.218  0.177
tt         0.421  0.275  0.221  0.173
len 300    0.416  0.287  0.248  0.181
len 400    0.463  0.286  0.237  0.147
time 120   0.428  0.296  0.256  0.196
time 150   0.448  0.283  0.243  0.171
time 180   0.473  0.300  0.246  0.163
time 60    0.333  0.259  0.238  0.220
one doc    0.686  0.109  0.085  0.009

The one doc run achieves the highest MAP but the lowest score on every other metric, which contradicts the user experience. The time 60 run ranks highest on MASDWP: its shorter average segment length makes it easier to place a segment boundary close to the jump-in point.
• 61–63. Experimental Results: capturing the difference between segmentations. Relevant length / total length of the segments retrieved at ranks 3–10, with the distance to the jump-in point in parentheses:

Rank   c99            time 180       time 60
3                     179/179 (–)    60/60 (–)
4      243/243 (–)    179/179 (–)    59/59 (1)
5                     180/180 (-69)  60/60 (–)
6      105/125 (20)                  59/59 (-10)
7      157/204 (47)   179/179 (0)    59/59 (–)
8      107/107 (-45)  59/179         60/60 (–)
9      350/429 (47)   162/180 (-4)   60/60 (21)
10     122/122 (-11)  143/181 (–)

Resulting orderings of the runs:
AP:     one doc > time 180 > c99 > time 60
AiP:    c99 > time 180 > time 60 > one doc
ASP:    time 180 > c99 > time 60 > one doc
ASDWP:  c99 > time 180 > time 60 > one doc
• 64–67. Experimental Results: Impact of Averaging Techniques. For the same runs, AiP ranks the manual transcript below the ASR one (man < asr man), while ASP ranks it above (man > asr man): relevant content moves down from the higher ranks.
• 68. Outline: Conclusions.
• 69–71. Conclusions: MAP and MAiP do not reflect the user experience of informally structured speech documents: MAP is appropriate only for clearly defined documents, and MAiP works over transcript characters. We introduced MASP and MASDWP: MASP captures the amount of relevant content that appears at different ranks; MASDWP additionally rewards runs where the segmentation algorithm puts boundaries closer to the relevant content and those segments appear higher in the ranked list.
• 72. Thank you for your attention!