We introduce two new metrics for the evaluation of search effectiveness for informally structured speech data: mean average segment precision (MASP) which measures retrieval performance in terms of both content segmentation and ranking with respect to relevance; and mean average segment distance-weighted precision (MASDWP) which takes into account the distance between the start of the relevant segment and the retrieved segment. We demonstrate the effectiveness of these new metrics on a retrieval test collection based on the AMI meeting corpus.
New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval
Maria Eskevich (1), Walid Magdy (2,3), Gareth J.F. Jones (1,2)
(1) Centre for Digital Video Processing, School of Computing, Dublin City University, Dublin, Ireland
(2) Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland
(3) Qatar Computing Research Institute - Qatar Foundation, Doha, Qatar
April 3, 2012
Outline
Speech Retrieval
Speech Search Evaluation
Mean Average Precision (MAP)
Mean Average interpolated Precision (MAiP)
mean Generalized Average Precision (mGAP)
New Metrics
Mean Average Segment Precision (MASP)
Mean Average Segment Distance-Weighted Precision (MASDWP)
Retrieval Collection
Experimental Results
Conclusions
Speech Documents Diversity
Broadcast news: [example images on slide]
Meetings: [example images on slide]
Speech Retrieval
[Pipeline diagram:]
Speech Collection (audio) → Automatic Speech Recognition System → Transcript (text) → Segmentation → Segments → Indexing → Indexed Segments
Information Request → Queries (text; spoken queries pass through a second Automatic Speech Recognition System) → Retrieval over the Indexed Segments → Retrieval Results: textual segments → Retrieval Results: speech segments
Related Work in Speech Search Evaluation
Retrieval Units:
Clearly defined documents: TREC SDR – Mean Average Precision (MAP)
Passages: INEX – Mean Average interpolated Precision (MAiP)
Jump-in points: CLEF CL-SR – mean Generalized Average Precision (mGAP)
Mean Average interpolated Precision (MAiP)
Task: passage text retrieval.
Document relevance is not counted in a binary way.
Precision at rank r: fraction of the retrieved characters that are relevant.
Average interpolated Precision (AiP): average of interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00):

\mathrm{AiP} = \frac{1}{101} \sum_{x = 0.00,\,0.01,\,\ldots,\,1.00} iP[x]

Shortcomings: averaging over characters in the transcript is not suitable for speech tasks.
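The AiP averaging described above can be sketched as follows; the input format, a list of (precision, recall) points taken at each retrieved rank, is a hypothetical choice for illustration.

```python
def interpolated_precision(prec_recall, level):
    """iP[x]: the highest precision observed at any recall >= x (0 if none)."""
    return max((p for p, rec in prec_recall if rec >= level), default=0.0)

def aip(prec_recall):
    """Average interpolated Precision: mean of iP[x] over the 101 recall
    levels 0.00, 0.01, ..., 1.00, as in the definition above."""
    return sum(interpolated_precision(prec_recall, i / 100)
               for i in range(101)) / 101
```

For example, a run whose only precision-recall point is (1.0, 0.5) scores 1.0 at the 51 levels up to 0.50 and 0 elsewhere, giving AiP = 51/101 ≈ 0.505.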
mean Generalized Average Precision (mGAP)
Task: retrieval of the jump-in points in time for relevant content.

\mathrm{GAP} = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \left(1 - \frac{\mathrm{Distance}}{\mathrm{Granularity}} \cdot 0.1\right)

Shortcomings: does not take into account how much time the user needs to spend listening to access the relevant content.
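A minimal sketch of the GAP formula above, assuming each rank is represented by the distance (in seconds) from the suggested jump-in point to the true one, or None for a non-relevant result; the granularity default is an illustrative assumption, not the CLEF CL-SR setting.

```python
def gap(ranked, n_relevant, granularity=15.0):
    """Generalized Average Precision for one query: each relevant jump-in
    point at rank r contributes P[r] scaled by the distance penalty
    (1 - 0.1 * Distance/Granularity), floored at zero."""
    score, hits = 0.0, 0
    for r, distance in enumerate(ranked, start=1):
        if distance is None:          # no relevant jump-in point at this rank
            continue
        hits += 1
        precision = hits / r          # P[r]
        penalty = max(0.0, 1.0 - 0.1 * abs(distance) / granularity)
        score += precision * penalty
    return score / n_relevant if n_relevant else 0.0
```

A perfectly placed jump-in point at rank 1 scores 1.0; the same point one granularity unit away scores 0.9.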
Time Precision Oriented Metrics
Motivation:
Create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score.
Reflect how far the user has to listen into the segment at a certain rank until the relevant part actually begins.
Mean Average Segment Precision (MASP)
Segment Precision (SP[r]) at rank r: fraction of the total playback time of the segments retrieved down to rank r that contains relevant content.
Average Segment Precision:

\mathrm{ASP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)

rel(s_r) = 1 if relevant content is present in segment s_r, otherwise rel(s_r) = 0.
Difference from other metrics:
the amount of relevant content is measured over time instead of text
average segment precision (ASP) is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP
Mean Average Segment Distance-Weighted Precision (MASDWP)
Penalizes SP[r] by the distance to the jump-in point, as in mGAP:

\mathrm{ASDWP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \cdot \left(1 - \frac{\mathrm{Distance}}{\mathrm{Granularity}} \cdot 0.1\right)
Comparative example of AP, ASP and ASDWP

Rank | Rel Len / Total Len | AP   | ASP   | ASDWP
1    | 2/3                 | 1    | 2/3   | 2/3 * 1.0
2    | 0/5                 | 1/2  | 2/8   | 2/8 * 0.0
3    | 3/4                 | 2/3  | 5/12  | 5/12 * 0.9
4    | 6/6                 | 3/4  | 11/18 | 11/18 * 0.0
5    | 0/2                 | 3/5  | 11/20 | 11/20 * 0.0
6    | 5/10                | 4/6  | 16/30 | 16/30 * 0.0

MAP = 0.771, MASP = 0.557, MASDWP = 0.260 (averaged over the four relevant segments at ranks 1, 3, 4 and 6)
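The example's final scores can be reproduced with a short script; the segment lengths and the per-rank distance penalties are taken directly from the table above.

```python
# Six retrieved segments, given as (relevant_seconds, total_seconds);
# ranks 2 and 5 contain no relevant content.
segments = [(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)]
# Distance penalty per rank as shown in the ASDWP column (1.0, 0.0, 0.9, ...).
penalties = [1.0, 0.0, 0.9, 0.0, 0.0, 0.0]

n_rel = sum(1 for rel, _ in segments if rel > 0)   # 4 relevant segments
ap = asp = asdwp = 0.0
hits = cum_rel = cum_tot = 0
for r, ((rel, tot), pen) in enumerate(zip(segments, penalties), start=1):
    cum_rel += rel
    cum_tot += tot
    sp = cum_rel / cum_tot        # SP[r]: relevant time / retrieved time so far
    if rel > 0:                   # rel(s_r) = 1: accumulate at relevant ranks
        hits += 1
        ap += hits / r            # P[r] for classic average precision
        asp += sp
        asdwp += sp * pen
print(round(ap / n_rel, 3), round(asp / n_rel, 3), round(asdwp / n_rel, 3))
# prints: 0.771 0.557 0.26
```

The script shows how ASP rewards the same ranking less than AP once segment lengths are taken into account, and how the distance penalty drives ASDWP down further.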
Test Collection
Speech collection: AMI Corpus
Ca. 100 hours of data (80 hours of speech)
160 meetings; average length – 30 minutes
Transcripts:
Manual
Automatic Speech Recognition (ASR), WER ≈ 30%
Retrieval test set:
25 queries with text taken from PowerPoint slides provided with the AMI Corpus (average length > 10 content words)
Manual relevance assessment
Segmentation Methods and Retrieval Runs
Segmentation*:
Lexical cohesion based algorithms: TextTiling, C99
Time- and length-based algorithms: time length = 60, 120, 150, 180 seconds; number of words per segment = 300, 400
Extreme case: no segmentation
Retrieval system: SMART extended to use language modeling
* Manual boundaries for both types of transcript
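A time-based segmenter of the kind listed above can be sketched as follows; the (start_time, token) input format is a hypothetical choice for illustration, not the format of the AMI transcripts.

```python
def time_segments(words, window=60.0):
    """Split a time-stamped transcript into fixed-duration segments.
    `words` is a list of (start_time_seconds, token) pairs; a new segment
    starts whenever the elapsed time reaches `window` seconds."""
    segments, current, seg_start = [], [], None
    for start, token in words:
        if seg_start is None:
            seg_start = start
        if start - seg_start >= window:   # close the current window
            segments.append(current)
            current, seg_start = [], start
        current.append(token)
    if current:                           # flush the final partial segment
        segments.append(current)
    return segments
```

A word-count-based variant would close the window on `len(current)` instead of elapsed time, which is the only difference between the time- and length-based runs.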
Results for 1000 retrieved documents (asr man)

Run      | MAP   | MAiP  | MASP  | MASDWP
c99      | 0.438 | 0.275 | 0.218 | 0.177
tt       | 0.421 | 0.275 | 0.221 | 0.173
len 300  | 0.416 | 0.287 | 0.248 | 0.181
len 400  | 0.463 | 0.286 | 0.237 | 0.147
time 120 | 0.428 | 0.296 | 0.256 | 0.196
time 150 | 0.448 | 0.283 | 0.243 | 0.171
time 180 | 0.473 | 0.300 | 0.246 | 0.163
time 60  | 0.333 | 0.259 | 0.238 | 0.220
one doc  | 0.686 | 0.109 | 0.085 | 0.009

one doc run: the highest MAP, but the lowest score on every other metric → contradicts the user experience
time 60: the highest MASDWP score → the shorter average segment length makes it easier to place a segment boundary close to the jump-in point
Capturing Difference Between Segmentations
(Relevant length / total length of the retrieved segment, with the distance to the start of the relevant content in parentheses)

Rank | c99           | time 180      | time 60
3    | –             | 179/179 (–)   | 60/60 (–)
4    | 243/243 (–)   | 179/179 (–)   | 59/59 (1)
5    | –             | 180/180 (-69) | 60/60 (–)
6    | 105/125 (20)  | –             | 59/59 (-10)
7    | 157/204 (47)  | 179/179 (0)   | 59/59 (–)
8    | 107/107 (-45) | 59/179        | 60/60 (–)
9    | 350/429 (47)  | 162/180 (-4)  | 60/60 (21)
10   | 122/122 (-11) | 143/181 (–)   | –

AP: one doc > time 180 > c99 > time 60
AiP: c99 > time 180 > time 60 > one doc
ASP: time 180 > c99 > time 60 > one doc
ASDWP: c99 > time 180 > time 60 > one doc
Impact of Averaging Techniques
[Plots comparing AiP and ASP across transcript conditions]
AiP: man < asr man; ASP: man > asr man
(relevant content moves down from higher ranks)
Conclusions
MAP and MAiP do not reflect the user experience of searching informally structured speech documents:
MAP is appropriate for clearly defined documents
MAiP works with transcript characters
Introduced MASP and MASDWP:
MASP captures the amount of relevant content that appears at different ranks
MASDWP rewards runs where the segmentation algorithm puts boundaries closer to the relevant content and these segments are ranked higher
Thank you for your attention!